some-car-9623
04/05/2023, 1:47 PM
great-optician-81135
04/05/2023, 2:31 PM
The add_database_name_to_urn flag to the Oracle source ensures that dataset URNs have the DB name as a prefix to prevent collisions (e.g. {database}.{schema}.{table}). It is ONLY breaking if you set this flag to true; otherwise behavior remains the same. Can you tell us in which release we can expect this change? Thank you!
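For reference, a minimal sketch of where such a flag would sit in a recipe run programmatically via Pipeline.create; connection details are placeholders, and which release ships the flag is exactly the open question above.

from datahub.ingestion.run.pipeline import Pipeline

# Sketch only: Oracle recipe with the opt-in flag enabled. host_port,
# credentials, and service_name are placeholders, not a working config.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "oracle",
            "config": {
                "host_port": "localhost:1521",
                "username": "datahub",
                "password": "...",
                "service_name": "ORCLPDB1",
                # Opt-in flag discussed above: prefixes dataset URNs with
                # the database name, i.e. {database}.{schema}.{table}.
                "add_database_name_to_urn": True,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
)
pipeline.run()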
gray-airplane-39227
04/05/2023, 3:51 PM
I enabled remove_stale_metadata, ingested a MySQL database, then dropped a table from the schema and ingested again, but the table metadata is still there; I don't see any entity being deleted in the run's details log. I've tested this functionality multiple times with MySQL, Postgres, and Snowflake. Am I missing something?
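A minimal sketch of the setup this flag depends on, per the documented stateful-ingestion mechanics: stale-entity removal only runs when stateful ingestion is enabled and the recipe keeps a stable pipeline_name across runs. Connection details are placeholders.

from datahub.ingestion.run.pipeline import Pipeline

# Sketch: remove_stale_metadata is part of stateful_ingestion, and the
# saved state is keyed by pipeline_name, so that name must not change
# between the run that saw the table and the run after it was dropped.
pipeline = Pipeline.create(
    {
        "pipeline_name": "mysql_prod",  # must stay identical across runs
        "source": {
            "type": "mysql",
            "config": {
                "host_port": "localhost:3306",
                "username": "datahub",
                "password": "...",
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": True,
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
)
pipeline.run()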
quick-pizza-8906
04/05/2023, 4:12 PM
If an entity already has owners ownerA, ownerB and I then receive an MCP with an ownership aspect for that entity whose owners list equals ownerC, the whole aspect will be overwritten to contain only ownerC. Is there any way to make this behavior a bit more complex, for example by merging the existing list with the one coming from the MCP, so the result would be ownerA, ownerB, ownerC?
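Since aspects are replaced wholesale on write, any merging has to happen before emitting. One client-side pattern is read-merge-write via the REST graph client; a sketch with an illustrative dataset URN:

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import (
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)

graph = DataHubGraph(DatahubClientConfig(server="http://datahub-gms:8080"))
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:mysql,db.schema.table,PROD)"

# Fetch what is currently stored (None if the aspect does not exist yet).
existing = graph.get_aspect(entity_urn=dataset_urn, aspect_type=OwnershipClass)

# The incoming owners, e.g. decoded from the MCP: just ownerC here.
incoming = [
    OwnerClass(owner="urn:li:corpuser:ownerC", type=OwnershipTypeClass.DATAOWNER)
]

# Union the two lists, keyed by owner URN, so ownerA/ownerB survive.
merged = {o.owner: o for o in (existing.owners if existing else [])}
merged.update({o.owner: o for o in incoming})

graph.emit(
    MetadataChangeProposalWrapper(
        entityUrn=dataset_urn,
        aspect=OwnershipClass(owners=list(merged.values())),
    )
)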
polite-afternoon-10256
04/06/2023, 7:44 AM
agreeable-cricket-61480
04/06/2023, 7:56 AM
chilly-waitress-13685
04/06/2023, 8:34 AM
microscopic-room-90690
04/06/2023, 9:01 AM
I set schema_pattern.allow, but the log shows schemas not in the allow list still being ingested into DataHub. Can anyone help?
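One thing worth checking: allow entries are regular expressions, not literal names, and unanchored patterns match by prefix. A quick local sanity check using DataHub's own matcher (schema names illustrative):

from datahub.configuration.common import AllowDenyPattern

# An unanchored pattern like "sales" also lets "sales_archive" through,
# so anchor each entry when you want exact-name matching.
pattern = AllowDenyPattern(allow=[r"^sales$", r"^marketing$"])

for schema in ["sales", "sales_archive", "marketing", "tmp"]:
    print(schema, pattern.allowed(schema))
# -> sales True, sales_archive False, marketing True, tmp False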
acoustic-airplane-18718
04/06/2023, 9:17 AM
bland-orange-13353
04/06/2023, 10:12 AM
salmon-angle-92685
04/06/2023, 12:22 PM
Running the ingestion from a YAML recipe, I get:
'failures': {'permission-error': ['No tables/views found. Please check permissions.']},
Then, to test whether it was indeed a permission problem, I recreated the ingestion pipeline via the UI, using the same user and role as in the YAML config. Doing so, it worked just fine.
Could anyone help me solve this?
Thanks!
wide-florist-83539
04/06/2023, 7:06 PM
After installing
acryl-datahub[airflow]==0.10.0
and setting
lazy_load_plugins = False
I still don't see DataHub listed as a plugin for Airflow, and my DAG does not show any related log messages like "Emitting Datahub ...".
Currently following this tutorial: https://datahubproject.io/docs/lineage/airflow/
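For anyone comparing setups, a minimal sketch of a DAG the tutorial's plugin should pick up once it loads; when the task runs, the plugin is what logs the "Emitting Datahub ..." lines. DAG id, platform, and table names are illustrative.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from datahub_provider.entities import Dataset

# Sketch: inlets/outlets are what the DataHub Airflow plugin reads to
# emit lineage when the task executes.
with DAG(
    dag_id="datahub_lineage_check",
    start_date=datetime(2023, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    BashOperator(
        task_id="transform",
        bash_command="echo transform",
        inlets=[Dataset("mysql", "db.schema.src_table")],
        outlets=[Dataset("mysql", "db.schema.dst_table")],
    )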
microscopic-room-90690
04/07/2023, 5:28 AM
few-sunset-43876
04/07/2023, 8:32 AM
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': 'b73dd407-7091-4f48-8913-5474e4c7f447',
'infos': ['2023-04-07 08:28:33.329164 INFO: Starting execution for task with name=RUN_INGEST',
"2023-04-07 08:28:39.452058 INFO: Failed to execute 'datahub ingest'",
'2023-04-07 08:28:39.452302 INFO: Caught exception EXECUTING task_id=b73dd407-7091-4f48-8913-5474e4c7f447, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub ingest run -c /tmp/datahub/ingest/b73dd407-7091-4f48-8913-5474e4c7f447/recipe.yml --report-to /tmp/datahub/ingest/b73dd407-7091-4f48-8913-5474e4c7f447/ingestion_report.json
[2023-04-07 08:28:35,631] INFO {datahub.cli.ingest_cli:167} - DataHub CLI version: 0.9.2
[2023-04-07 08:28:35,664] INFO {datahub.ingestion.run.pipeline:174} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2023-04-07 08:28:37,610] ERROR {datahub.entrypoints:206} - Command failed: Failed to create source: bigquery is disabled; try running: pip install 'acryl-datahub[bigquery]'
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 97, in _ensure_not_lazy
plugin_class = import_path(path)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 32, in import_path
item = importlib.import_module(module_name)
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 59, in <module>
from datahub.ingestion.source.bigquery_v2.profiler import BigqueryProfiler
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/profiler.py", line 20, in <module>
from datahub.ingestion.source.ge_data_profiler import (
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 24, in <module>
from great_expectations.datasource.sqlalchemy_datasource import SqlAlchemyDatasource
ModuleNotFoundError: No module named 'great_expectations.datasource.sqlalchemy_datasource'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 188, in __init__
source_class = source_registry.get(source_type)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 144, in get
raise ConfigurationError(
datahub.configuration.common.ConfigurationError: bigquery is disabled; try running: pip install 'acryl-datahub[bigquery]'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/entrypoints.py", line 164, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
raise e
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
res = func(*args, **kwargs)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
return func(ctx, *args, **kwargs)
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 181, in run
pipeline = Pipeline.create(
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 313, in create
return cls(
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 190, in __init__
self._raise_initialization_error(e, "Failed to create source")
File "/tmp/datahub/ingest/venv-bigquery-0.9.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _raise_initialization_error
raise PipelineInitError(f"{msg}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to create source: bigquery is disabled; try running: pip install 'acryl-datahub[bigquery]'
My DataHub version is 0.9.2. I have run
pip install 'acryl-datahub[bigquery]'
and
pip install 'acryl-datahub[bigquery]' great-expectations
but the error still exists.
It doesn't happen on the newest DataHub version, e.g. 0.10.1.
Can anybody help? Thanks in advance!
curved-judge-66735
04/07/2023, 11:07 AM
Regarding BigQuery Usage Ingestion
Hi team, I'm wondering what the current best practice is to ingest BigQuery usage (V2) from centrally exported BigQuery audit metadata.
Our company has an organization-level aggregated sink for audit metadata, which means all projects sink their audit logs to the same BigQuery table.
However, in the current BigQuery ingestion design, usage is coupled with per-project metadata ingestion. So we are executing the same query and filtering out most of the records for every project, which looks like a huge waste.
What would be the recommended way to handle this situation?
One workaround I came up with is creating a view in a different dataset, filtered by project.
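To make the coupling concrete, a rough sketch of a usage-focused run pointed at the centralized export, using the source's documented exported-audit-metadata options; project and dataset names are illustrative, and exact field support varies by version, so treat this as a sketch rather than a recommendation.

from datahub.ingestion.run.pipeline import Pipeline

# Sketch: read usage once from the exported audit-log dataset, while the
# per-project runs keep usage disabled via include_usage_statistics.
pipeline = Pipeline.create(
    {
        "pipeline_name": "bigquery_usage_central",
        "source": {
            "type": "bigquery",
            "config": {
                "project_ids": ["audit-sink-project"],
                "include_usage_statistics": True,
                "use_exported_bigquery_audit_metadata": True,
                # Dataset holding the exported cloudaudit_* tables.
                "bigquery_audit_metadata_datasets": ["audit-sink-project.audit_logs"],
                # Keep this run usage-only by skipping table/view metadata.
                "include_tables": False,
                "include_views": False,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
)
pipeline.run()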
Also, ingesting usage from exported BQ audit metadata actually requires the bigquery.jobs.create permission on all projects, not just the extractor projects. Link
proud-dusk-671
04/07/2023, 12:06 PM
env must be one of {'DEV', 'NON_PROD', 'QA', 'TEST', 'PRE', 'STG', 'UAT', 'EI', 'PROD', 'CORP'}, found integration
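That validation comes from the recipe's env field, which only accepts the fabric types listed in the error. A minimal sketch mapping a custom environment name like "integration" onto an allowed value (source type and connection details illustrative):

from datahub.ingestion.run.pipeline import Pipeline

# "integration" is not a valid fabric type, so pick the closest allowed
# value from the error message, e.g. TEST or NON_PROD.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "postgres",  # illustrative source
            "config": {
                "host_port": "localhost:5432",
                "username": "datahub",
                "password": "...",
                "env": "TEST",  # was: "integration"
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
)
pipeline.run()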
microscopic-machine-90437
04/07/2023, 2:34 PM
prehistoric-furniture-42991
04/07/2023, 5:40 PM
lively-dusk-19162
04/07/2023, 7:10 PM
lively-dusk-19162
04/07/2023, 7:10 PM
faint-australia-24591
04/09/2023, 7:29 PM
dry-guitar-29671
04/10/2023, 10:55 AM
miniature-plastic-43224
04/10/2023, 1:29 PM
green-lion-58215
04/10/2023, 7:17 PM
billions-journalist-13819
04/11/2023, 4:53 AM
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '2a118433-e41e-4ed3-b0c9-4bd2781e8748',
'infos': ['2023-04-11 04:49:09.994683 INFO: Starting execution for task with name=RUN_INGEST',
"2023-04-11 04:49:54.496733 INFO: Failed to execute 'datahub ingest'",
'2023-04-11 04:49:54.497142 INFO: Caught exception EXECUTING task_id=2a118433-e41e-4ed3-b0c9-4bd2781e8748, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Report ~~~~
{
"cli": {
"cli_version": "0.10.0.7",
"cli_entry_location": "/usr/local/lib/python3.10/site-packages/datahub/__init__.py",
"py_version": "3.10.10 (main, Mar 14 2023, 02:37:11) [GCC 10.2.1 20210110]",
"py_exec_path": "/usr/local/bin/python",
"os_details": "Linux-5.4.0-113-generic-x86_64-with-glibc2.31",
"peak_memory_usage": "83.39 MB",
"mem_info": "83.39 MB"
},
"source": {
"type": "hive",
"report": {
"events_produced": 0,
"events_produced_per_sec": 0,
"entities": {},
"aspects": {},
"warnings": {},
"failures": {},
"soft_deleted_stale_entities": [],
"tables_scanned": 0,
"views_scanned": 0,
"entities_profiled": 0,
"filtered": [],
"start_time": "2023-04-11 04:49:32.793563 (10.39 seconds ago)",
"running_time": "10.39 seconds"
}
},
"sink": {
"type": "datahub-rest",
"report": {
"total_records_written": 0,
"records_written_per_second": 0,
"warnings": [],
"failures": [],
"start_time": "2023-04-11 04:49:32.227873 (10.95 seconds ago)",
"current_time": "2023-04-11 04:49:43.181993 (now)",
"total_duration_in_seconds": 10.95,
"gms_version": "v0.10.1",
"pending_requests": 0
}
}
}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub ingest run -c /tmp/datahub/ingest/2a118433-e41e-4ed3-b0c9-4bd2781e8748/recipe.yml --report-to /tmp/datahub/ingest/2a118433-e41e-4ed3-b0c9-4bd2781e8748/ingestion_report.json
[2023-04-11 04:49:32,148] INFO {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.0.7
[2023-04-11 04:49:32,231] INFO {datahub.ingestion.run.pipeline:184} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2023-04-11 04:49:42,973] INFO {datahub.ingestion.run.pipeline:201} - Source configured successfully.
[2023-04-11 04:49:42,976] INFO {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-04-11 04:49:43,182] INFO {datahub.ingestion.reporting.file_reporter:52} - Wrote UNKNOWN report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/2a118433-e41e-4ed3-b0c9-4bd2781e8748/ingestion_report.json' mode='w' encoding='UTF-8'>
[2023-04-11 04:49:43,184] INFO {datahub.cli.ingest_cli:134} - Source (hive) report:
{'events_produced': 0,
'events_produced_per_sec': 0,
'entities': {},
'aspects': {},
'warnings': {},
'failures': {},
'soft_deleted_stale_entities': [],
'tables_scanned': 0,
'views_scanned': 0,
'entities_profiled': 0,
'filtered': [],
'start_time': '2023-04-11 04:49:32.793563 (10.39 seconds ago)',
'running_time': '10.39 seconds'}
[2023-04-11 04:49:43,185] INFO {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
{'total_records_written': 0,
'records_written_per_second': 0,
'warnings': [],
'failures': [],
'start_time': '2023-04-11 04:49:32.227873 (10.96 seconds ago)',
'current_time': '2023-04-11 04:49:43.184241 (now)',
'total_duration_in_seconds': 10.96,
'gms_version': 'v0.10.1',
'pending_requests': 0}
famous-florist-7218
04/11/2023, 10:21 AM
I'm hitting this error with the stateful_ingestion config:
ModuleNotFoundError: No module named 'datahub.ingestion.source.state.tableau_state'
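For context, a sketch of the Tableau recipe shape in question, with stateful ingestion enabled; connection details are placeholders. The missing datahub.ingestion.source.state.tableau_state module suggests a version mismatch between the CLI and the tableau plugin rather than a recipe problem, so checking that acryl-datahub and acryl-datahub[tableau] are the same version is a reasonable first step.

from datahub.ingestion.run.pipeline import Pipeline

# Sketch only: Tableau source with stateful ingestion turned on.
pipeline = Pipeline.create(
    {
        "pipeline_name": "tableau_prod",
        "source": {
            "type": "tableau",
            "config": {
                "connect_uri": "https://tableau.example.com",
                "username": "datahub",
                "password": "...",
                "stateful_ingestion": {"enabled": True},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
)
pipeline.run()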
steep-needle-64409
04/11/2023, 12:29 PM
purple-terabyte-64712
04/11/2023, 2:31 PM
acceptable-nest-20465
04/11/2023, 2:39 PM
acceptable-nest-20465
04/11/2023, 2:39 PM