purple-terabyte-64712
05/10/2023, 7:31 AM

colossal-tent-57599
05/10/2023, 8:07 AM

bland-orange-13353
05/10/2023, 12:56 PM

loud-librarian-93625
05/10/2023, 1:29 PM
I'm running
datahub ingest -c 'C:\Users\matt.evans\.datahub\tableau\tableau.dhub.yaml' --dry-run
but am getting the following error
File "C:\Users\matt.evans\AppData\Local\Programs\Python\Python310\lib\site-packages\datahub\configuration\config_loader.py", line 101, in load_config_file
raise ConfigurationError(
datahub.configuration.common.ConfigurationError: Cannot read remote file C:\Users\matt.evans\.datahub\tableau\tableau.dhub.yaml, error:No connection adapters were found for 'C:\\Users\\matt.evans\\.datahub\\tableau\\tableau.dhub.yaml'
Any idea what I'm doing wrong? Seems to be something in the yaml file it doesn't like.
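(A minimal sketch of why this can happen, assuming the config loader decides a file is "remote" by checking for a URL scheme: urlparse treats the drive letter of a Windows path as a scheme, so the path gets handed to requests, which rejects the unknown scheme with exactly this message. If that reading is right, running the command from the file's directory with a relative path, e.g. datahub ingest -c tableau.dhub.yaml --dry-run, may avoid it, since no drive-letter colon appears in the argument.)

from urllib.parse import urlparse

import requests

path = r"C:\Users\matt.evans\.datahub\tableau\tableau.dhub.yaml"

# The drive letter parses as a URL scheme, so a naive
# "has a scheme -> remote file" check classifies this local path as remote.
print(urlparse(path).scheme)  # -> 'c'

# requests then refuses the unknown scheme with the error from the traceback.
try:
    requests.get(path)
except requests.exceptions.InvalidSchema as e:
    print(e)  # No connection adapters were found for 'C:\\Users\\...'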
bland-orange-13353
05/10/2023, 3:31 PM

bland-orange-13353
05/10/2023, 3:39 PM

rapid-spoon-75609
05/10/2023, 9:25 PM
team tag in metadata as a tag on the Kafka topic

powerful-answer-39247
05/11/2023, 2:33 AM
File "/tmp/datahub/ingest/venv-postgres-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/state_provider/datahub_ingestion_checkpointing_provider.py", line 76, in get_latest_checkpoint
] = self.graph.get_latest_timeseries_value(
File "/tmp/datahub/ingest/venv-postgres-0.10.2/lib/python3.10/site-packages/datahub/ingestion/graph/client.py", line 299, in get_latest_timeseries_value
assert len(values) == 1
AssertionError
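(One way to sidestep this while debugging, assuming the failure is in the stateful-ingestion checkpoint lookup shown in this trace, is to disable stateful ingestion in the recipe so the checkpointing provider is never consulted. A minimal dict-form sketch; the connection values are placeholders:)

# Sketch: recipe with stateful ingestion disabled, so the checkpointing
# provider from the traceback above is never queried.
recipe = {
    "source": {
        "type": "postgres",
        "config": {
            "host_port": "localhost:5432",  # illustrative placeholder
            "stateful_ingestion": {"enabled": False},
        },
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://datahub-gms:8080"}},
}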
important-area-90857
05/11/2023, 5:30 AM

loud-hospital-37195
05/11/2023, 7:15 AM

numerous-refrigerator-15664
05/11/2023, 7:19 AM

mysterious-table-75773
05/11/2023, 9:02 AM

delightful-painter-8227
05/11/2023, 10:04 AM

loud-hospital-37195
05/11/2023, 11:37 AM

lemon-scooter-69730
05/11/2023, 12:48 PM
pipeline = Pipeline.create(recipe)
pipeline.run()
pipeline.pretty_print_summary()
For example it throws this exception:
if regex("LATERAL VIEW EXPLODE(col)"):
TypeError: 'str' object is not callable
This error comes from sqllineage, because it picks up the latest version of sqlparse==0.4.4; pinning my version to 0.4.3 fixed the problem. I also noticed that sqllineage==1.3.6 uses the monkey patch present here; I resolved it by moving my version of sqllineage to 1.4.2. I am just putting this here in case anyone runs into this issue... I spent the better part of an hour or two getting to the bottom of this.
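(For anyone wanting to catch this pairing before an ingestion run, a small sketch based only on the versions reported above: sqllineage 1.3.x breaks with sqlparse 0.4.4, while sqlparse 0.4.3 or sqllineage 1.4.2 are fine.)

from importlib.metadata import version

# Incompatible pairing reported above: sqllineage 1.3.x monkey-patches a
# sqlparse internal that changed shape in sqlparse 0.4.4.
sqlparse_v = version("sqlparse")
sqllineage_v = version("sqllineage")
if sqllineage_v.startswith("1.3.") and sqlparse_v == "0.4.4":
    raise RuntimeError(
        f"sqllineage {sqllineage_v} is incompatible with sqlparse {sqlparse_v}; "
        "pin sqlparse==0.4.3 or upgrade sqllineage to 1.4.2"
    )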
lemon-scooter-69730
05/11/2023, 1:30 PM

damp-orange-46267
05/11/2023, 3:01 PM
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '7985f351-d346-4713-b683-f256a1b24b0d',
'infos': ['2023-05-11 14:55:13.610978 INFO: Starting execution for task with name=RUN_INGEST',
"2023-05-11 14:55:17.687276 INFO: Failed to execute 'datahub ingest'",
'2023-05-11 14:55:17.687583 INFO: Caught exception EXECUTING task_id=7985f351-d346-4713-b683-f256a1b24b0d, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub ingest run -c /tmp/datahub/ingest/7985f351-d346-4713-b683-f256a1b24b0d/recipe.yml --report-to /tmp/datahub/ingest/7985f351-d346-4713-b683-f256a1b24b0d/ingestion_report.json
[2023-05-11 14:55:16,653] INFO {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.10.0
1 validation error for PipelineConfig
source -> sink
extra fields not permitted (type=value_error.extra)
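(For reference, the pydantic error location source -> sink means the parser found a sink key nested inside the source block, which it rejects as an extra field; this usually comes from the sink being indented under source in the recipe YAML. A sketch of the broken vs. expected shape, in the dict form that Pipeline.create accepts; the source type and config values are illustrative:)

# Broken: sink nested inside source ->
#   "source -> sink: extra fields not permitted"
broken = {
    "source": {
        "type": "postgres",
        "config": {"host_port": "localhost:5432"},
        "sink": {"type": "datahub-rest"},  # wrong level
    }
}

# Expected: source and sink are sibling top-level keys.
fixed = {
    "source": {"type": "postgres", "config": {"host_port": "localhost:5432"}},
    "sink": {"type": "datahub-rest", "config": {"server": "http://datahub-gms:8080"}},
}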
limited-forest-73733
05/11/2023, 4:13 PM

little-refrigerator-78584
05/12/2023, 1:21 PM
source:
type: glue
config:
aws_region: eu-central-1
platform: glue
extract_transforms: True
database_pattern: {'deny': ['.*']}
table_pattern: {'deny': ['.*']}
It successfully pulled 2 jobs from AWS, and Glue shows up on the home page under the Platform section.
But when I click on it, it shows No results found for "".
If I want to just pull my jobs from Glue and not the tables and databases, then won't it show them?
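(For reference, deny: ['.*'] excludes every database and table, so only the Glue jobs are ingested. A small sketch with DataHub's AllowDenyPattern helper shows how those patterns evaluate:)

from datahub.configuration.common import AllowDenyPattern

# deny: ['.*'] matches every name, so no database or table passes the
# filter; only the Glue jobs (DataFlow/DataJob entities) remain.
pattern = AllowDenyPattern(deny=[".*"])
print(pattern.allowed("any_database"))  # -> False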
bland-orange-13353
05/12/2023, 1:29 PM

purple-terabyte-64712
05/13/2023, 3:58 AM

miniature-ghost-14229
05/13/2023, 1:20 PM
Dataset query failed with error: 400 INFORMATION_SCHEMA.PARTITIONS query attempted to read too many tables. Please add more restrictive filters. Location: EU Job ID: fb2b9691-cb6bb7
I tried to filter this query and reduce the amount of data to fetch, but it looks like it didn't work.
Does DataHub parse the entire project? I need to ingest only a specific dataset, so I added a filter and included my dataset name in the allow patterns, but it seems it is not working or not being taken into consideration.
Thank you
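(A sketch of a narrowed recipe, assuming the BigQuery connector's dataset_pattern and profiling options apply here; project and dataset names are placeholders. The INFORMATION_SCHEMA.PARTITIONS lookup is reportedly tied to profiling, so disabling profiling while testing the filter may avoid the 400:)

# Sketch (dict-form recipe; names are placeholders):
recipe = {
    "source": {
        "type": "bigquery",
        "config": {
            "project_ids": ["my-gcp-project"],
            # anchor the regex so only this one dataset is ingested
            "dataset_pattern": {"allow": ["^my_dataset$"]},
            # turning profiling off avoids the PARTITIONS query while
            # verifying that the dataset filter itself works
            "profiling": {"enabled": False},
        },
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://datahub-gms:8080"}},
}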
brave-room-48783
05/14/2023, 9:00 AM
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '0de3d15c-4e8d-45bf-8877-46e9c8c66de8',
'infos': ['2023-05-14 08:45:07.246534 INFO: Starting execution for task with name=RUN_INGEST',
"2023-05-14 08:45:11.476157 INFO: Failed to execute 'datahub ingest'",
'2023-05-14 08:45:11.486188 INFO: Caught exception EXECUTING task_id=0de3d15c-4e8d-45bf-8877-46e9c8c66de8, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub --debug ingest run -c /tmp/datahub/ingest/0de3d15c-4e8d-45bf-8877-46e9c8c66de8/recipe.yml --report-to /tmp/datahub/ingest/0de3d15c-4e8d-45bf-8877-46e9c8c66de8/ingestion_report.json
[2023-05-14 08:45:08,814] DEBUG {datahub.telemetry.telemetry:219} - Sending init Telemetry
[2023-05-14 08:45:10,004] DEBUG {datahub.telemetry.telemetry:248} - Sending telemetry for function-call
[2023-05-14 08:45:10,417] INFO {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.2
[2023-05-14 08:45:10,582] DEBUG {datahub.ingestion.sink.datahub_rest:116} - Setting env variables to override config
[2023-05-14 08:45:10,582] DEBUG {datahub.ingestion.sink.datahub_rest:118} - Setting gms config
[2023-05-14 08:45:10,583] DEBUG {datahub.ingestion.run.pipeline:203} - Sink type datahub-rest (<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'>) configured
[2023-05-14 08:45:10,583] INFO {datahub.ingestion.run.pipeline:204} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2023-05-14 08:45:10,595] DEBUG {datahub.ingestion.run.pipeline:278} - Reporter type:file,<class 'datahub.ingestion.reporting.file_reporter.FileReporter'> configured.
[2023-05-14 08:45:10,630] DEBUG {datahub.telemetry.telemetry:248} - Sending telemetry for function-call
[2023-05-14 08:45:11,034] ERROR {datahub.entrypoints:195} - Command failed: Failed to find a registered source for type metabase: 'str' object is not callable
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 119, in _add_init_error_context
yield
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 214, in __init__
source_class = source_registry.get(source_type)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 173, in get
tp = self._ensure_not_lazy(key)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 117, in _ensure_not_lazy
plugin_class = import_path(path)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 48, in import_path
item = importlib.import_module(module_name)
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/source/metabase.py", line 10, in <module>
from sqllineage.runner import LineageRunner
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/sqllineage/__init__.py", line 41, in <module>
_monkey_patch()
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/sqllineage/__init__.py", line 35, in _monkey_patch
_patch_updating_lateral_view_lexeme()
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/sqllineage/__init__.py", line 24, in _patch_updating_lateral_view_lexeme
if regex("LATERAL VIEW EXPLODE(col)"):
TypeError: 'str' object is not callable
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/entrypoints.py", line 182, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
raise e
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
res = func(*args, **kwargs)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
return func(ctx, *args, **kwargs)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 187, in run
pipeline = Pipeline.create(
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 328, in create
return cls(
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 211, in __init__
with _add_init_error_context(
File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 121, in _add_init_error_context
raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type metabase: 'str' object is not callable
[2023-05-14 08:45:11,040] DEBUG {datahub.entrypoints:197} - DataHub CLI version: 0.10.2 at /tmp/datahub/ingest/venv-metabase-0.10.2/lib/python3.10/site-packages/datahub/__init__.py
[2023-05-14 08:45:11,040] DEBUG {datahub.entrypoints:200} - Python version: 3.10.10 (main, Mar 14 2023, 03:08:22) [GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-metabase-0.10.2/bin/python3 on Linux-5.15.49-linuxkit-aarch64-with-glibc2.31
[2023-05-14 08:45:11,040] DEBUG {datahub.entrypoints:205} - GMS config {'models': {}, 'patchCapable': True, 'versions': {'linkedin/datahub': {'version': 'v0.10.2', 'commit': '0fa983adc7370862371b4c0786aac0e3b81a563a'}}, 'managedIngestion': {'defaultCliVersion': '0.10.2', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'timeZone': 'GMT', 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
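(This looks like the same sqlparse 0.4.4 vs. sqllineage 1.3.x clash described earlier in the thread: the TypeError fires inside sqllineage's import-time monkey patch, before any Metabase source code runs, which is why it surfaces as "Failed to find a registered source for type metabase". A minimal standalone reproduction, assuming that version pairing is installed:)

# With sqllineage 1.3.x and sqlparse 0.4.4 installed, merely importing
# sqllineage runs its monkey patch and raises the TypeError from this log.
import importlib

try:
    importlib.import_module("sqllineage")
except TypeError as e:
    print(e)  # 'str' object is not callable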
wide-ghost-47822
05/14/2023, 8:33 PM
I was looking at the log_ingestion_stats method in the pipeline object, and I wondered if I can get some metrics about the pipeline that was run.
I saw a code block inside this method which sends some statistics data using the telemetry object. It is like this:
telemetry.telemetry_instance.ping(
"ingest_stats",
{
"source_type": self.config.source.type,
"sink_type": self.config.sink.type,
"records_written": stats.discretize(
self.sink.get_report().total_records_written
),
"source_failures": stats.discretize(source_failures),
"source_warnings": stats.discretize(source_warnings),
"sink_failures": stats.discretize(sink_failures),
"sink_warnings": stats.discretize(sink_warnings),
"global_warnings": global_warnings,
"failures": stats.discretize(source_failures + sink_failures),
"warnings": stats.discretize(
source_warnings + sink_warnings + global_warnings
),
},
)
Inside the ping method, the code sends this data to an external API called Mixpanel. It seems you are collecting data about the pipeline from my machine.
I don’t like this way of collecting data. Why are you collecting this data?
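(For what it's worth, the CLI's telemetry can be switched off; both of these opt-out switches are documented:)

# Opting out of DataHub CLI telemetry:
#   from a shell:  datahub telemetry disable
#   or via env:    DATAHUB_TELEMETRY_ENABLED=false
import os

os.environ["DATAHUB_TELEMETRY_ENABLED"] = "false"  # set before importing datahub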
colossal-waitress-83487
05/15/2023, 1:57 AM

clever-author-65853
05/15/2023, 1:19 PM

miniature-hair-20451
05/15/2023, 6:10 PM

silly-intern-25190
05/16/2023, 5:12 AM
{'error': 'Unable to emit metadata to DataHub GMS',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Cannot parse request entity\n'
'\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n'
'\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:202)',
'message': 'Cannot parse request entity',
'status': 400,
'id': 'urn:li:dataset:(urn:li:dataPlatform:vertica_fresh,public.test_data1,PROD)'}},
silly-nest-50341
05/16/2023, 5:41 AM

damp-orange-46267
05/16/2023, 9:50 AM
PipelineInitError: Failed to find a registered source for type bigquery: 'str' object is not callable