prehistoric-optician-40107
03/22/2022, 11:10 AM
ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb58e5d14f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2022-03-16 11:30:30.325290 [exec_id=d287226a-592b-4029-879a-583a3cfa64eb] INFO: Failed to execute 'datahub ingest'
2022-03-16 11:30:30.325765 [exec_id=d287226a-592b-4029-879a-583a3cfa64eb] INFO: Caught exception EXECUTING task_id=d287226a-592b-4029-879a-583a3cfa64eb, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
white-postman-45591
03/22/2022, 12:42 PM
bland-crowd-77263
03/23/2022, 2:47 AM
red-napkin-59945
03/23/2022, 5:55 AM
properties
however, the frontend code (group.graphql) does not request the properties field. Is this some bug?
full-dentist-68591
03/23/2022, 8:36 AM
SchemaFieldDataTypeClass in SchemaFieldClass when creating a dataset?
I have an XML export of an ETL job and am trying to ingest table definitions into DataHub from this file. In order to select the right data types for the columns I need some sort of mapping (e.g. VARCHAR -> StringTypeClass).
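A minimal sketch of such a mapping using the DataHub Python SDK's schema classes (the native type names on the left are assumptions; extend the dict to whatever the XML export actually contains):

from datahub.metadata.schema_classes import (
    BooleanTypeClass,
    DateTypeClass,
    NullTypeClass,
    NumberTypeClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    StringTypeClass,
)

# Illustrative native-type -> DataHub type-class mapping.
TYPE_MAP = {
    "VARCHAR": StringTypeClass,
    "CHAR": StringTypeClass,
    "INTEGER": NumberTypeClass,
    "DECIMAL": NumberTypeClass,
    "BOOLEAN": BooleanTypeClass,
    "DATE": DateTypeClass,
}

def make_field(name: str, native_type: str) -> SchemaFieldClass:
    # Fall back to NullTypeClass for anything unmapped.
    type_cls = TYPE_MAP.get(native_type.upper(), NullTypeClass)
    return SchemaFieldClass(
        fieldPath=name,
        type=SchemaFieldDataTypeClass(type=type_cls()),
        nativeDataType=native_type,  # keep the original type string for display
    )

polite-orange-57255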
03/23/2022, 9:54 AM
rich-policeman-92383
03/23/2022, 10:47 AM
modern-artist-55754
03/23/2022, 1:01 PM
source produced an invalid metadata work unit: MetadataChangeEventClass
I'm trying to ingest some workbooks from Tableau, and there's one particular one that keeps failing, but I wasn't sure why.
brave-secretary-27487
03/23/2022, 2:53 PM
[2022-03-23, 14:43:03 UTC] {pipeline.py:84} INFO - sink wrote workunit container-urn:li:container:714a46eb68a1eb8ba6308cf73b33190a-to-urn:li:dataset:(urn:li:dataPlatform:bigquery,dw.analytics_245627.page_views,PROD)
[2022-03-23, 14:43:03 UTC] {pipeline.py:92} ERROR - failed to write record with workunit dw.analytics_245627.page_views with Expecting value: line 1 column 1 (char 0) and info {}
[2022-03-23, 14:43:03 UTC] {pipeline.py:84} INFO - sink wrote workunit container-urn:li:container:714a46eb68a1eb8ba6308cf73b33190a-to-urn:li:dataset:(urn:li:dataPlatform:bigquery,dw.analytics_245627038.sessions,PROD)
[2022-03-23, 14:43:03 UTC] {pipeline.py:92} ERROR - failed to write record with workunit dw.analytics_245627038.sessions with Expecting value: line 1 column 1 (char 0) and info {}
[2022-03-23, 14:43:04 UTC] {pipeline.py:84} INFO - sink wrote workunit container-urn:li:container:714a46eb68a1eb8ba6308cf73b33190a-to-urn:li:dataset:(urn:li:dataPlatform:bigquery,dw.analytics_245627038.user_detail_events,PROD)
[2022-03-23, 14:43:04 UTC] {pipeline.py:84} INFO - sink wrote workunit dw.analytics_245627038.user_detail_events
In this log it's visible that it sometimes errors out. How would I best approach debugging this issue, and what could be the reason that some sink writes fail?
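A hedged first step: "Expecting value: line 1 column 1 (char 0)" is Python's JSON-decode error, so the sink most likely got an empty or non-JSON response from the server for those records. Re-running the same recipe with the CLI's debug flag should show the raw request/response (recipe path illustrative):

datahub --debug ingest -c ./bigquery_recipe.yml

red-napkin-59945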
03/23/2022, 7:08 PM
Analytics button. Do we need to do some special configuration in order to use the feature?
AnalyticsService:264 - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
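A quick hedged check for the missing index (host and port assume a default local Elasticsearch; adjust as needed):

curl -s 'http://localhost:9200/_cat/indices/datahub_usage_event*?v'

If nothing is listed, the datahub_usage_event index was never created; it is normally set up by the Elasticsearch setup job when analytics is enabled.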
chilly-oil-22683
03/23/2022, 8:22 PM
Error: UPGRADE FAILED: failed to create resource: Deployment.apps "datahub-datahub-frontend" is invalid: spec.template.spec.containers[0].env[11].valueFrom.secretKeyRef.key: Required value
helm.go:84: [debug] Deployment.apps "datahub-datahub-frontend" is invalid: spec.template.spec.containers[0].env[11].valueFrom.secretKeyRef.key: Required value
What kind of value is it looking for? Is it some Helm setting? I can't seem to find it in the chart's settings: https://artifacthub.io/packages/helm/datahub/datahub
Is it looking for EKS cluster settings? Does anyone have a pointer for me on where to set this and what value it expects?
Thanks!
Dennis
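For reference, a hedged sketch of what Kubernetes requires in that env entry; env[11] is simply the twelfth environment variable the chart renders for the container, and both name and key must be non-empty (all names below are placeholders):

env:
  - name: SOME_SECRET_ENV          # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: my-secret            # the Secret object to read from
        key: my-secret-key         # required: the key inside that Secret

In the chart this usually corresponds to a paired *.secretRef / *.secretKey value that was left empty.

breezy-portugal-43538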
03/24/2022, 9:27 AM
pip install 'acryl-datahub[great-expectations]'
When running the checkpoint YAML file, an error is raised about a missing module:
FileNotFoundError: No module named "datahub.integrations.great_expectations.action" could be found in the repository. Please make sure that the file, corresponding to this package and module, exists and that dynamic loading of code modules, templates, and assets is supported in your execution environment. This error is unrecoverable.
In my IDE I can see that the integrations module is not present during the import. Is this a bug occurring on Ubuntu?
Could you help resolve the issue?
I am posting pictures below from Windows and Ubuntu; if any more information is required, please let me know.
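A quick hedged sanity check that the extra landed in the same interpreter Great Expectations runs under (commands illustrative):

python -m pip show acryl-datahub
python -c "import datahub.integrations.great_expectations.action"

If the import fails, the [great-expectations] extra was likely installed into a different Python environment than the one executing the checkpoint.

gentle-father-80172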
03/24/2022, 2:15 PM
quick-student-61408
03/24/2022, 2:34 PM
apache@apache-VirtualBox:~$ python3.9 -m datahub ingest -c business_glossary.yml
[2022-03-24 15:32:13,017] INFO {datahub.cli.ingest_cli:75} - DataHub CLI version: 0.8.31.2
[2022-03-24 15:32:13,164] ERROR {datahub.entrypoints:152} - File "/home/apache/.local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 82, in run
70 def run(
71 ctx: click.Context, config: str, dry_run: bool, preview: bool, strict_warnings: bool
72 ) -> None:
(...)
78 pipeline_config = load_config_file(config_file)
79
80 try:
81 logger.debug(f"Using config: {pipeline_config}")
--> 82 pipeline = Pipeline.create(pipeline_config, dry_run, preview)
83 except ValidationError as e:
File "/home/apache/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 174, in create
170 @classmethod
171 def create(
172 cls, config_dict: dict, dry_run: bool = False, preview_mode: bool = False
173 ) -> "Pipeline":
--> 174 config = PipelineConfig.parse_obj(config_dict)
175 return cls(config, dry_run=dry_run, preview_mode=preview_mode)
File "pydantic/main.py", line 511, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 329, in pydantic.main.BaseModel.__init__
File "pydantic/main.py", line 1022, in pydantic.main.validate_model
File "pydantic/fields.py", line 837, in pydantic.fields.ModelField.validate
File "pydantic/fields.py", line 1118, in pydantic.fields.ModelField._apply_validators
File "pydantic/class_validators.py", line 278, in pydantic.class_validators._generic_validator_cls.lambda2
File "/home/apache/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 56, in run_id_should_be_semantic
52 def run_id_should_be_semantic(
53 cls, v: Optional[str], values: Dict[str, Any], **kwargs: Any
54 ) -> str:
55 if v == "__DEFAULT_RUN_ID":
--> 56 if values["source"] is not None:
57 if values["source"].type is not None:
KeyError: 'source'
[2022-03-24 15:32:13,165] INFO {datahub.entrypoints:161} - DataHub CLI version: 0.8.31.2 at /home/apache/.local/lib/python3.9/site-packages/datahub/__init__.py
[2022-03-24 15:32:13,165] INFO {datahub.entrypoints:164} - Python version: 3.9.11 (main, Mar 16 2022, 17:19:28)
[GCC 9.4.0] at /usr/bin/python3.9 on Linux-5.13.0-35-generic-x86_64-with-glibc2.31
[2022-03-24 15:32:13,165] INFO {datahub.entrypoints:167} - GMS config {}
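The trailing KeyError: 'source' means the run_id validator looked up a source section that pydantic never populated, i.e. the file passed to -c did not validate as a recipe. A hedged guess: business_glossary.yml here is the glossary definition itself, while -c expects a recipe that points at it, roughly like this (paths and server illustrative):

source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary.yml
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080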
quick-student-61408
03/24/2022, 2:35 PM
calm-television-89033
03/24/2022, 3:56 PM
source:
  type: bigquery
  config:
    project_id: '${DATAPLATFORM_PROJECT_ID}'
    credential:
      project_id: '${DATAPLATFORM_PROJECT_ID}'
      private_key_id: '${BIGQUERY_PRIVATE_KEY_ID}'
      private_key: '${BIGQUERY_PRIVATE_KEY}'
      client_email: '${BIGQUERY_CLIENT_EMAIL}'
      client_id: '${BIGQUERY_CLIENT_ID}'
sink:
  type: datahub-rest
  config:
    server: 'http://30.222.164.39:8080'
And I'm getting the following error:
"Failed to resolve secret with name DATAPLATFORM_PROJECT_ID. Aborting recipe execution."
I double-checked the secret names as suggested by the UI Ingestion Guide and they are correct.
Have you guys gone through this, or could you give me any tips on how to proceed? Thanks in advance for your attention! 🙂
gentle-camera-33498
03/24/2022, 5:35 PM
---- (full traceback above) ----
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 138, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/pbraz/.local/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 202, in wrapper
raise e
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 194, in wrapper
res = func(*args, **kwargs)
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
res = func(*args, **kwargs)
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 92, in run
pipeline.run()
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 181, in run
for wu in itertools.islice(
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/source/metabase.py", line 541, in get_workunits
yield from self.emit_card_mces()
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/source/metabase.py", line 240, in emit_card_mces
chart_snapshot = self.construct_card_from_api_data(card_info)
File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/source/metabase.py", line 258, in construct_card_from_api_data
card_response = self.session.get(card_url)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', OSError("(104, 'ECONNRESET')"))
[2022-03-24 17:02:52,053] INFO {datahub.entrypoints:161} - DataHub CLI version: 0.8.31.1 at /home/pbraz/.local/lib/python3.8/site-packages/datahub/__init__.py
[2022-03-24 17:02:52,053] INFO {datahub.entrypoints:164} - Python version: 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] at /usr/bin/python3 on Linux-5.13.0-1019-gcp-x86_64-with-glibc2.29
[2022-03-24 17:02:52,053] INFO {datahub.entrypoints:167} - GMS config {}
I did some searching to find the reason for these errors. My first guess was an API request rate limit, but I found in the documentation that only login requests are rate-limited (see here). My second try was to search for the error on the internet, where I found a not-quite-identical situation with the same error (see here). Could it be that Metabase has a security control on User-Agent headers?
The user created for this POC receives this email every time I try to ingest the metadata from Metabase:
Does someone have an idea of what I might be doing wrong?
Thanks for your attention!
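One hedged way to test the User-Agent theory from the ingestion host (URL and card id are placeholders; X-Metabase-Session is Metabase's session header):

curl -s -o /dev/null -w '%{http_code}\n' -H 'X-Metabase-Session: <token>' -A 'python-requests/2.27.1' https://metabase.example.com/api/card/1
curl -s -o /dev/null -w '%{http_code}\n' -H 'X-Metabase-Session: <token>' -A 'Mozilla/5.0' https://metabase.example.com/api/card/1

If only the browser-like agent gets a 200, something in front of Metabase (a proxy or WAF) is resetting non-browser clients, which would match the ECONNRESET above.

red-napkin-59945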
03/24/2022, 8:59 PM
DATAHUB_ANALYTICS_ENABLED to true, the UI page load time is extremely long (10s+). Any idea about this? I did not find any error log in either datahub-frontend or datahub-gms.
mysterious-portugal-30527
03/24/2022, 9:51 PM
numerous-table-92385
03/25/2022, 5:50 AM
modern-monitor-81461
03/25/2022, 2:17 PM
View in Airflow
button in my deployed DataHub, it opens something else (it seems to reload the current page I'm on). When I look at the navigation URL of the button (href), I see something like:
https://datahub.mydomain.com/tasks/urn:li:dataJob:(urn:li:dataFlow:(airflow,xxxxx,prod),xxxxx)/airflow.mydomain.com/taskinstance/list/?flt1_dag_id_equals=xxxxx&_flt_3_task_id=xxxxx
where airflow.mydomain.com/taskinstance/list/?flt1_dag_id_equals=xxxxx&_flt_3_task_id=xxxxx is a valid URL (I can open it in my browser and get what I expect).
So the href is like https://<datahub domain>/<params and urn of the airflow task>/<airflow domain>/<params of the airflow task>
I also have a Superset integration, and when I look at the href of the View in Superset button, I see a real Superset URL with no DataHub URL prepended. That button works well.
Why is the Airflow button href different?
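A hedged guess at the mechanism: the Airflow href has no scheme (airflow.mydomain.com/... rather than https://airflow.mydomain.com/...), and browsers resolve scheme-less hrefs as paths relative to the current page, which is exactly how the DataHub URL gets prepended. Assuming the integration builds this link from Airflow's webserver base_url, the fix would be to include the scheme there:

# airflow.cfg -- hypothetical fix: give base_url an explicit scheme
[webserver]
base_url = https://airflow.mydomain.com

red-napkin-59945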
03/25/2022, 8:10 PM
TrackingController
in datahub-frontend, any reason we want to flush here?
most-room-32003
03/26/2022, 10:13 PM
The field at path '/dataset/assertions/assertions[0]' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Assertion' within parent type '[Assertion!]' (code undefined)
wooden-football-7175
03/28/2022, 1:08 PM
airflow and GreatExpectationsOperator that maybe makes sense to bring up here.
When trying to install acryl dependencies to run GE action to send validations to Datahub, this “compatibility issue” appears.numerous-camera-74294
03/28/2022, 1:38 PM
bitter-toddler-42943
03/29/2022, 2:41 AM
gorgeous-dinner-4055
03/29/2022, 6:01 AM
trace
call is hanging (it should time out in 60 seconds, after which you can make one more call).
If it is, you may also notice the following stack trace if you add a callback to the event tracking kafka emit:
org.apache.kafka.common.errors.TimeoutException: Topic DataHubUsageEvent_v1 not present in metadata after 60000 ms.
To fix this, the following configs need to be set to talk to kafka correctly:
https://github.com/datahub-project/datahub/blob/34b36c0fe17f6ed6195ba5a0b57f41853fc60532/datahub-frontend/conf/application.conf#L158
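For reference, a hedged sketch of the kind of settings involved there (key and env names are assumptions; verify against the linked file):

analytics.enabled = ${?DATAHUB_ANALYTICS_ENABLED}
analytics.kafka.bootstrap.server = ${?KAFKA_BOOTSTRAP_SERVER}
# plus security settings if your cluster needs them, e.g.
analytics.kafka.security.protocol = ${?KAFKA_PROPERTIES_SECURITY_PROTOCOL}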
Will update the docs tomorrow to add this info, but hopefully someone will find this useful if doing a search through slack one day 🙂
boundless-student-48844
03/29/2022, 1:57 PM
./docker/dev-without-neo4j.sh, I do a first ingestion using datahub ingest -c ~/Desktop/recipe1.yml (it ingests from a JSON file with MCEs for Datasets), and then a second ingestion using datahub ingest -c ~/Desktop/recipe2.yml (it ingests from a JSON file with MCEs for Dashboards).
The getSearchResultsForMultiple gql query with query variable "types" set to the first entity type (in this case, Dataset) returns the expected results. However, it returns no results when "types" is set to the second entity type (in this case, Dashboard). Supposedly some entities should be returned, as Dashboard entities were ingested in the second run. I checked ES, and the entities exist in the dashboardindex_v2 index.
If I do the reverse and ingest Dashboards first, then Datasets, I am not able to get Datasets from getSearchResultsForMultiple. Does this have anything to do with Elasticsearch caching?
polite-orange-57255
03/30/2022, 9:07 AM
gray-agency-10420
03/30/2022, 9:28 AM
dim_geo_location_processed/version=20220312T000000/dim_geo_location_csv
dim_geo_location_processed/version=20220313T000000/dim_geo_location_csv
we expect to have one dim_geo_location_processed/dim_geo_location_csv