high-gigabyte-86638
10/13/2022, 8:21 AM

flaky-soccer-57765
10/13/2022, 8:52 AM

average-dinner-25106
10/13/2022, 10:03 AM

brave-secretary-27487
10/13/2022, 12:16 PM

wonderful-book-58712
10/13/2022, 2:29 PM

wonderful-book-58712
10/13/2022, 2:30 PM

astonishing-kite-41577
10/13/2022, 4:25 PM

bland-balloon-48379
10/13/2022, 7:45 PM

- name: dataset
  doc: Datasets represent logical or physical data assets stored or represented in various data platforms. Tables, Views, Streams are all instances of datasets.
  category: core
  keyAspect: datasetKey
  aspects:
    - viewProperties
    - subTypes
    - datasetProfile
    - datasetUsageStatistics
    - operation
    - domains
    - schemaMetadata
    - status
    - container
    - deprecation
    - testResults
But looking at the PDL files and in MySQL, I know dataset entities can also have an upstreamLineage aspect, for example. I'm wondering why some aspects get listed in the entity-registry YAML and others do not.
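Not an authoritative answer, but as a sketch of the mechanism: aspects can be declared in an entity's `aspects` list in `entity-registry.yml`, where each entry must match the aspect name from the PDL `@Aspect` annotation. Hypothetically adding `upstreamLineage` to the dataset entry would look like this (whether a given aspect needs to be listed here, or is picked up elsewhere, is exactly the open question):

```yaml
- name: dataset
  category: core
  keyAspect: datasetKey
  aspects:
    # assumed to match the @Aspect name declared in UpstreamLineage.pdl
    - upstreamLineage
```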
Thanks!

average-dinner-25106
10/14/2022, 8:29 AM

breezy-portugal-43538
10/14/2022, 10:24 AM

high-gigabyte-86638
10/14/2022, 11:21 AM

agreeable-belgium-70840
10/14/2022, 11:59 AM

INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
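Not from the thread, just a hedged sketch of the "stricter constraints" the message asks for: pinning candidate versions in a constraints file usually stops pip's backtracking resolver from spinning. The package and version below are placeholders for whatever is actually being installed:

```shell
# Hypothetical constraints file; pin the package(s) whose resolution is backtracking.
cat > constraints.txt <<'EOF'
acryl-datahub==0.8.45
EOF
cat constraints.txt
# Then install against it:  pip install -c constraints.txt acryl-datahub
```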
It has always been failing; I've tried several different versions. I am using a fresh Docker image of python3.10-slim. Any ideas?

agreeable-belgium-70840
10/14/2022, 1:53 PM

"Failed to update urn:li:dataset:(urn:li:dataPlatform:dbt,iceberg.raw.eis_billingpremium_1_0,DEV) & field [version=2.0].[type=struct].[type=array].[type=struct].premium.[type=string].transaction_type. Field [version=2.0].[type=struct].[type=array].[type=struct].premium.[type=string].transaction_type does not exist in the datasets schema."
If I try to alter an element which is at the root level, it works just fine. Any ideas?

white-ice-61578
10/14/2022, 5:03 PM

Hello guys,
I connected Superset to DataHub to get the charts and the dashboards, but whenever I delete a chart in Superset, DataHub still shows it as if it were there even though it was deleted. Does anyone know how to solve this?
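Not a confirmed fix, but one approach worth sketching: DataHub's stateful ingestion can soft-delete entities that disappear from the source between runs, if the Superset source supports it in your version. Config keys below follow the standard ingestion framework and are an assumption for this source:

```yaml
source:
  type: superset
  config:
    # ...connection details...
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
```

Alternatively, an individual stale chart can be soft-deleted by URN with the CLI, e.g. `datahub delete --urn '<chart urn>' --soft`.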
clever-garden-23538
10/14/2022, 9:01 PM

clever-garden-23538
10/14/2022, 10:27 PM

microscopic-mechanic-13766
10/17/2022, 12:01 PM

512 /var/volumes/kafka-broker/var/lib/kafka/data3/cleaner-offset-checkpoint
21K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_config-0
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-11
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-14
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-17
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-2
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-20
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-23
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-5
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_offset-8
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/connect_status-2
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-0
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-1
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-10
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-11
1.5M /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-12
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-13
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-14
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-15
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-16
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-17
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-18
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-19
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-2
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-20
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-21
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-22
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-23
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-24
4.5M /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-25
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-26
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-27
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-28
12K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-29
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-3
8.0M /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-30
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-31
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-32
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-33
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-34
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-35
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-36
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-37
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-38
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-39
1.6M /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-4
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-40
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-41
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-42
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-43
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-44
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-45
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-46
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-47
1.6M /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-48
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-49
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-5
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-6
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-7
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-8
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/__consumer_offsets-9
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/DataHubUsageEvent_v1-0
512 /var/volumes/kafka-broker/var/lib/kafka/data3/log-start-offset-checkpoint
1.4M /var/volumes/kafka-broker/var/lib/kafka/data3/MetadataAuditEvent_v4-0
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/MetadataChangeEvent_v4-0
583K /var/volumes/kafka-broker/var/lib/kafka/data3/MetadataChangeLog_Timeseries_v1-0
14G /var/volumes/kafka-broker/var/lib/kafka/data3/MetadataChangeLog_Versioned_v1-0
5.0K /var/volumes/kafka-broker/var/lib/kafka/data3/MetadataChangeProposal_v1-0
512 /var/volumes/kafka-broker/var/lib/kafka/data3/meta.properties
2.0K /var/volumes/kafka-broker/var/lib/kafka/data3/recovery-point-offset-checkpoint
2.0K /var/volumes/kafka-broker/var/lib/kafka/data3/replication-offset-checkpoint
493K /var/volumes/kafka-broker/var/lib/kafka/data3/_schemas-0
Is it normal to have so much info under the topic MetadataChangeLog_Versioned_v1?
Can something be done to control the size of the topics, or to erase part of the information stored in them?
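Not from the thread, but one knob to sketch for the size question: per-topic retention. Assuming stock Kafka CLI tooling and a placeholder broker address (and that nothing still needs to consume the records you let expire), the versioned MCL topic's retention could be capped:

```shell
# 90 days expressed in milliseconds, the unit retention.ms expects:
echo $((90 * 24 * 60 * 60 * 1000))
# Hypothetical invocation (broker address is a placeholder):
# kafka-configs.sh --bootstrap-server localhost:9092 --alter \
#   --entity-type topics --entity-name MetadataChangeLog_Versioned_v1 \
#   --add-config retention.ms=7776000000
```

Note that DataHub relies on this topic to feed its consumers, so shortening retention only reclaims space for records that have already been processed.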
Note: DataHub is only used by me and no one else. This is the first time this has happened to me, and I have been using DataHub for quite a few months (with small ingestions, a few hundred rows at most).
The version that I am using is 0.8.45.

lemon-yacht-62789
10/17/2022, 12:19 PM

source:
  type: looker
  config:
    extract_owners: true
    base_url: '<our url>'
    skip_personal_folders: true
    include_deleted: false
    client_secret: '${LOOKER_CLIENT_SECRET}'
    extract_usage_history: true
    env: prod
    platform: snowflake
    client_id: '${LOOKER_CLIENT_ID}'
pipeline_name: 'urn:li:dataHubIngestionSource:<uuid here>'
For now I have explicitly set the CLI version on the source to 0.8.45 as a workaround; it then runs without issue.
The ERROR log entry is as follows:
[2022-10-15 05:03:37,869] ERROR {datahub.entrypoints:192} -
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/entrypoints.py", line 149, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
    raise e
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 212, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 166, in run_func_check_upgrade
    ret = await the_one_future
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 157, in run_pipeline_async
    return await loop.run_in_executor(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 148, in run_pipeline_to_completion
    raise e
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 134, in run_pipeline_to_completion
    pipeline.run()
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 348, in run
    for wu in itertools.islice(
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_source.py", line 1164, in get_workunits
    ) = job.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_source.py", line 1049, in process_dashboard
    metric_dim_workunits = self.process_metrics_dimensions_and_fields_for_dashboard(
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_source.py", line 880, in process_metrics_dimensions_and_fields_for_dashboard
    chart_mcps = [
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_source.py", line 881, in <listcomp>
    self._make_metrics_dimensions_chart_mcp(element, dashboard)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_source.py", line 988, in _make_metrics_dimensions_chart_mcp
    fields=self._input_fields_from_dashboard_element(dashboard_element)
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_source.py", line 919, in _input_fields_from_dashboard_element
    explore = self.explore_registry.get_explore(
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_common.py", line 882, in get_explore
    looker_explore = LookerExplore.from_api(
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/looker_common.py", line 602, in from_api
    from datahub.ingestion.source.looker.lookml_source import _BASE_PROJECT_NAME
  File "/tmp/datahub/ingest/venv-looker-0.9.0/lib/python3.10/site-packages/datahub/ingestion/source/looker/lookml_source.py", line 11, in <module>
    import lkml
ModuleNotFoundError: No module named 'lkml'
[2022-10-15 05:03:37,869] ERROR {datahub.entrypoints:195} - Command failed:
	No module named 'lkml'.
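The traceback bottoms out in `import lkml`, a dependency that ships with the DataHub `lookml` plugin rather than the `looker` one. A hedged guess (not verified against 0.9.0) is that installing that extra into the ingestion venv would satisfy the import; a minimal check-and-suggest sketch, with the extra name taken from standard acryl-datahub packaging and the version pin matching the failing `venv-looker-0.9.0`:

```shell
# Hedged sketch: check whether lkml is importable; if not, suggest installing
# the lookml extra (extra name and version pin are assumptions).
if python3 -c 'import lkml' 2>/dev/null; then
  echo "lkml present"
else
  echo "lkml missing -> pip install 'acryl-datahub[lookml]==0.9.0'"
fi
```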
microscopic-mechanic-13766
10/17/2022, 2:28 PM

bland-balloon-48379
10/17/2022, 3:08 PM

miniature-eve-21984
10/17/2022, 4:18 PM

important-fireman-62760
10/17/2022, 11:20 PM

adamant-van-21355
10/18/2022, 8:23 AM

We are trying the column level lineage functionality of the latest version (v0.9.0) and we have an issue with the new lineage visualization there.
Context: We are ingesting data from Snowflake, dbt and Looker, and we can see cross-system lineage of the entities involved.
Once I enable the "Show Columns" function, I can see all columns appear, but I am unable to select any of the columns for most of the tables involved. It seems that column lineage cannot be identified for mixed dbt/Snowflake models in this case, hence I don't see more column-related edges and cannot select/lock any of these columns. Are these kinds of mixed models involving dbt currently supported for this feature, and what is the plan on the dbt side?
Is there any related configuration that we are not taking into account (i.e. start-time of lineage)?
Also, are there any more detailed docs available about the new column-level lineage functionality? Thanks 🙂

high-gigabyte-86638
10/18/2022, 9:17 AM

fierce-garage-74290
10/18/2022, 1:28 PM

rough-activity-61346
10/19/2022, 1:22 AM

helm install prerequisites datahub/datahub-prerequisites
Autopilot set default resource requests for StatefulSet default/prerequisites-kafka, as resource requests were not specified. See http://g.co/gke/autopilot-defaults
Error: INSTALLATION FAILED: admission webhook "gkepolicy.common-webhooks.networking.gke.io" denied the request: GKE Policy Controller rejected the request because it violates one or more policies: {"[denied by autogke-disallow-privilege]":["container configure-sysctl is privileged; not allowed in Autopilot"]}
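Not confirmed against this chart version, but the rejected privileged container (`configure-sysctl`) comes from the Elasticsearch subchart's sysctl init container, which the upstream `elasticsearch` chart lets you disable. A sketch of the values override (key names assume the upstream chart):

```yaml
# values.yaml override for datahub-prerequisites; Autopilot forbids privileged
# containers, so turn off the sysctl init container in the Elasticsearch subchart.
elasticsearch:
  sysctlInitContainer:
    enabled: false
```

Then something like `helm install prerequisites datahub/datahub-prerequisites -f values.yaml`. Note that disabling the init container assumes `vm.max_map_count` on the nodes is already high enough for Elasticsearch.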
loud-camera-71352
10/19/2022, 3:26 PM

miniature-plastic-94007
10/19/2022, 4:11 PM

extraEnvs:
  - name: AUTH_OIDC_ENABLED
    value: "true"
  - name: AUTH_OIDC_CLIENT_ID
    value: "<redacted>"
  - name: AUTH_OIDC_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: datahub-oidc-secret
        key: datahub-oidc-key
  - name: AUTH_OIDC_DISCOVERY_URI
    value: <identity_provider_host>/.well-known/openid-configuration/
  - name: AUTH_OIDC_BASE_URL
    value: <my_datahub_host>
  - name: AUTH_OIDC_SCOPE
    value: "openid profile email"
  - name: AUTH_OIDC_USER_NAME_CLAIM
    value: "email"
  - name: AUTH_OIDC_USER_NAME_CLAIM_REGEX
    value: "([^@]+)"
All pods are fine, but when I try to access the DataHub frontend using the host configured in the Ingress, I receive an internal server error (500). When I check the logs from the datahub-frontend pod, I find a connection reset error. 😕
Reading the documentation, I found this configuration: https://datahubproject.io/docs/deploy/aws/#expose-endpoints-using-a-load-balancer
But my Kubernetes cluster already has an ingress controller configured. My question is: is this step (exposing endpoints using a load balancer) mandatory, or can I reuse the configured ingress controller from my Kubernetes cluster?
Thanks for the help!

average-dinner-25106
10/20/2022, 4:05 AM

better-actor-97450
10/20/2022, 7:33 AM