most-airplane-91939
10/12/2021, 7:05 PM

stale-jewelry-2440
10/13/2021, 8:11 AM
The metadata folder in datahub/metadata-ingestion/src/datahub/ has been deleted, but modules in there are still used in the ingestion process. For example:
metadata-ingestion/src/datahub/ingestion/extractor/schema_util.py:from datahub.metadata.com.linkedin.pegasus2avro.schema import (
metadata-ingestion/src/datahub/ingestion/extractor/mce_extractor.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
metadata-ingestion/src/datahub/ingestion/extractor/mce_extractor.py:from datahub.metadata.schema_classes import UsageAggregationClass
metadata-ingestion/src/datahub/ingestion/api/workunit.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
metadata-ingestion/src/datahub/ingestion/api/workunit.py:from datahub.metadata.schema_classes import UsageAggregationClass
metadata-ingestion/src/datahub/emitter/kafka_emitter.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
metadata-ingestion/src/datahub/emitter/kafka_emitter.py:from datahub.metadata.schemas import (
Is this a partial code refactoring, or should we restore that folder?
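A note that may explain this: the datahub.metadata package appears to be generated from the metadata models during the build (and so removed from version control) rather than deleted outright, so regenerating it should restore the modules. A quick import check, assuming a built metadata-ingestion environment; both imports are taken from the file list above:
```
# Quick import check: these modules exist only after the codegen step has run,
# since the datahub.metadata package is generated from the models rather than
# checked in (an assumption based on the repo's build setup).
from datahub.metadata.schema_classes import UsageAggregationClass
from datahub.metadata.com.linkedin.pegasus2avro.mxe import MetadataChangeEvent

print(UsageAggregationClass, MetadataChangeEvent)
```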
stale-printer-44316
10/13/2021, 1:43 PM

gentle-father-80172
10/13/2021, 3:05 PM
DataHubGraphQLError{path=[dataset, upstreamLineage, entities, 3, entity, downstreamLineage, entities, 0, entity], code=SERVER_ERROR, locations=[SourceLocation{line=743, column=11}]}
@big-carpet-38439 - This might be related to the issue I was seeing last week.

handsome-belgium-11927
10/14/2021, 2:33 PM
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.DatasetSnapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
        "aspects": [
          {
            "com.linkedin.dataset.DatasetProperties": {
              "description": "Hello",
              "customProperties": {
                "hello": "world"
              }
            }
          }
        ]
      }
    }
  }
}'
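For comparison, the same snapshot can be emitted from Python with the DataHub REST emitter; a minimal sketch, assuming a local GMS at localhost:8080 as in the curl call, using the generated classes from the datahub package discussed above:
```
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetPropertiesClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
)

# Build the same DatasetSnapshot as the curl payload above.
mce = MetadataChangeEventClass(
    proposedSnapshot=DatasetSnapshotClass(
        urn="urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
        aspects=[
            DatasetPropertiesClass(
                description="Hello",
                customProperties={"hello": "world"},
            )
        ],
    )
)

# Assumes GMS is reachable on localhost:8080, as in the curl call above.
DatahubRestEmitter("http://localhost:8080").emit_mce(mce)
```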
broad-crowd-13788
10/15/2021, 1:43 PM
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.RuntimeException: Failed to validate DataHub PDL models
at com.linkedin.metadata.model.validation.ModelValidationTask.main(ModelValidationTask.java:50)
Caused by: com.linkedin.metadata.models.ModelValidationException: Found invalid relationship with name OwnedBy at path /orgOwner. Invalid entityType(s) provided.
at com.linkedin.metadata.models.EntitySpecBuilder.failValidation(EntitySpecBuilder.java:323)
at com.linkedin.metadata.models.EntitySpecBuilder.buildEntitySpecs(EntitySpecBuilder.java:74)
at com.linkedin.metadata.model.validation.ModelValidationTask.main(ModelValidationTask.java:48)
Not sure why the validation task fails. We are trying to extend the model and use the 'OwnedBy' relationship between other entities, but the build fails with the above error.
Is there a specific rule regarding the 'OwnedBy' relationship that we are missing?

blue-zoo-89533
10/16/2021, 4:58 PM

crooked-midnight-76614
10/18/2021, 8:21 AM
07:51:34.891 [generic-mce-consumer-job-client-0-C-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Authorization Exception
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: generic-mce-consumer-job-client
Which I think in turn is causing:
07:51:34.903 [generic-mce-consumer-job-client-0-C-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Fatal consumer exception; stopping container
And perhaps finally causing the following error when trying to access the analytics tab:
08:17:15.790 [Thread-13] ERROR c.l.d.g.a.service.AnalyticsService - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
08:17:15.792 [Thread-13] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:245)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:93)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:59)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:39)
at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:27)
at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
at graphql.execution.Execution.executeOperation(Execution.java:165)
at graphql.execution.Execution.execute(Execution.java:104)
at graphql.GraphQL.execute(GraphQL.java:557)
at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
at graphql.GraphQL.executeAsync(GraphQL.java:446)
at graphql.GraphQL.execute(GraphQL.java:377)
at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:88)
at com.datahub.metadata.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:82)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:240)
... 17 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://search-datahub-cjfy6-y4e4cliwdivgdm3iou375f7d4a.eu-central-1.es.amazonaws.com:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"},"status":404}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 21 common frames omitted
08:17:15.798 [Thread-15] ERROR c.l.d.g.a.service.AnalyticsService - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
08:17:15.802 [Thread-13] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getAnalyticsCharts {\n getAnalyticsCharts {\n title\n charts {\n ... on TimeSeriesChart {\n title\n lines {\n name\n data {\n x\n y\n __typename\n }\n __typename\n }\n dateRange {\n start\n end\n __typename\n }\n interval\n __typename\n }\n ... on BarChart {\n title\n bars {\n name\n segments {\n label\n value\n __typename\n }\n __typename\n }\n __typename\n }\n ... on TableChart {\n title\n columns\n rows {\n values\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getAnalyticsCharts], extensions={code=500, classification=DataFetchingException}}], data=null}, errors: [DataHubGraphQLError{path=[getAnalyticsCharts], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
08:17:15.799 [Thread-15] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
java.lang.RuntimeException: Search query failed:
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:245)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:216)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.getHighlights(GetHighlightsResolver.java:50)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:29)
at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:19)
at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
at graphql.execution.Execution.executeOperation(Execution.java:165)
at graphql.execution.Execution.execute(Execution.java:104)
at graphql.GraphQL.execute(GraphQL.java:557)
at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
at graphql.GraphQL.executeAsync(GraphQL.java:446)
at graphql.GraphQL.execute(GraphQL.java:377)
at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:88)
at com.datahub.metadata.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:82)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:240)
... 17 common frames omitted
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://search-datahub-cjfy6-y4e4cliwdivgdm3iou375f7d4a.eu-central-1.es.amazonaws.com:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"},"status":404}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 21 common frames omitted
08:17:15.807 [Thread-15] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getHighlights {\n getHighlights {\n value\n title\n body\n __typename\n }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getHighlights], extensions={code=500, classification=DataFetchingException}}], data=null}, errors: [DataHubGraphQLError{path=[getHighlights], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
I've tried setting the following environment variables:
DATAHUB_USAGE_EVENT_NAME
DATAHUB_USAGE_EVENT_KAFKA_CONSUMER_GROUP_ID
DATAHUB_TRACKING_TOPIC
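A quick way to confirm the missing index directly against the Elasticsearch domain, sketched with a placeholder hostname (a 404 here reproduces the index_not_found_exception above):
```
import requests

# Placeholder host; substitute the Elasticsearch endpoint GMS is configured with.
ES = "https://search-datahub-example.eu-central-1.es.amazonaws.com:443"

resp = requests.head(f"{ES}/datahub_usage_event")
print(resp.status_code)  # 404 reproduces the index_not_found_exception above
```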
some-hospital-42166
10/18/2021, 8:45 AM

brief-lizard-77958
10/18/2021, 1:29 PM
[system.out] venv/lib/python3.9/site-packages/airflow/_vendor/connexion/spec.py:169: error: invalid syntax
See thread for the picture of where this points to.
I could build previous versions without a problem on the same system (Ubuntu 21.04).

witty-butcher-82399
10/18/2021, 2:15 PM
• When using the profile feature, I got the following error: Cannot perform CREATE TEMPTABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
Have you seen this error before? Is there a way to set which schema the profiler should use when creating its temporary tables?
• When using the snowflake-usage connector, I got the error below.
"Failed to parse usage line {'query_start_time': datetime.datetime(2021, 10, 17, 3, 41, 14, 560000, "
'tzinfo=datetime.timezone.utc), \'query_text\': "create temporary table '
'avalanche.dwh_stage_iad.accepted_values_stg_ad_images_source_system__IAD__tmp as\\n select '
'test_run_start_ts,\\n row_count,\\n failure_row_count,\\n case when failure_row_count > '
"0\\n then 'ERROR'\\n else 'PASS'\\n end as test_status\\n from "
'(\\n select current_timestamp as test_run_start_ts,\\n count_all_sql.row_count,\\n '
'case when count_all_sql.row_count > 0 \\n then (\\n -- begin of data test '
'query\\n select count(1) as row_count\\n from avalanche.dwh_stage_iad.stg_ad_images as '
"model\\n \\n where (\\n source_system not in ('IAD')\\n "
')\\n -- and of data test query\\n )\\n else '
'0\\n end as failure_row_count\\n from (\\n -- begin of count all '
'query\\n select count(1) as row_count\\n from avalanche.dwh_stage_iad.stg_ad_images as model\\n '
'\\n -- end of count all query\\n ) as count_all_sql\\n );", \'query_type\': '
"'CREATE_TABLE_AS_SELECT', 'base_objects_accessed': [{'columns': [{'columnId': 82388016, 'columnName': 'SOURCE_SYSTEM'}], "
"'objectDomain': 'Table', 'objectId': 19718154, 'objectName': 'AVALANCHE.DWH_STAGE_IAD.STG_AD_IMAGES'}], 'user_name': "
"'SERVICE_AVALANCHE', 'first_name': 'Avalanche', 'last_name': 'Service Account', 'display_name': 'SERVICE_AVALANCHE', "
"'email': None, 'role_name': 'SERVICE_DBT'}",
• Also, regarding the snowflake-usage connector, it caught my attention that it is handled as an independent connector instead of just a property on the snowflake connector. Because of that, while I can filter (`allow`/`deny`) tables and schemas with the snowflake connector, I can't with the snowflake-usage one. This results in snowflake-usage producing events for tables that I don't want in the catalog. Is there a reason for this split of the connector? Or how can I keep both connectors aligned on which tables are processed? (A recipe sketch follows at the end of this message.)
Thanks in advance!
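Regarding the filtering asymmetry above, a sketch of the allow/deny patterns the base snowflake source accepts (credentials and patterns are placeholders; the snowflake-usage source did not expose equivalent patterns at the time, which is the gap being described):
```
from datahub.ingestion.run.pipeline import Pipeline

# Sketch of table/schema filtering on the base `snowflake` source; all
# account, credential, and pattern values below are placeholders.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "snowflake",
            "config": {
                "host_port": "my_account",
                "username": "datahub",
                "password": "***",
                "schema_pattern": {"allow": ["dwh_stage_iad"]},
                "table_pattern": {"deny": [".*__tmp$"]},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
```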
witty-actor-87329
10/18/2021, 6:28 PM
Caused by: com.nimbusds.oauth2.sdk.ParseException: The scope must include an "openid" value
at com.nimbusds.openid.connect.sdk.AuthenticationRequest.parse(AuthenticationRequest.java:1378)
at com.nimbusds.openid.connect.sdk.AuthenticationRequest.parse(AuthenticationRequest.java:1312)
at org.pac4j.oidc.redirect.OidcRedirectActionBuilder.buildAuthenticationRequestUrl(OidcRedirectActionBuilder.java:110)
... 49 common frames omitted
Below is my configuration:
- AUTH_OIDC_ENABLED=true
- AUTH_OIDC_CLIENT_ID=xxxxxxxxxxxxxxx
- AUTH_OIDC_CLIENT_SECRET=zzzzzzzzzzzzzzzz
- AUTH_OIDC_DISCOVERY_URI=https://xyz.okta.com/.well-known/openid-configuration
- AUTH_OIDC_BASE_URL=https://datahub-prod.xyz.io
- AUTH_OIDC_SCOPE="openid profile email groups"
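One hedged guess, not a confirmed diagnosis: if the surrounding quotes are passed through literally (for example, from a docker env file, which does not strip quotes), the first scope token would carry a leading quote, so the parser would not find a bare openid value. A toy illustration:
```
# Toy illustration of the suspected quoting problem (an assumption about the
# deployment, not a confirmed diagnosis).
scope_with_quotes = '"openid profile email groups"'  # quotes passed through literally
scope_clean = "openid profile email groups"

print("openid" in scope_with_quotes.split())  # False: first token is '"openid'
print("openid" in scope_clean.split())        # True
```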
Can anyone help with this? cc: @gentle-father-80172

red-pizza-28006
10/19/2021, 6:52 PM

handsome-football-66174
10/19/2021, 7:13 PM
extraEnvs:
- name: AUTH_OIDC_ENABLED
value: "true"
- name: AUTH_OIDC_CLIENT_ID
value: MMSOauthClient
- name: AUTH_OIDC_CLIENT_SECRET
value: "<value>"
- name: AUTH_OIDC_DISCOVERY_URI
value: https://<saml-host>/.well-known/openid-configuration
- name: AUTH_OIDC_BASE_URL
value: https://<host>/
It is redirecting to this:
https://<hostname>/#error_description=The+global+default+access+token+manager+is+not+available+for+the+selected+client+and+authentication+context&error=invalid_request

microscopic-elephant-47912
10/19/2021, 8:29 PM
---- (full traceback above) ----
File "/usr/local/lib/python3.8/site-packages/datahub/entrypoints.py", line 91, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 52, in run
pipeline = Pipeline.create(pipeline_config)
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 120, in create
return cls(config)
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 88, in __init__
self.source: Source = source_class.create(
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 788, in create
return cls(config, ctx)
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 245, in __init__
self.client = LookerAPI(self.source_config).get_client()
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 84, in __init__
raise ConfigurationError(
ConfigurationError: Failed to initialize Looker client. Please check your configuration.
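For reference, a minimal sketch of the Looker recipe fields that error is validating (base_url, client_id, client_secret; all values here are placeholders). The ConfigurationError above is raised when the Looker client cannot authenticate with these values:
```
from datahub.ingestion.run.pipeline import Pipeline

# Minimal Looker recipe sketch; every value below is a placeholder.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "looker",
            "config": {
                "base_url": "https://company.looker.com",
                "client_id": "<looker-api-client-id>",
                "client_secret": "<looker-api-client-secret>",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
```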
modern-nail-74015
10/20/2021, 1:55 AM

high-notebook-40979
10/20/2021, 4:46 AM

modern-nail-74015
10/20/2021, 8:10 AM
Caused by: java.lang.RuntimeException: Failed to resolve user name claim from profile provided by Identity Provider. Missing attribute. Attribute: 'preferred_username', Regex: '(.*)', Profile: {at_hash=froRaU5vpSNyY0aMxVEXmw, token_expiration_advance=-1, aud=[WCca1QCzMPQ6HDgOthv0UvB6WtuMUjHC], id_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ0ZWxldHJhYW4iLCJzdWIiOiIxIiwiYXVkIjoiV0NjYTFRQ3pNUFE2SERnT3RodjBVdkI2V3R1TVVqSEMiLCJleHAiOjE2MzQ3MjQyOTgsImlhdCI6MTYzNDcxNzA5OCwidXNlcm5hbWUiOiJydWljb3JlIiwiYXRfaGFzaCI6ImZyb1JhVTV2cFNOeVkwYU14VkVYbXcifQ.EQeE8S12nrwIj5FeoJDcAOXT3nzcQCrdpyrmBVXJmTs, iss=teletraan, exp=Wed Oct 20 10:04:58 GMT 2021, iat=Wed Oct 20 08:04:58 GMT 2021, username=ruicore}
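The profile above carries a username claim but no preferred_username, so one hedged pointer: DataHub's OIDC settings include AUTH_OIDC_USER_NAME_CLAIM (and a companion regex setting) to pick a different claim. A sketch of the extraction that setting implies:
```
import re

# The profile above has a 'username' claim but no 'preferred_username'.
# DataHub's AUTH_OIDC_USER_NAME_CLAIM / AUTH_OIDC_USER_NAME_CLAIM_REGEX
# settings choose which claim is extracted; this mirrors that extraction
# for claim="username" and the default regex "(.*)".
profile = {"username": "ruicore"}

claim, regex = "username", r"(.*)"
match = re.match(regex, str(profile[claim]))
print(match.group(1))  # -> ruicore
```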
tall-forest-65335
10/20/2021, 10:44 AM

red-pizza-28006
10/20/2021, 11:46 AM

melodic-helmet-78607
10/21/2021, 6:08 AM

modern-nail-74015
10/21/2021, 8:25 AM

nice-planet-17111
10/21/2021, 8:26 AM
We have datahub-gms and datahub-upgrade-job deployments, and when I helm upgrade, the upgrade job fails and I get an Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade' error 😞 Is there something I can do? (The error log and helm chart are in the thread because they are too long.)

red-pizza-28006
10/21/2021, 6:37 PM

dry-policeman-74195
10/22/2021, 4:23 PM

curved-jordan-15657
10/24/2021, 8:49 PM
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
args.func(args, dag=self.dag)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
_run_raw_task(args, ti)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
ti._run_raw_task(
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1324, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1443, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1494, in _execute_task
result = execute_callable(context=context)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/operators/python.py", line 151, in execute
return_value = self.execute_callable()
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/operators/python.py", line 162, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/home/airflow/airflow/dags/datahub_ingestion_dag.py", line 39, in datahub_redshift
pipeline.run()
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 125, in run
for wu in self.source.get_workunits():
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/source/sql/redshift.py", line 253, in get_workunits
lineage_mcp = self.get_lineage_mcp(wu.metadata.proposedSnapshot.urn)
File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/source/sql/redshift.py", line 272, in get_lineage_mcp
tablename = dataset_params[2]
IndexError: list index out of range
After digging into the problem, I realized that in the redshift.py file the dataset_params array is expected to contain 3 parts, produced by dataset_params = dataset_key.name.split("."). And dataset_key doesn't pick up our db-name from our recipe file, since we wrote `host: <endpoint>/<db-name>`. If I give the database name explicitly as `database: <db-name>` in recipe.yml, it resolves this problem, but then I get another error, shown below (a sketch of the first failure follows after that traceback):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 58, in run
pipeline.run()
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 125, in run
for wu in self.source.get_workunits():
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/redshift.py", line 248, in get_workunits
for wu in super().get_workunits():
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 364, in get_workunits
yield from self.loop_tables(inspector, schema, sql_config)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 435, in loop_tables
columns = inspector.get_columns(table, schema)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 390, in get_columns
col_defs = self.dialect.get_columns(
File "<string>", line 2, in get_columns
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 52, in cache
ret = fn(self, con, *args, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy_redshift/dialect.py", line 454, in get_columns
cols = self._get_redshift_columns(connection, table_name, schema, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy_redshift/dialect.py", line 705, in _get_redshift_columns
return all_columns[key]
KeyError: '<our-schema-name>.<our-table-name>'
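As referenced above, a minimal illustration of the first failure mode (the dataset name is made up; the indexing mirrors the split logic in redshift.py quoted in the first traceback):
```
# Made-up dataset name illustrating the IndexError above: with no database in
# the recipe, the key has only two dot-separated parts instead of three.
dataset_name = "our_schema.our_table"      # missing the leading db name
dataset_params = dataset_name.split(".")   # ['our_schema', 'our_table']

tablename = dataset_params[2]              # IndexError: list index out of range
```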
Everything was perfect until now. The DataHub version is 0.8.16. I need a solution for the first problem.

fresh-carpet-31048
10/25/2021, 2:05 PM
Is the difference between DataFetcher<CompletableFuture<String>> and DataFetcher<CompletableFuture<Boolean>> just the return type (the first returns a string and the second returns a boolean)?
handsome-football-66174
10/25/2021, 8:50 PM

future-hamburger-62563
10/26/2021, 1:19 AM
Using docker/dev.sh, when I try to build I get this error:
org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.elastic.co/elasticsearch/elasticsearch:7.9.3, imagePullPolicy=DefaultPullPolicy())
at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1286)
at org.testcontainers.containers.GenericContainer.logger(GenericContainer.java:615)
* What went wrong:
Execution failed for task ':metadata-io:test'.
I tried running docker container prune and the command cleared some disk space, but the build is still failing. Any ideas?
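One thing worth trying, sketched under the assumption that the docker Python SDK is installed: pre-pull the image Testcontainers failed to fetch, so the build can reuse the local copy instead of pulling during the test run.
```
import docker  # pip install docker

# Pre-pull the Elasticsearch image that Testcontainers could not fetch; the
# test run should then find it in the local image cache.
client = docker.from_env()
client.images.pull("docker.elastic.co/elasticsearch/elasticsearch", tag="7.9.3")
```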
ancient-hair-10877
10/26/2021, 10:59 AM