# troubleshoot
  • m

    most-airplane-91939

    10/12/2021, 7:05 PM
    Hello! I'm new to DataHub. The installation was quick and easy, but the command "datahub docker ingest-sample-data" throws errors. It seems like a bug that was already fixed:
    m
    • 2
    • 3
  • s

    stale-jewelry-2440

    10/13/2021, 8:11 AM
    Hi dear community! I see the folder metadata in datahub/metadata-ingestion/src/datahub/ has been deleted, but modules in there are still used in the ingestion process. For example, in:
    metadata-ingestion/src/datahub/ingestion/extractor/schema_util.py:from datahub.metadata.com.linkedin.pegasus2avro.schema import (
    metadata-ingestion/src/datahub/ingestion/extractor/mce_extractor.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
    metadata-ingestion/src/datahub/ingestion/extractor/mce_extractor.py:from datahub.metadata.schema_classes import UsageAggregationClass
    metadata-ingestion/src/datahub/ingestion/api/workunit.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
    metadata-ingestion/src/datahub/ingestion/api/workunit.py:from datahub.metadata.schema_classes import UsageAggregationClass
    metadata-ingestion/src/datahub/emitter/kafka_emitter.py:from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
    metadata-ingestion/src/datahub/emitter/kafka_emitter.py:from datahub.metadata.schemas import (
    Is this a partial code refactoring, or should we restore that folder?
    s
    m
    • 3
    • 7
  • s

    stale-printer-44316

    10/13/2021, 1:43 PM
    In DataHub, in terms of updating Data Ownership, I understand that we need to integrate with LDAP. However, we don't want to use LDAP; instead we'd like to ingest the data owner name as a field via CSV etc. Is that possible? If so, how can we implement it, please?
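    One approach that avoids LDAP entirely is to read the owners from a file and push Ownership aspects through the DataHub Python REST emitter. A minimal sketch, assuming the acryl-datahub package, a GMS at http://localhost:8080, and a hypothetical owners.csv with dataset_name and owner columns:
    # Hypothetical sketch: read owners from a CSV and emit Ownership aspects to GMS.
    # The CSV layout (dataset_name, owner) and the postgres platform are assumptions.
    import csv

    from datahub.emitter.mce_builder import make_dataset_urn, make_user_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        OwnerClass,
        OwnershipClass,
        OwnershipTypeClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")

    with open("owners.csv") as f:
        for row in csv.DictReader(f):  # columns: dataset_name, owner
            ownership = OwnershipClass(
                owners=[
                    OwnerClass(
                        owner=make_user_urn(row["owner"]),
                        type=OwnershipTypeClass.DATAOWNER,
                    )
                ]
            )
            emitter.emit_mcp(
                MetadataChangeProposalWrapper(
                    entityType="dataset",
                    changeType="UPSERT",
                    entityUrn=make_dataset_urn("postgres", row["dataset_name"], "PROD"),
                    aspectName="ownership",
                    aspect=ownership,
                )
            )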
    b
    b
    • 3
    • 2
  • g

    gentle-father-80172

    10/13/2021, 3:05 PM
    Good morning! Can someone please help decipher this error from Datahub GMS? Thanks!
    DataHubGraphQLError{path=[dataset, upstreamLineage, entities, 3, entity, downstreamLineage, entities, 0, entity], code=SERVER_ERROR, locations=[SourceLocation{line=743, column=11}]}
    @big-carpet-38439 - This might be related to the issue I was seeing last week.
    b
    a
    +2
    • 5
    • 9
  • h

    handsome-belgium-11927

    10/14/2021, 2:33 PM
    It is not working via curl either. Has anybody tested ingestion of a description on the latest version of DataHub? I tried this curl (the properties are ingested, but not the description):
    curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
    	"entity": {
    		"value": {
    			"com.linkedin.metadata.snapshot.DatasetSnapshot": {
    				"urn": "urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
    				"aspects": [
    					{
    						"com.linkedin.dataset.DatasetProperties": {
    							"description": "Hello", 
    							"customProperties": {
    								"hello": "world"
    							}
    						}
    					}
    				]
    			}
    		}
    	}
    }'
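    For comparison, a sketch of emitting the same aspect through the Python REST emitter, which rules out curl quoting issues; it assumes the acryl-datahub package and GMS on http://localhost:8080:
    # Sketch only: set description + customProperties on the same exasol dataset
    # via the Python REST emitter instead of raw curl.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter("http://localhost:8080")
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType="UPSERT",
            entityUrn="urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
            aspectName="datasetProperties",
            aspect=DatasetPropertiesClass(
                description="Hello",
                customProperties={"hello": "world"},
            ),
        )
    )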
    m
    b
    • 3
    • 14
  • b

    broad-crowd-13788

    10/15/2021, 1:43 PM
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    Exception in thread "main" java.lang.RuntimeException: Failed to validate DataHub PDL models
            at com.linkedin.metadata.model.validation.ModelValidationTask.main(ModelValidationTask.java:50)
    Caused by: com.linkedin.metadata.models.ModelValidationException: Found invalid relationship with name OwnedBy at path /orgOwner. Invalid entityType(s) provided.
            at com.linkedin.metadata.models.EntitySpecBuilder.failValidation(EntitySpecBuilder.java:323)
            at com.linkedin.metadata.models.EntitySpecBuilder.buildEntitySpecs(EntitySpecBuilder.java:74)
            at com.linkedin.metadata.model.validation.ModelValidationTask.main(ModelValidationTask.java:48)
    Not sure why the validation task fails. We are trying to extend the model and use the 'OwnedBy' relationship between other entities, but the build fails with the above error. Is there a specific rule regarding the 'OwnedBy' relationship that we are missing?
    e
    b
    • 3
    • 13
  • b

    blue-zoo-89533

    10/16/2021, 4:58 PM
    Hello, I'm unable to get metadata ingestion working for BigQuery. The command runs without an error message, but I don't see the data in the DataHub front end.
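    One thing worth double-checking is that the recipe's sink points at the same GMS the UI reads from. A minimal sketch of running the ingestion programmatically so the sink target is explicit; the project id is a placeholder and the datahub-rest server address is an assumption:
    # Sketch: BigQuery ingestion with an explicit datahub-rest sink, so a
    # misdirected sink (e.g. wrong GMS host) is easy to spot.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "bigquery",
                "config": {"project_id": "my-gcp-project"},  # placeholder project
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()  # fail loudly if any workunit failed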
    m
    • 2
    • 19
  • c

    crooked-midnight-76614

    10/18/2021, 8:21 AM
    How do I set/override the name/group id of "generic-mce-consumer-job-client" and "generic-mae-consumer-job-client" when using the DataHub Helm charts? I'm trying to enable the analytics feature, but I'm getting the following error because Kafka group ids have to be prefixed in my organization:
    07:51:34.891 [generic-mce-consumer-job-client-0-C-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Authorization Exception
    org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: generic-mce-consumer-job-client
    Which I think in turn is causing:
    07:51:34.903 [generic-mce-consumer-job-client-0-C-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Fatal consumer exception; stopping container
    And perhaps finally causing the following error when trying to access the analytics tab:
    08:17:15.790 [Thread-13] ERROR c.l.d.g.a.service.AnalyticsService - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
    08:17:15.792 [Thread-13] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
    java.lang.RuntimeException: Search query failed:
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:245)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:93)
        at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:59)
        at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:39)
        at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:27)
        at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
        at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
        at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
        at graphql.execution.Execution.executeOperation(Execution.java:165)
        at graphql.execution.Execution.execute(Execution.java:104)
        at graphql.GraphQL.execute(GraphQL.java:557)
        at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
        at graphql.GraphQL.executeAsync(GraphQL.java:446)
        at graphql.GraphQL.execute(GraphQL.java:377)
        at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:88)
        at com.datahub.metadata.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:82)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:240)
        ... 17 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://search-datahub-cjfy6-y4e4cliwdivgdm3iou375f7d4a.eu-central-1.es.amazonaws.com:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found]
    {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"},"status":404}
            at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
            at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
            ... 21 common frames omitted
    08:17:15.798 [Thread-15] ERROR c.l.d.g.a.service.AnalyticsService - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
    08:17:15.802 [Thread-13] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getAnalyticsCharts {\n  getAnalyticsCharts {\n    title\n    charts {\n      ... on TimeSeriesChart {\n        title\n        lines {\n          name\n          data {\n            x\n            y\n            __typename\n          }\n          __typename\n        }\n        dateRange {\n          start\n          end\n          __typename\n        }\n        interval\n        __typename\n      }\n      ... on BarChart {\n        title\n        bars {\n          name\n          segments {\n            label\n            value\n            __typename\n          }\n          __typename\n        }\n        __typename\n      }\n      ... on TableChart {\n        title\n        columns\n        rows {\n          values\n          __typename\n        }\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getAnalyticsCharts], extensions={code=500, classification=DataFetchingException}}], data=null}, errors: [DataHubGraphQLError{path=[getAnalyticsCharts], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
    08:17:15.799 [Thread-15] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
    java.lang.RuntimeException: Search query failed:
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:245)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getHighlights(AnalyticsService.java:216)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.getHighlights(GetHighlightsResolver.java:50)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:29)
        at com.linkedin.datahub.graphql.analytics.resolver.GetHighlightsResolver.get(GetHighlightsResolver.java:19)
        at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
        at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
        at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
        at graphql.execution.Execution.executeOperation(Execution.java:165)
        at graphql.execution.Execution.execute(Execution.java:104)
        at graphql.GraphQL.execute(GraphQL.java:557)
        at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)
        at graphql.GraphQL.executeAsync(GraphQL.java:446)
        at graphql.GraphQL.execute(GraphQL.java:377)
        at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:88)
        at com.datahub.metadata.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:82)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
        at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:240)
        ... 17 common frames omitted
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://search-datahub-cjfy6-y4e4cliwdivgdm3iou375f7d4a.eu-central-1.es.amazonaws.com:443], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found]
    {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"},"status":404}
            at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
            at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
            at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
            ... 21 common frames omitted
    08:17:15.807 [Thread-15] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getHighlights {\n  getHighlights {\n    value\n    title\n    body\n    __typename\n  }\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getHighlights], extensions={code=500, classification=DataFetchingException}}], data=null}, errors: [DataHubGraphQLError{path=[getHighlights], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
    I've tried setting the following environment variables:
    DATAHUB_USAGE_EVENT_NAME
    DATAHUB_USAGE_EVENT_KAFKA_CONSUMER_GROUP_ID
    DATAHUB_TRACKING_TOPIC
    ⬆️ 1
    e
    • 2
    • 7
  • s

    some-hospital-42166

    10/18/2021, 8:45 AM
    083839.181 [ForkJoinPool.commonPool-worker-5] ERROR c.l.d.g.r.search.SearchResolver - Failed to execute search: entity type MLMODEL_GROUP, query *, filters: [], start: 0, count: 20
    com.linkedin.data.template.RequiredFieldNotPresentException: Field "value" is required but it is not present
    083839.181 [ForkJoinPool.commonPool-worker-5] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
    java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to execute search: entity type MLMODEL_GROUP, query *, filters: [], start: 0, count: 20
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
        at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
    Caused by: java.lang.RuntimeException: Failed to execute search: entity type MLMODEL_GROUP, query *, filters: [], start: 0, count: 20
        at com.linkedin.datahub.graphql.resolvers.search.SearchResolver.lambda$get$1(SearchResolver.java:82)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        ... 5 common frames omitted
    b
    s
    +2
    • 5
    • 21
  • b

    brief-lizard-77958

    10/18/2021, 1:29 PM
    Using freshly pulled code from GitHub, I cannot complete a gradlew build. It fails on metadata-ingestion:lint. The error output is https://pastebin.com/nkxwVkp1, and the --debug output is https://pastebin.com/gDD7w2M3. It seems to trip on this part:
    [system.out] venv/lib/python3.9/site-packages/airflow/_vendor/connexion/spec.py:169: error: invalid syntax
    See the thread for a picture of where this points to. I could build previous versions without a problem on the same system (Ubuntu 21.04).
    ✅ 1
    b
    h
    • 3
    • 9
  • w

    witty-butcher-82399

    10/18/2021, 2:15 PM
    I’ve been testing the snowflake connector. While it worked with basic usage, I got some issues with advanced features. Sharing here for discussion before raising any issue on GitHub.
    • When enabling the profile feature, I got the following error:
    Cannot perform CREATE TEMPTABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
    Have you got this error before? Is there any way to set which schema the profiler should use when creating the temporary tables?
    • When using the snowflake-usage connector, I got the error below.
    "Failed to parse usage line {'query_start_time': datetime.datetime(2021, 10, 17, 3, 41, 14, 560000, "
                            'tzinfo=datetime.timezone.utc), \'query_text\': "create temporary table '
                            'avalanche.dwh_stage_iad.accepted_values_stg_ad_images_source_system__IAD__tmp as\\n      select '
                            'test_run_start_ts,\\n            row_count,\\n            failure_row_count,\\n              case when failure_row_count > '
                            "0\\n                  then 'ERROR'\\n                  else 'PASS'\\n            end as test_status\\n      from "
                            '(\\n        select current_timestamp as test_run_start_ts,\\n                count_all_sql.row_count,\\n                '
                            'case when count_all_sql.row_count > 0 \\n                      then (\\n                              -- begin of data test '
                            'query\\n                              select count(1) as row_count\\n      from  avalanche.dwh_stage_iad.stg_ad_images as '
                            "model\\n      \\n        where (\\n                source_system not in ('IAD')\\n              "
                            ')\\n                              -- and of data test query\\n                          )\\n                      else '
                            '0\\n                end as failure_row_count\\n          from (\\n                  -- begin of count all '
                            'query\\n                select count(1) as row_count\\n      from  avalanche.dwh_stage_iad.stg_ad_images as model\\n      '
                            '\\n                  -- end of count all query\\n              ) as count_all_sql\\n      );", \'query_type\': '
                            "'CREATE_TABLE_AS_SELECT', 'base_objects_accessed': [{'columns': [{'columnId': 82388016, 'columnName': 'SOURCE_SYSTEM'}], "
                            "'objectDomain': 'Table', 'objectId': 19718154, 'objectName': 'AVALANCHE.DWH_STAGE_IAD.STG_AD_IMAGES'}], 'user_name': "
                            "'SERVICE_AVALANCHE', 'first_name': 'Avalanche', 'last_name': 'Service Account', 'display_name': 'SERVICE_AVALANCHE', "
                            "'email': None, 'role_name': 'SERVICE_DBT'}",
    • Also, regarding the snowflake-usage connector, it caught my attention that it is handled as an independent connector instead of just a property of the snowflake connector. Because of that, while I can filter (`allow`/`deny`) tables and schemas with the snowflake connector, I can’t with the snowflake-usage one. This results in snowflake-usage producing events for tables that I don’t want in the catalog. Is there a reason for this split of the connector? Or how can I keep both connectors aligned on which tables are processed? Thanks in advance!
    h
    p
    • 3
    • 8
  • w

    witty-actor-87329

    10/18/2021, 6:28 PM
    Hi Team, I'm trying to configure Okta authentication for the React frontend by following this guide, but I'm getting the error below:
    Caused by: com.nimbusds.oauth2.sdk.ParseException: The scope must include an "openid" value
    	at com.nimbusds.openid.connect.sdk.AuthenticationRequest.parse(AuthenticationRequest.java:1378)
    	at com.nimbusds.openid.connect.sdk.AuthenticationRequest.parse(AuthenticationRequest.java:1312)
    	at org.pac4j.oidc.redirect.OidcRedirectActionBuilder.buildAuthenticationRequestUrl(OidcRedirectActionBuilder.java:110)
    	... 49 common frames omitted
    Below is my configuration:
    - AUTH_OIDC_ENABLED=true
        - AUTH_OIDC_CLIENT_ID=xxxxxxxxxxxxxxx
        - AUTH_OIDC_CLIENT_SECRET=zzzzzzzzzzzzzzzz
        - AUTH_OIDC_DISCOVERY_URI=https://xyz.okta.com/.well-known/openid-configuration
        - AUTH_OIDC_BASE_URL=https://datahub-prod.xyz.io
        - AUTH_OIDC_SCOPE="openid profile email groups"
    Can anyone help with this? cc: @gentle-father-80172
    b
    • 2
    • 2
  • r

    red-pizza-28006

    10/19/2021, 6:52 PM
    Hi everyone, when I set up DataHub on a Kubernetes cluster, all pods are running fine, but clicking on datasets in the UI results in this error in the frontend pod. Any idea what might be causing this?
    p
    • 2
    • 13
  • h

    handsome-football-66174

    10/19/2021, 7:13 PM
    Hi everyone, I'm trying to set up OIDC using the configuration below.
    extraEnvs:
      - name: AUTH_OIDC_ENABLED
        value: "true"
      - name: AUTH_OIDC_CLIENT_ID
        value: MMSOauthClient
      - name: AUTH_OIDC_CLIENT_SECRET
        value: "<value>"
      - name: AUTH_OIDC_DISCOVERY_URI
        value: https://<saml-host>/.well-known/openid-configuration
      - name: AUTH_OIDC_BASE_URL
        value: https://<host>/
    It is redirecting to https://<hostname>/#error_description=The+global+default+access+token+manager+is+not+available+for+the+selected+client+and+authentication+context&error=invalid_request
    s
    b
    l
    • 4
    • 56
  • m

    microscopic-elephant-47912

    10/19/2021, 8:29 PM
    Hello, I'm trying to ingest Looker metadata but I'm having some problems. A week ago I was able to ingest with the same config file.
    ---- (full traceback above) ----
    File "/usr/local/lib/python3.8/site-packages/datahub/entrypoints.py", line 91, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 52, in run
        pipeline = Pipeline.create(pipeline_config)
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 120, in create
        return cls(config)
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 88, in __init__
        self.source: Source = source_class.create(
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 788, in create
        return cls(config, ctx)
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 245, in __init__
        self.client = LookerAPI(self.source_config).get_client()
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 84, in __init__
        raise ConfigurationError(
    
    ConfigurationError: Failed to initialize Looker client. Please check your configuration.
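    Since the failure happens while initializing the Looker client, it can help to test the credentials outside DataHub with the looker_sdk package that the connector is built on. A rough sketch, assuming the same base_url/client_id/client_secret as in the recipe (the values below are placeholders):
    # Sketch: verify Looker API credentials directly with looker_sdk.
    # The LOOKERSDK_* environment variables are standard looker_sdk settings.
    import os

    import looker_sdk

    os.environ["LOOKERSDK_BASE_URL"] = "https://mycompany.looker.com"
    os.environ["LOOKERSDK_CLIENT_ID"] = "..."
    os.environ["LOOKERSDK_CLIENT_SECRET"] = "..."

    sdk = looker_sdk.init31()  # API 3.1 client
    me = sdk.me()              # any authenticated call will do
    print("Authenticated as:", me.display_name)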
    m
    • 2
    • 14
  • m

    modern-nail-74015

    10/20/2021, 1:55 AM
    Can I use my own OIDC backend?
    b
    s
    • 3
    • 2
  • h

    high-notebook-40979

    10/20/2021, 4:46 AM
    Hi all, when I try to access the Manage Users & Groups page, it throws an exception.
    b
    • 2
    • 3
  • m

    modern-nail-74015

    10/20/2021, 8:10 AM
    Caused by: java.lang.RuntimeException: Failed to resolve user name claim from profile provided by Identity Provider. Missing attribute. Attribute: 'preferred_username', Regex: '(.*)', Profile: {at_hash=froRaU5vpSNyY0aMxVEXmw, token_expiration_advance=-1, aud=[WCca1QCzMPQ6HDgOthv0UvB6WtuMUjHC], id_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ0ZWxldHJhYW4iLCJzdWIiOiIxIiwiYXVkIjoiV0NjYTFRQ3pNUFE2SERnT3RodjBVdkI2V3R1TVVqSEMiLCJleHAiOjE2MzQ3MjQyOTgsImlhdCI6MTYzNDcxNzA5OCwidXNlcm5hbWUiOiJydWljb3JlIiwiYXRfaGFzaCI6ImZyb1JhVTV2cFNOeVkwYU14VkVYbXcifQ.EQeE8S12nrwIj5FeoJDcAOXT3nzcQCrdpyrmBVXJmTs, iss=teletraan, exp=Wed Oct 20 10:04:58 GMT 2021, iat=Wed Oct 20 08:04:58 GMT 2021, username=ruicore}
    b
    • 2
    • 6
  • t

    tall-forest-65335

    10/20/2021, 10:44 AM
    I’m looking at a problem in DataHub where once I load datasets, I can find them via search but not by browsing through the UI (the React UI, if it matters). Does anyone have suggestions on where the problem might be, e.g. which component(s) are responsible for building that browse hierarchy, as opposed to a direct lookup? This is on DataHub v0.8.14.
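    If the datasets are searchable but missing from the browse tree, one thing to look at is the browsePaths aspect on each dataset, since the browse hierarchy is built from it. A minimal sketch of attaching a browse path explicitly, with a hypothetical URN and path, assuming the Python REST emitter against GMS on localhost:8080:
    # Sketch: emit a browsePaths aspect so the dataset shows up under a browse
    # folder as well as in search. URN and path are placeholders.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import BrowsePathsClass

    emitter = DatahubRestEmitter("http://localhost:8080")
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType="UPSERT",
            entityUrn="urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.public.users,PROD)",
            aspectName="browsePaths",
            aspect=BrowsePathsClass(paths=["/prod/postgres/mydb/public"]),
        )
    )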
    g
    • 2
    • 5
  • r

    red-pizza-28006

    10/20/2021, 11:46 AM
    I have a question around using our own Kafka for DataHub. I couldn’t find anything in the Helm chart to provide a username/password for Kafka. Same thing for the schema registry, as I want to supply a truststore file and password.
    b
    e
    • 3
    • 9
  • m

    melodic-helmet-78607

    10/21/2021, 6:08 AM
    I have a question regarding rollback behavior: does rollback not delete existing graph relationships? 1. I list all run ids in the console and roll back all runs manually 2. I ingest a dataset schema 3. Past lineage somehow appears
    g
    • 2
    • 1
  • m

    modern-nail-74015

    10/21/2021, 8:25 AM
    I want to revoke a token, but this error occurred
    b
    • 2
    • 9
  • n

    nice-planet-17111

    10/21/2021, 8:26 AM
    Hello 🙂 I'm trying to use CloudSQL as DataHub storage. I was trying to connect via the CloudSQL auth proxy (I edited the datahub-gms and datahub-upgrade-job deployments), and when I run helm update, the upgrade job fails and I get the error
    Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade'
    😞 Is there something I can do? (The error log and Helm chart are in the thread because they're too long.)
    ✅ 1
    r
    • 2
    • 8
  • r

    red-pizza-28006

    10/21/2021, 6:37 PM
    Can anyone share an example where they have used their own Kafka? I am seeing if I can use our own Kafka but keep the k8s ZooKeeper and schema registry.
    e
    • 2
    • 2
  • d

    dry-policeman-74195

    10/22/2021, 4:23 PM
    Working on the quickstart... I already have Docker Desktop installed on my Windows computer and it is running. I've run the steps to install DataHub in my Python environment. When I try to run "datahub docker quickstart" I get the following error. Any thoughts on what is wrong here? Thanks!
    m
    b
    • 3
    • 17
  • c

    curved-jordan-15657

    10/24/2021, 8:49 PM
    Hello team, today we had an issue while ingesting Redshift. Until today we had been using the “host” field in the recipe.yml file as host: <endpoint>/<db-name>. But now we get an error like:
    File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
        args.func(args, dag=self.dag)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
        return func(*args, **kwargs)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 92, in wrapper
        return f(*args, **kwargs)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
        _run_task_by_selected_method(args, dag, ti)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
        _run_raw_task(args, ti)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
        ti._run_raw_task(
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
        return func(*args, session=session, **kwargs)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1324, in _run_raw_task
        self._execute_task_with_callbacks(context)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1443, in _execute_task_with_callbacks
        result = self._execute_task(context, self.task)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1494, in _execute_task
        result = execute_callable(context=context)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/operators/python.py", line 151, in execute
        return_value = self.execute_callable()
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/airflow/operators/python.py", line 162, in execute_callable
        return self.python_callable(*self.op_args, **self.op_kwargs)
      File "/home/airflow/airflow/dags/datahub_ingestion_dag.py", line 39, in datahub_redshift
        pipeline.run()
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 125, in run
        for wu in self.source.get_workunits():
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/source/sql/redshift.py", line 253, in get_workunits
        lineage_mcp = self.get_lineage_mcp(wu.metadata.proposedSnapshot.urn)
      File "/home/airflow/airflow_venv/lib/python3.8/site-packages/datahub/ingestion/source/sql/redshift.py", line 272, in get_lineage_mcp
        tablename = dataset_params[2]
    IndexError: list index out of range
    After digging into the problem, I realized that in the redshift.py file the “dataset_params” array gets its 3 parts from
    dataset_params = dataset_key.name.split(".")
    and dataset_key doesn’t pick up our db name from the recipe file since we wrote host: <endpoint>/<db-name>. If I give the database name like database: <db-name> in the recipe.yml file, it resolves this problem, but then I get another error like:
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 58, in run
      pipeline.run()
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 125, in run
      for wu in self.source.get_workunits():
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/redshift.py", line 248, in get_workunits
      for wu in super().get_workunits():
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 364, in get_workunits
      yield from self.loop_tables(inspector, schema, sql_config)
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 435, in loop_tables
      columns = inspector.get_columns(table, schema)
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 390, in get_columns
      col_defs = self.dialect.get_columns(
    File "<string>", line 2, in get_columns
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 52, in cache
      ret = fn(self, con, *args, **kw)
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy_redshift/dialect.py", line 454, in get_columns
      cols = self._get_redshift_columns(connection, table_name, schema, **kw)
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy_redshift/dialect.py", line 705, in _get_redshift_columns
      return all_columns[key]
    
    KeyError: '<our-schema-name>.<our-table-name>'
    Everything was perfect until now. The DataHub version is 0.8.16. I think I need a solution for the first problem.
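    For the first problem, the split in redshift.py expects a three-part db.schema.table name, which is why packing the database into the host breaks it. For reference, a sketch of the recipe shape with host_port and database as separate fields, expressed through Pipeline.create (all connection values below are placeholders); the later KeyError looks like a separate issue:
    # Sketch: redshift source config with the database passed separately
    # instead of packed into the host; values are placeholders.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "redshift",
                "config": {
                    "host_port": "my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439",
                    "database": "my_db",
                    "username": "datahub_reader",
                    "password": "...",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()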
    m
    • 2
    • 5
  • f

    fresh-carpet-31048

    10/25/2021, 2:05 PM
    Noob Java question: is the difference between DataFetcher<CompletableFuture<String>> and DataFetcher<CompletableFuture<Boolean>> the return type (the first returns a string and the second returns a boolean)?
    f
    • 2
    • 1
  • h

    handsome-football-66174

    10/25/2021, 8:50 PM
    General - Getting this error:
    Frontend -
    play.api.UnexpectedException: Unexpected exception[CompletionException: akka.http.scaladsl.model.EntityStreamException: Entity stream truncation]
        at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:247)
        at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:176)
        at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:363)
        at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:361)
        at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
        at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:92)
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
        at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
        at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    GMS -
        Suppressed: org.elasticsearch.common.ParsingException: Failed to parse object: expecting field with name [error] but found [Message]
            at org.elasticsearch.common.xcontent.XContentParserUtils.ensureFieldName(XContentParserUtils.java:50)
            at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:592)
            at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179)
            at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
            at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
            ... 19 common frames omitted
    Caused by: org.elasticsearch.client.ResponseException: method [POST], host [https://es.us-east-1.es.amazonaws.com:443], URI [/_bulk?timeout=1m], status line [HTTP/1.1 403 Forbidden]
    {"Message":"User: anonymous is not authorized to perform: es:ESHttpPost"}
        at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
        at org.elasticsearch.client.RestClient.access$1700(RestClient.java:100)
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:350)
        ... 16 common frames omitted
    b
    e
    • 3
    • 6
  • f

    future-hamburger-62563

    10/26/2021, 1:19 AM
    Finding some challenges trying to run docker/dev.sh; when I try to build I get this error:
    org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.elastic.co/elasticsearch/elasticsearch:7.9.3, imagePullPolicy=DefaultPullPolicy())
    	at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1286)
    	at org.testcontainers.containers.GenericContainer.logger(GenericContainer.java:615)
    * What went wrong: Execution failed for task 'metadata iotest'. I tried running docker container prune, and the command cleared some disk space, but the build is still failing. Any ideas?
    p
    • 2
    • 9
  • a

    ancient-hair-10877

    10/26/2021, 10:59 AM
    Hi team, I have installed DataHub, but the frontend only shows Datasets, Dashboards, Charts, and Pipelines; ML Models, Glossary Terms, and Feature Tables are hidden. Using the URIs /browse/pipelines, /browse/mlModels, ... still works.
    p
    b
    • 3
    • 3