# troubleshoot
  • c

    calm-dinner-63735

    05/28/2022, 8:34 PM
Is there any way I can ingest lineage information from a Glue job?
    l
    c
    • 3
    • 52
  • b

    bumpy-activity-74405

    05/30/2022, 8:19 AM
Hello, I am trying out BigQuery ingestion - it seems to work for datasets but not for tables. It fails with a permissions error:
    Copy code
    the user does not have 'bigquery.readsessions.create' permission for ...
    But this permission is not in the permission list in the docs. Is this just a case of documentation being out of date or am I doing something wrong?
    d
    • 2
    • 1
  • g

    great-cpu-72376

    05/30/2022, 9:23 AM
Hi, I have two installations of DataHub 0.8.35. On the first one I can copy the URN of an entity I search for in the UI; on the second one I cannot copy the URN from the UI. I always use the same browser, Chrome 102.0.5005.62. I see this error in my browser:
    Copy code
    react-dom.production.min.js:101 Uncaught TypeError: Cannot read properties of undefined (reading 'writeText')
        at onClick (EntityHeader.tsx:234:57)
        at J (button.js:233:57)
        at Object.qe (react-dom.production.min.js:52:317)
        at Ye (react-dom.production.min.js:52:471)
        at react-dom.production.min.js:53:35
        at Tr (react-dom.production.min.js:100:68)
        at xr (react-dom.production.min.js:101:380)
        at react-dom.production.min.js:113:65
        at je (react-dom.production.min.js:292:189)
        at react-dom.production.min.js:50:57
What should I do to solve this?
    d
    s
    • 3
    • 6
  • w

    wonderful-quill-11255

    05/30/2022, 11:20 AM
    Hello. We are trying to upgrade from 0.8.21 to 0.8.26 (yes we are behind) and it seemed to work without issues in non-production. When we upgraded production and browsed to the start page, we saw a brief error message saying
    The field at path '/listRecommendations/modules[2]/content[4]/entity/tool' was declared as a non null type, but the code involved in retrieving the data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'String' within parent type 'Dashboard' (code undefined)
The error message was quickly hidden again, but the recommendations on the start page were all gone. A seemingly correlated exception stack trace in the GMS logs says:
    Copy code
    java.lang.NullPointerException: null
    	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$69(GmsGraphQLEngine.java:831)
    	at com.linkedin.datahub.graphql.resolvers.load.LoadableTypeResolver.get(LoadableTypeResolver.java:34)
    	at com.linkedin.datahub.graphql.resolvers.load.LoadableTypeResolver.get(LoadableTypeResolver.java:22)
    	at com.linkedin.datahub.graphql.resolvers.AuthenticatedResolver.get(AuthenticatedResolver.java:25)
    	at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)
    	at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)
    	at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)
    	at graphql.execution.ExecutionStrategy.completeValueForObject(ExecutionStrategy.java:646)
    	at graphql.execution.ExecutionStrategy.completeValue(ExecutionStrategy.java:438)
    	at graphql.execution.ExecutionStrategy.completeField(ExecutionStrategy.java:390)
    	at graphql.execution.ExecutionStrategy.lambda$resolveFieldWithInfo$1(ExecutionStrategy.java:205)
    	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    	at org.dataloader.DataLoaderHelper.lambda$dispatchQueueBatch$2(DataLoaderHelper.java:230)
    	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)
    	at java.lang.Thread.run(Thread.java:748)
The code around that line looks like this (still on the 0.8.26 tag):
Copy code
private void configureDashboardResolvers(final RuntimeWiring.Builder builder) {
    builder.type("Dashboard", typeWiring -> typeWiring
        .dataFetcher("relationships", new AuthenticatedResolver<>(
            new EntityRelationshipsResultResolver(graphClient)
        ))
        .dataFetcher("platform", new AuthenticatedResolver<>(
            new LoadableTypeResolver<>(dataPlatformType,
                (env) -> ((Dashboard) env.getSource()).getPlatform().getUrn()))   <--- Line 831
        )
I noticed in the release notes for 0.8.25 that one can migrate entities from platform to platform instances, but I interpret that as an optional feature that is not a required part of the upgrade (we also didn't see this behaviour in the non-production upgrade). We rolled back the upgrade for now and are trying to reproduce the behaviour locally, but without success yet. Perhaps someone here has some advice?
    l
    • 2
    • 2
  • g

    great-cpu-72376

    05/31/2022, 10:40 AM
Hi, I am trying to use the DatahubEmitterOperator in Airflow to ingest lineage information. I defined my operator this way:
    Copy code
    emitter_task = DatahubEmitterOperator(
            task_id="emitter_task",
            datahub_conn_id="datahub_rest_default",
            mces=[
                builder.make_lineage_mce(
                    upstream_urns=[builder.make_dataset_urn(platform="file", name=filename) for filename in "{{ ti.xcom_pull(task_ids='test_operator_task', key='file_list') }}"],
                    downstream_urn=builder.make_dataset_urn(platform="file", name="/nfs/data/archival/archive_test.tar.gz")
                )
            ]
        )
The problem is that the template
{{ ti.xcom_pull(task_ids='test_operator_task', key='file_list') }}
is not evaluated. How can I get the XCom pushed by another task? (See the sketch after this message.)
    e
    d
    e
    • 4
    • 3
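One possible workaround, sketched below, is to build and emit the lineage MCE at run time inside a PythonOperator, where xcom_pull returns the actual list rather than an unrendered template string. The GMS address is a placeholder, and the helpers are the same mce_builder functions used in the snippet above; treat this as a sketch to adapt, not the documented fix.
Copy code
# Sketch only: emit lineage at execution time so xcom_pull returns a real list.
# The GMS URL below is a placeholder, not a value from this thread.
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter
from airflow.operators.python import PythonOperator

def emit_file_lineage(ti, **_):
    # At run time the XCom value is the actual list pushed by the upstream task.
    file_list = ti.xcom_pull(task_ids="test_operator_task", key="file_list")
    mce = builder.make_lineage_mce(
        upstream_urns=[builder.make_dataset_urn(platform="file", name=f) for f in file_list],
        downstream_urn=builder.make_dataset_urn(
            platform="file", name="/nfs/data/archival/archive_test.tar.gz"
        ),
    )
    DatahubRestEmitter("http://datahub-gms:8080").emit_mce(mce)

# Declared inside your DAG definition, in place of the DatahubEmitterOperator.
emitter_task = PythonOperator(
    task_id="emitter_task",
    python_callable=emit_file_lineage,
)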
  • m

    most-solstice-19338

    05/31/2022, 11:55 AM
Hi, I am trying to recover from a database backup (following this: https://datahubproject.io/docs/how/backup-datahub/). When I try to run the following
    ./docker/datahub-upgrade/datahub-upgrade.sh -u RestoreIndices
    I get the following error: (see in thread).
    l
    m
    • 3
    • 5
  • c

    chilly-elephant-51826

    05/31/2022, 5:27 PM
@here The GMS service is not able to connect to the Elasticsearch instance if
xpack.security.enabled=true
is set; it throws the error below. It seems the password is not being propagated properly:
    Copy code
    Caused by: 
    
org.elasticsearch.client.ResponseException: method [HEAD], host [http://elasticsearch:9200], URI [/graph_service_v1?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false], status line [HTTP/1.1 401 Unauthorized]
I have raised a bug (link); help is really appreciated.
    e
    l
    • 3
    • 8
  • n

    numerous-eve-42142

    05/31/2022, 9:06 PM
Hi! Does anyone know how to limit the number of simultaneous queries running on the DB source when profiling tables? I'm using a simple bash operator command against Redshift, and it is running more than 30 queries at the same time. (See the sketch after this message.)
    d
    • 2
    • 3
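For reference, recent DataHub profiler versions expose a max_workers option under the profiling section of the ingestion recipe that caps how many profiling queries run in parallel; whether it is available depends on the connector and CLI version, so the programmatic recipe below is only a sketch with placeholder connection details.
Copy code
# Sketch: programmatic Redshift ingestion recipe that caps profiler concurrency.
# Host, credentials and the GMS address are placeholders; 'max_workers' assumes
# a reasonably recent acryl-datahub release.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "redshift",
            "config": {
                "host_port": "my-cluster.example.com:5439",
                "database": "dev",
                "username": "datahub",
                "password": "...",
                "profiling": {
                    "enabled": True,
                    "max_workers": 5,  # limit simultaneous profiling queries
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()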
  • w

    worried-painting-70907

    05/31/2022, 10:28 PM
So, I was testing the Helm installation for DataHub and I got a weird error. Not sure what it could be, as I didn't pass it a values.yml and just used the defaults. It also looks like my deployment only has some of the components - not all of them.
    e
    • 2
    • 7
  • b

    better-spoon-77762

    05/31/2022, 11:31 PM
Hello all, I have one local (docker-compose) deployment of DataHub which uses a regular Elasticsearch container; there the DataHub usage events work and I see the data streams are set up. I also have an AWS-deployed DataHub which makes use of AWS OpenSearch (the ES equivalent); in this environment I am unable to see any charts related to DataHub usage getting populated. Is it because AWS ES doesn't support data streams?
    e
    • 2
    • 7
  • w

    wonderful-quill-11255

    06/01/2022, 6:30 AM
Hi. A question about deleting entries from the catalog. Background: we are still on a version that doesn't have stateful ingestion (i.e. doesn't delete entries). As a cleanup mechanism, we wrote a script that can "delete" entries from the catalog by deleting them both from the metadata_aspect_v2 table and the datasetindex_v2/dashboardindex_v2 Elasticsearch indices. But this seems pretty fragile. I've just discovered, for example, that we also need to clean the entries from the graph_service_v1 index to avoid lineage to orphan entities (which, when we upgrade from 0.8.21 -> 0.8.26, throws errors in the UI). My question: is there support in a later version to delete all entries from a certain platform?
    e
    l
    • 3
    • 4
  • b

    bumpy-activity-74405

    06/01/2022, 6:35 AM
Hi, I am having difficulties profiling BigQuery tables. I've got lots of columns that seem to have unsupported data types (see thread for errors from an ingestion run). Are those just not supported, or am I doing something wrong? And if they aren't supported, is there an easy way to exclude such columns? As far as I can tell, this prevents stats for all the other columns from being ingested.
    d
    • 2
    • 8
  • s

    square-solstice-69079

    06/01/2022, 10:17 AM
Trying to set up Airflow in AWS (using MWAA) to push metadata about DAGs. Step 3 in https://datahubproject.io/docs/lineage/airflow/. Where do we run this airflow connections add command? (See the sketch after this message.)
    d
    • 2
    • 1
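On MWAA there is no shell to run airflow connections add, so the connection is usually created either through the Airflow UI (Admin > Connections) or programmatically from a one-off task; the sketch below shows the programmatic variant, with a placeholder GMS host and the datahub_rest connection type from the linked docs.
Copy code
# Sketch: register the DataHub REST connection from inside an Airflow task,
# e.g. a one-off PythonOperator, since MWAA offers no CLI access.
# The GMS host below is a placeholder.
from airflow import settings
from airflow.models import Connection

def create_datahub_connection():
    session = settings.Session()
    exists = (
        session.query(Connection)
        .filter(Connection.conn_id == "datahub_rest_default")
        .first()
    )
    if not exists:
        session.add(
            Connection(
                conn_id="datahub_rest_default",
                conn_type="datahub_rest",
                host="http://datahub-gms:8080",
            )
        )
        session.commit()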
  • w

    worried-painting-70907

    06/01/2022, 4:01 PM
I've been working on setting up Elasticsearch in DataHub, and I see the runs succeeding; however, I don't see things flowing into the main dashboard.
    e
    h
    • 3
    • 17
  • w

    wonderful-smartphone-35332

    06/01/2022, 5:49 PM
Hello! I tried adding a new ingestion source today (as well as some secrets) - I get a successful response in the front end - but then I don't see those changes reflected in the current list.
    e
    b
    • 3
    • 12
  • h

    handsome-football-66174

    06/01/2022, 8:22 PM
    Hi Everyone - Getting this when upgrading to 0.8.32 (everything else works fine)
    Copy code
    20:19:44.193 [Thread-2896] ERROR c.datahub.graphql.GraphQLController:93 - Errors while executing graphQL query: "query getAnalyticsCharts {\n  getAnalyticsCharts {\n    groupId\n    title\n    charts {\n      ...analyticsChart\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment analyticsChart on AnalyticsChart {\n  ... on TimeSeriesChart {\n    title\n    lines {\n      name\n      data {\n        x\n        y\n        __typename\n      }\n      __typename\n    }\n    dateRange {\n      start\n      end\n      __typename\n    }\n    interval\n    __typename\n  }\n  ... on BarChart {\n    title\n    bars {\n      name\n      segments {\n        label\n        value\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n  ... on TableChart {\n    title\n    columns\n    rows {\n      values\n      cells {\n        value\n        linkParams {\n          searchParams {\n            types\n            query\n            filters {\n              field\n              value\n              __typename\n            }\n            __typename\n          }\n          entityProfileParams {\n            urn\n            type\n            __typename\n          }\n          __typename\n        }\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n  __typename\n}\n", result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[getAnalyticsCharts], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data=null}, errors: [DataHubGraphQLError{path=[getAnalyticsCharts], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
    e
    • 2
    • 31
  • m

    modern-belgium-81337

    06/01/2022, 9:06 PM
    Hi team. What is the quickest way to tell that
    datahub init
worked? I forwarded my GMS pod to a localhost address and skipped the token part, and that was it - there was no confirmation. I was wondering if there's a way to tell it's a successful connection without trying a command like
ingest
? (See the sketch after this message.)
    e
    • 2
    • 12
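One lightweight way to check the connection without running an ingestion is to hit GMS's /config endpoint on the forwarded port (assuming your deployment exposes it, as the quickstart does); a minimal sketch, assuming the pod is forwarded to localhost:8080:
Copy code
# Sketch: confirm the forwarded GMS endpoint answers before trying `datahub ingest`.
# Assumes GMS was port-forwarded to localhost:8080.
import requests

resp = requests.get("http://localhost:8080/config", timeout=10)
resp.raise_for_status()
print(resp.json())  # a small JSON config document means GMS is reachable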
  • s

    salmon-rose-54694

    06/02/2022, 7:08 AM
I downloaded the latest code (commit 640affd3cef1fbdf19ad1e7fd34685, PR) and compiled GMS but got the error below - any thoughts on this error?
    e
    • 2
    • 3
  • s

    salmon-rose-54694

    06/02/2022, 10:15 AM
I upgraded the code to the latest and found that the glossary terms are missing; I reimported them again with this, but there is still nothing there.
    b
    b
    • 3
    • 4
  • m

    miniature-journalist-76345

    06/02/2022, 3:17 PM
Hi, team! Has anybody faced issues with duplicate values of tags and owners in the UI? Screenshot is in the thread.
    b
    • 2
    • 15
  • g

    gentle-diamond-98883

    06/02/2022, 7:21 PM
Hi team. I'm using the Docker quickstart image to run DataHub v0.8.33 and I'd like to configure and test the impact analysis feature for data lineage. I noticed that I'm supposed to set a
supportsImpactAnalysis
flag mentioned here, but I was wondering where/how do I set this flag to true? Thanks!
    e
    • 2
    • 9
  • c

    chilly-elephant-51826

    06/03/2022, 12:54 PM
#troubleshoot I was trying to ingest profiling data from Athena into DataHub, but I was not able to select the Stats tab on the dataset page. I tried to see if the demo project contains stats, but didn't find anything there either. Can anyone share how to enable profiling, queries, and sample data? Also, is it possible to send across custom data that can be seen somewhere?
    h
    b
    • 3
    • 8
  • c

    calm-dinner-63735

    06/03/2022, 12:58 PM
    Screen Shot 2022-06-03 at 2.58.25 PM.png
    b
    • 2
    • 1
  • m

    modern-laptop-12942

    06/03/2022, 4:14 PM
Hi team! I have two issues. 1. The column stats are missing, even though I enabled profiling (Snowflake). 2. For lineage ingestion, I have over 600K rows in SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY; each time, the query reads the access history for all tables and it takes a long time. Is there any solution to speed up this process?
    d
    l
    • 3
    • 5
  • c

    clean-lamp-36043

    06/03/2022, 4:25 PM
Hello Team, quick question. I have deployed DataHub in K8s. I am trying to ingest metadata from dbt, but it does not show as a source in the UI. Is this expected?
    b
    • 2
    • 4
  • r

    ripe-alarm-85320

    06/03/2022, 6:06 PM
    Any idea why the update indices job would fail like this?
    e
    • 2
    • 4
  • s

    some-microphone-33485

    06/03/2022, 7:53 PM
Hello Team, we have the below scenario with DataHub metadata deletes: we can delete any metadata in DataHub with the API endpoint, but it does not ask for any credentials or apply any policy. • Is it true that anyone can delete metadata? • How can we enforce that only selected users can delete metadata? (See the sketch after this message.)
    r
    • 2
    • 2
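Context for the sketch below: out of the box the GMS REST API does not authenticate callers, which is why deletes work without credentials; enabling the metadata service authentication setting (METADATA_SERVICE_AUTH_ENABLED in recent releases) makes REST calls such as the rest.li delete action require a personal access token, and DataHub policies then govern who may hold and use such tokens. The endpoint shape, URN, and token below are illustrative assumptions, not values from this thread.
Copy code
# Sketch: a rest.li delete call that must carry a personal access token once
# metadata service authentication is enabled. URN and token are placeholders.
import requests

resp = requests.post(
    "http://datahub-gms:8080/entities?action=delete",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)"},
)
print(resp.status_code)  # expect 401 without a valid token when auth is enabled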
  • b

    boundless-student-48844

    06/04/2022, 6:29 AM
Hi team, we are running a standalone MAE Consumer. However, we realise that to produce entity change events to the platform event topic, we have to set
MCL_CONSUMER_ENABLED
to
true
in GMS instead of setting it in the MAE Consumer (which already has
MAE_CONSUMER_ENABLED="true"
). And the entity change events are produced by GMS instead of the MAE Consumer. This doesn't look aligned with how I understand the code. From the code,
MetadataChangeLogProcessor
is enabled when either MAE_CONSUMER_ENABLED or MCL_CONSUMER_ENABLED is true - thus, this ChangeLog processor is already enabled in the MAE Consumer. And
EntityChangeEventGeneratorHook
is one of the hooks invoked by
MetadataChangeLogProcessor
. My understanding is that the entity change events should be produced by the standalone MAE Consumer (instead of GMS) without the additional config of
MCL_CONSUMER_ENABLED
on GMS. Am I missing something here? 🙇 (running with DataHub v0.8.35, deployed with Helm chart 0.2.72)
    f
    • 2
    • 2
  • c

    chilly-elephant-51826

    06/04/2022, 12:00 PM
#troubleshoot I am facing a connectivity issue while using a separate instance of Kafka with DataHub. It seems that during the initial deployment,
datahub-actions
runs some ingestion over the Kafka stream, but since my Kafka is protected and does not use a simple SSL connection but
sasl
instead, the correct parameters required to connect are not being passed. This is the Kafka config that I am using:
    Copy code
    security.protocol: SASL_SSL
        sasl.mechanism: SCRAM-SHA-512
        client.sasl.mechanism: SCRAM-SHA-512
        kafkastore.security.protocol: SSL
        ssl.endpoint.identification.algorithm: https
        ssl.keystore.type: JKS
        ssl.protocol: TLSv1.2
        ssl.truststore.type: JKS
Even though these are passed correctly to the container
env
variables, they are not populated in the config file that gets executed. Here is the config that was generated (found in the container logs):
    Copy code
    {'source': 
        {   'type': 'datahub-stream', 
            'config': {
                'auto_offset_reset': 'latest', 
                'connection': {
                    'bootstrap': 'XXXXXXXXXXXXXX', 
                    'schema_registry_url': 'XXXXXXXXXXXXX', 
                    'consumer_config': {'security.protocol': 'SASL_SSL'}
                    }, 
                'actions': [
                    {   'type': 'executor', 
                        'config': {
                            'local_executor_enabled': True, 
                            'remote_executor_enabled': 'False', 
                            'remote_executor_type': 'acryl.executor.sqs.producer.sqs_producer.SqsRemoteExecutor', 
                            'remote_executor_config': {
                                'id': 'remote', 
                                'aws_access_key_id': '""', 
                                'aws_secret_access_key': '""', 
                                'aws_session_token': '""', 
                                'aws_command_queue_url': '""', 
                                'aws_region': '""'
                            }
                        }
                    }
                ], 
                'topic_routes': {
                    'mae': 'MetadataAuditEvent_v4', 
                    'mcl': 'MetadataChangeLog_Versioned_v1'
                }
            }
        }, 
        'sink': {'type': 'console'}, 
    'datahub_api': {'server': 'http://datahub-datahub-gms:8080', 'extra_headers': {'Authorization': 'Basic __datahub_system:NOTPASSING'}}
    }
As the above config shows, the other required configuration values are not getting passed. I have raised a bug; any help is appreciated. (See the sketch after this message.)
    l
    b
    • 3
    • 4
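For reference, the consumer_config block in that generated recipe is handed to the underlying Kafka client, so the SASL credentials normally need to appear there as librdkafka-style keys; the sketch below shows what a complete block might look like, with placeholder credentials.
Copy code
# Sketch: librdkafka-style settings a SASL_SSL / SCRAM-SHA-512 protected cluster
# would typically require in consumer_config. Username and password are placeholders.
consumer_config = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.username": "<kafka-user>",
    "sasl.password": "<kafka-password>",
    "ssl.endpoint.identification.algorithm": "https",
}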
  • s

    swift-breakfast-25077

    06/05/2022, 5:12 PM
HELP! Hi all, trying to use Great Expectations for data validation. The checkpoint runs, but the validations are not getting displayed in DataHub. I added this in the checkpoint configuration:
    Copy code
    - name: datahub_action
        action:
          module_name: datahub.integrations.great_expectations.action
          class_name: DataHubValidationAction
      server_url: http://localhost:8080 # datahub server url
Getting this message when the checkpoint runs:
    l
    h
    r
    • 4
    • 32