# troubleshoot
  • red-pizza-28006 (10/26/2021, 11:51 AM)
Is it possible to ingest data from Postgres statistics (https://www.postgresql.org/docs/12/catalog-pg-statistic.html) instead of turning on profiling?
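For context, Postgres exposes the planner statistics behind pg_statistic through the readable `pg_stats` view, so the figures profiling would compute can be read directly. A minimal sketch, assuming the `psycopg2` driver and a placeholder connection string:
```python
import psycopg2  # assumption: psycopg2 is installed

# pg_stats is the human-readable view over the pg_statistic catalog.
conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT schemaname, tablename, attname, null_frac, n_distinct
        FROM pg_stats
        WHERE schemaname = %s
        """,
        ("public",),
    )
    for schema, table, column, null_frac, n_distinct in cur.fetchall():
        print(f"{schema}.{table}.{column}: null_frac={null_frac}, n_distinct={n_distinct}")
conn.close()
```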
  • loud-camera-71352 (10/26/2021, 12:21 PM)
Hi! I’m trying to use a transformer so that the Glue ingestion appears as S3. I used `set_dataset_browse_path`, but no dataset is visible.
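For reference, transformers are wired into the recipe alongside the source and sink. A sketch of the documented `set_dataset_browse_path` transformer in a programmatic pipeline, where the AWS region, path template, and sink address are all placeholders:
```python
from datahub.ingestion.run.pipeline import Pipeline

# Glue source plus the set_dataset_browse_path transformer; ENV and
# DATASET_PARTS are template tokens expanded per dataset.
pipeline = Pipeline.create(
    {
        "source": {"type": "glue", "config": {"aws_region": "us-east-1"}},
        "transformers": [
            {
                "type": "set_dataset_browse_path",
                "config": {"path_templates": ["/ENV/s3/DATASET_PARTS"]},
            }
        ],
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
pipeline.raise_from_status()
```
Pointing the sink at a file first can help confirm whether the transformer is rewriting the browse paths before anything reaches GMS.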
  • red-pizza-28006 (10/26/2021, 6:48 PM)
Suddenly I started to get these exceptions when emitting to the GMS sink. Any ideas?
  • bland-teacher-17190 (10/27/2021, 1:18 PM)
When building the project locally, I'm getting a test-failed error. Just wondering how I can fix this. Is it a Node.js issue? Which version of Node.js is supported for building the project?
  • witty-butcher-82399 (10/27/2021, 2:42 PM)
Hi! I have just upgraded from `0.8.15` to `0.8.16` and GMS is not able to start. I found a couple of exceptions suggesting there is an issue with the ES indexes. Any idea how we can overcome this?
  • victorious-dream-46349 (10/27/2021, 4:37 PM)
How do I delete lineage between datasets? I made a POST call with this aspect:
```
{
  "com.linkedin.dataset.UpstreamLineage": {
    "upstreams": []
  }
}
```
and it returned 200 OK. When I GET https://BASE_URL/entities/urn-of-dataset, it returns an empty `com.linkedin.dataset.UpstreamLineage` aspect, as expected. But in the UI I am still seeing the lineage. Question: how do I delete the lineage permanently in DataHub (preferably using the REST API)?
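For reference, the call described above can be sketched with `requests` against the ingest endpoint; the GMS address and dataset URN are placeholders:
```python
import requests

GMS = "http://localhost:8080"  # placeholder GMS address
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)"  # placeholder

# Write an UpstreamLineage aspect with an empty upstreams list, wrapped in a
# DatasetSnapshot as /entities?action=ingest expects.
body = {
    "entity": {
        "value": {
            "com.linkedin.metadata.snapshot.DatasetSnapshot": {
                "urn": dataset_urn,
                "aspects": [
                    {"com.linkedin.dataset.UpstreamLineage": {"upstreams": []}}
                ],
            }
        }
    }
}
requests.post(f"{GMS}/entities?action=ingest", json=body).raise_for_status()
```
Note that this writes a new, empty version of the aspect; whether the UI's graph index picks that change up is exactly the open question in the thread.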
  • curved-sandwich-81699 (10/27/2021, 8:23 PM)
There is a bug in the Snowflake ingestion with acryl-datahub 0.8.16.1. With a config like:
```
    source:
      type: "snowflake"
      config:
        username: ...
        password: ...
        host_port: ...
        warehouse: ...
        role: "accountadmin"
        database_pattern:
          ignoreCase: true
          allow:
            - "db1"
            - "db2"
        table_pattern:
          ignoreCase: true
          allow:
            - "db1.schema1.table1"
            - "db2.schema2.table2"
        include_tables: true
        include_table_lineage: false
```
Only `db1.schema1.table1` gets ingested, and the log shows `db2.schema1.table1` being filtered out, even though there is no `db2.schema1` schema; `db2.schema2.table2` does not show up in the log at all.
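For what it's worth, the allow entries are treated as regular expressions against the fully qualified table name, so the expected filtering can be reproduced in isolation. A standalone illustration (not DataHub's actual code path):
```python
import re

# Each allow entry is a regex matched (case-insensitively here, mirroring
# ignoreCase: true) against the fully qualified table name.
allow = ["db1.schema1.table1", "db2.schema2.table2"]
candidates = ["db1.schema1.table1", "db2.schema1.table1", "db2.schema2.table2"]

for name in candidates:
    allowed = any(re.match(p, name, re.IGNORECASE) for p in allow)
    print(f"{name}: {'allowed' if allowed else 'filtered'}")
# Expected: db2.schema1.table1 filtered and db2.schema2.table2 allowed,
# so the behaviour reported above does look like a bug.
```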
  • adamant-van-40260 (10/28/2021, 3:55 AM)
I am getting this error with a complex dataset:
```
    10:06:22.490 [ForkJoinPool.commonPool-worker-15] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getDataset($urn: String!) {\n  dataset(urn: $urn
    ....
    result: {errors=[{message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 0, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 1, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 2, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 3, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 4, entity], extensions={code=500, type=SERVER_ERROR,
```
  • handsome-football-66174 (10/28/2021, 7:24 PM)
Hi, getting this error when a user (who has been assigned the necessary Metadata policies) tries to add tags:
```
    java.util.concurrent.CompletionException: com.linkedin.datahub.graphql.exception.AuthorizationException: Unauthorized to perform this action. Please contact your DataHub administrator.
    	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: com.linkedin.datahub.graphql.exception.AuthorizationException: Unauthorized to perform this action. Please contact your DataHub administrator.
    	at com.linkedin.datahub.graphql.types.tag.TagType.update(TagType.java:140)
    	at com.linkedin.datahub.graphql.types.tag.TagType.update(TagType.java:46)
    	at com.linkedin.datahub.graphql.resolvers.mutate.MutableTypeResolver.lambda$get$0(MutableTypeResolver.java:35)
    	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    	... 1 common frames omitted
    19:20:52.101 [Thread-11216] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "mutation updateTag($input: TagUpdate!) {\n updateTag(input: $input) {\n  urn\n  name\n  description\n  ownership {\n   ...ownershipFields\n   __typename\n  }\n  __typename\n }\n}\n\nfragment ownershipFields on Ownership {\n owners {\n  owner {\n   ... on CorpUser {\n    urn\n    type\n    username\n    info {\n     active\n     displayName\n     title\n     email\n     firstName\n     lastName\n     fullName\n     __typename\n    }\n    editableInfo {\n     pictureLink\n     __typename\n    }\n    __typename\n   }\n   ... on CorpGroup {\n    urn\n    type\n    name\n    info {\n     email\n     admins {\n      urn\n      username\n      info {\n       active\n       displayName\n       title\n       email\n       firstName\n       lastName\n       fullName\n       __typename\n      }\n      editableInfo {\n       pictureLink\n       teams\n       skills\n       __typename\n      }\n      __typename\n     }\n     members {\n      urn\n      username\n      info {\n       active\n       displayName\n       title\n       email\n       firstName\n       lastName\n       fullName\n       __typename\n      }\n      editableInfo {\n       pictureLink\n       teams\n       skills\n       __typename\n      }\n      __typename\n     }\n     groups\n     __typename\n    }\n    __typename\n   }\n   __typename\n  }\n  type\n  __typename\n }\n lastModified {\n  time\n  __typename\n }\n __typename\n}\n", result: {errors=[{message=Unauthorized to perform this action. Please contact your DataHub administrator., locations=[{line=2, column=3}], path=[updateTag], extensions={code=403, classification=DataFetchingException}}], data={updateTag=null}}, errors: [DataHubGraphQLError{path=[updateTag], code=UNAUTHORIZED, locations=[SourceLocation{line=2, column=3}]}]
```
  • nice-country-99675 (10/28/2021, 8:01 PM)
👋 Hello team! I'm not sure if this is the proper channel... I'm trying to test Google authentication following these guidelines: https://datahubproject.io/docs/how/auth/sso/configure-oidc-react-google/. Is there a way to set those env vars when I'm running DataHub from containers?
  • chilly-spring-43918 (10/29/2021, 2:19 PM)
Hi team, I am not sure this is the right channel to ask. Regarding https://datahubproject.io/docs/how/kafka-config#configuring-topic-names: is there a way to configure topic names other than these defaults? Our organization has a naming rule for Kafka topics.
```
    datahub-gms
    
        METADATA_CHANGE_EVENT_NAME: The name of the metadata change event topic.
        METADATA_AUDIT_EVENT_NAME: The name of the metadata audit event topic.
        FAILED_METADATA_CHANGE_EVENT_NAME: The name of the failed metadata change event topic.
    
    datahub-mce-consumer
    
        KAFKA_MCE_TOPIC_NAME: The name of the metadata change event topic.
        KAFKA_FMCE_TOPIC_NAME: The name of the failed metadata change event topic.
    
    datahub-mae-consumer
    
        KAFKA_TOPIC_NAME: The name of the metadata audit event topic.
```
It is causing an error on these topic names. Or maybe there is a way to ignore the error?
```
    Caused by: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [MetadataChangeProposal_v1]
```
  • little-address-54150 (10/29/2021, 2:50 PM)
Hey team, is the Glue ingestion not supporting column descriptions from the tables in Glue metadata? It's written in the documentation, but I cannot seem to ingest this information (v0.8.16).
  • brief-wolf-70822 (10/29/2021, 7:59 PM)
Hey team, I have an ingestion recipe where I specify a `table_pattern.allow`. I notice that when I enable profiling, it seems to ignore this and profiles all of the tables. Do I need to also set `profile_pattern.allow` to the same list? I had assumed it would only profile tables allowed by the table pattern, but maybe I'm wrong there.
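For reference, the profiler has its own pattern block, so mirroring the table pattern looks roughly like the sketch below; the source type, connection details, and pattern list are placeholders:
```python
from datahub.ingestion.run.pipeline import Pipeline

allow_list = ["mydb.myschema.mytable"]  # placeholder pattern list

# Sketch: apply the same allow list to both table_pattern and profile_pattern.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "postgres",  # placeholder source type
            "config": {
                "host_port": "localhost:5432",
                "database": "mydb",
                "table_pattern": {"allow": allow_list},
                "profile_pattern": {"allow": allow_list},
                "profiling": {"enabled": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
pipeline.raise_from_status()
```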
  • future-hamburger-62563 (10/30/2021, 3:27 AM)
Hi all (no rush or urgency here, just a curious developer looking around!). I had some time to move my stuff over to my faster setup, retried `./gradlew build`, and had significantly more success. The steps I followed were:
1. Installed Docker and forked/git-cloned the repo.
2. Ran `./gradlew build` and got an error: no Java. Installed Java 8, because I previously had an issue with Java 11, added my JAVA_PATH to .bashrc, and retried.
3. Got an error for no `venv`, so I ran `sudo apt install python3.8-venv` and tried again.
4. Got an error for no `jq`, so I ran `sudo apt install jq` and tried again.
5. Got the `:metadata-io:test` failure and re-found this thread from my last attempts. I used `./gradlew build -x check`, and I'm happy to say the build was successful.
The questions I have now are:
1. Did I miss a step with Docker? Do I need to run the quickstart so that `:metadata-io:test` will work correctly?
2. I'm interested in baby-stepping into making code changes, but I'm new and inexperienced. Would my next steps be to make the changes, rebuild, and then run `docker/dev.sh` to launch a container and inspect the changes?
3. Is it necessary to always rebuild after changes are made and redeploy to Docker? Or, if I wanted to focus on the React part, could I just run it locally? If so, how might I do this?
4. Finally, would it be helpful at all for me to add slightly more detailed instructions to the developer's setup guide?
Thanks all. Have a nice weekend, try and catch that fall weather! 🍂🍁
  • red-pizza-28006 (11/01/2021, 9:44 AM)
Hello, when I am trying to ingest Azure AD users, I am getting this exception:
```
    [2021-11-01 10:36:26,936] ERROR    {datahub.ingestion.run.pipeline:69} - failed to write record with workunit urn:li:corpGroup:Catch%20up with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Conversion = 'u'\n\tat com.linkedin.metadata.restli.RestliUtil.badRequestException(RestliUtil.java:84)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:35)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)\n\tat com.linkedin.metadata.resources.entity.EntityResource.ingest(EntityResource.java:182)\n\tat sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:172)\n\tat
```
Any ideas? Looking at the exception, it seems we are not able to process spaces in the name? Also, it looks like the recipe only ingested groups, not users, with this error:
```
    ValueError: Unable to find the key mail in Group. Is it wrong?
```
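For reference, a minimal azure-ad source sketch for reproducing the errors above. The credentials are placeholders, and depending on version the source may also require the OAuth fields (redirect, authority, token_url) from its docs:
```python
from datahub.ingestion.run.pipeline import Pipeline

# Minimal azure-ad recipe sketch; all credential values are placeholders.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "azure-ad",
            "config": {
                "client_id": "<client-id>",
                "tenant_id": "<tenant-id>",
                "client_secret": "<client-secret>",
                "ingest_users": True,
                "ingest_groups": True,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
pipeline.raise_from_status()
```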
  • handsome-belgium-11927 (11/01/2021, 11:56 AM)
Did something change with platform ingestion over the last few days? I was ingesting on Friday and everything was OK; now I'm getting URNs instead of names.
  • hallowed-article-64840 (11/02/2021, 8:49 AM)
Hi guys, I have deployed DataHub using the Helm chart and ingested some data sources into it, but there is one issue related to users/groups. First, with the default config (only the datahub user in the user.props file), the users list is empty, even though the datahub user should be in the list. Even when I change the user.props file and add some other users, I can log in with the new users' credentials, but the users list is still empty. DataHub version: v0.8.16
  • ripe-sunset-20897 (11/02/2021, 9:22 AM)
Hi guys! I've deployed my DataHub instance in a Docker container on a GCP Compute Engine VM. The front end works well, but when I send a POST (with the proper payload) to the GraphQL endpoint at http://localhost:8080/api/graphql, it gives me:
```
    "exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/api'\n\tat
```
Can anyone help me with this error? Thanks.
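The 404 says GMS's rest.li server has no `/api` resource; around this version the GraphQL endpoint was served by the datahub-frontend container rather than GMS. A hedged sketch of posting a query there, where the port, path, and lack of auth handling are all assumptions:
```python
import requests

# Assumption: the React frontend (default port 9002) serves GraphQL at
# /api/v2/graphql; a session cookie from logging in is normally required.
FRONTEND = "http://localhost:9002"

query = """
query {
  search(input: {type: DATASET, query: "*", start: 0, count: 1}) {
    total
  }
}
"""

resp = requests.post(f"{FRONTEND}/api/v2/graphql", json={"query": query})
print(resp.status_code, resp.text)
```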
  • red-pizza-28006 (11/02/2021, 12:34 PM)
When ingesting Snowflake data, I see this error reported in the terminal:
```
    [2021-11-02 13:27:38,054] WARNING  {datahub.ingestion.source.sql.snowflake:183} - Extracting lineage from Snowflake failed.Please check your premissions. Continuing...
    Error was (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 13 at position 33
    invalid identifier 'T.OBJECTS_MODIFIED'
    [SQL:
    WITH table_lineage_history AS (
        SELECT
            r.value:"objectName" AS upstream_table_name,
            r.value:"objectDomain" AS upstream_table_domain,
            r.value:"columns" AS upstream_table_columns,
            w.value:"objectName" AS downstream_table_name,
            w.value:"objectDomain" AS downstream_table_domain,
            w.value:"columns" AS downstream_table_columns,
            t.query_start_time AS query_start_time
        FROM
            (SELECT * from snowflake.account_usage.access_history) t,
            lateral flatten(input => t.BASE_OBJECTS_ACCESSED) r,
            lateral flatten(input => t.OBJECTS_MODIFIED) w
        WHERE r.value:"objectId" IS NOT NULL
        AND w.value:"objectId" IS NOT NULL
        AND w.value:"objectName" NOT LIKE '%.GE_TMP_%'
        AND t.query_start_time >= to_timestamp_ltz(1635724800000, 3)
        AND t.query_start_time < to_timestamp_ltz(1635811200000, 3))
    SELECT upstream_table_name, downstream_table_name, upstream_table_columns, downstream_table_columns
    FROM table_lineage_history
    WHERE upstream_table_domain = 'Table' and downstream_table_domain = 'Table'
    QUALIFY ROW_NUMBER() OVER (PARTITION BY downstream_table_name, upstream_table_name ORDER BY query_start_time DESC) = 1        ]
(Background on this error at: http://sqlalche.me/e/13/f405)
```
Based on Snowflake's docs, I don't see an OBJECTS_MODIFIED field in snowflake.account_usage.access_history. Could this be a bug?
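One way to check the hypothesis directly is to list the columns the view actually exposes from Python. A diagnostic sketch, assuming the snowflake-connector-python package and placeholder credentials (whether `DESCRIBE VIEW` is permitted on the shared ACCOUNT_USAGE view depends on your role and edition):
```python
import snowflake.connector  # assumption: snowflake-connector-python installed

# List the columns your ACCESS_HISTORY view actually exposes; if
# OBJECTS_MODIFIED is absent, the generated lineage SQL cannot work.
conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>"  # placeholders
)
cur = conn.cursor()
cur.execute("DESCRIBE VIEW snowflake.account_usage.access_history")
for row in cur.fetchall():
    print(row[0])  # first field of each row is the column name
cur.close()
conn.close()
```
In the meantime, `include_table_lineage: false` (as in the Snowflake recipe earlier on this page) skips the failing query.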
  • red-pizza-28006 (11/02/2021, 3:05 PM)
Sorry for dropping in so many questions. 😄 I tried some of the configs available in the Snowflake ingestion as well, e.g. `profiling.limit`, and I started getting this error:
```
    [SQL: CREATE OR REPLACE TEMPORARY TABLE ge_tmp_eee423e9 AS SELECT *
    FROM src_salesforce."ORDER"
     LIMIT 2000
```
Without this, it seems to work fine. I also tried adding `profile.turn_off_expensive_profiling_metrics`, but for some reason it does not accept the `profile` param. I am on the latest DataHub version, 0.8.16.2.
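For reference, the GE profiling options live under the `profiling` block of the source config; the message above uses `profile.`, which would explain the rejected param. A dict-form fragment with a placeholder limit:
```python
# Recipe fragment in Python dict form; plugs into the source "config" block.
profiling_config = {
    "profiling": {
        "enabled": True,
        "limit": 2000,  # placeholder row limit
        "turn_off_expensive_profiling_metrics": True,
    }
}
```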
  • handsome-football-66174 (11/02/2021, 6:46 PM)
General: getting this error while redeploying the pods on a new EKS cluster:
```
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ebeanAspectDao' defined in com.linkedin.gms.factory.entity.EbeanAspectDaoFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.entity.ebean.EbeanAspectDao]: Factory method 'createInstance' threw exception; nested exception is java.lang.NullPointerException
```
  • icy-scooter-74959 (11/03/2021, 1:04 AM)
Sorry, at a bit of a loss. I can ingest fine to a remote DataHub server with `datahub ingest`. However, when I try to do it in Airflow using the example, I am getting:
```
    "com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/datasets'
```
Can someone point me in the right direction? Is this a mismatched-version problem?
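One way to narrow down an endpoint or version mismatch is to test the connection from the Airflow worker with the Python emitter. A sketch with a placeholder GMS address:
```python
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Assumption: GMS is reachable at this address from the Airflow worker.
emitter = DatahubRestEmitter("http://remote-datahub-gms:8080")  # placeholder
emitter.test_connection()  # raises if the address is not a compatible GMS
print("GMS connection OK")
```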
  • adamant-van-40260 (11/03/2021, 5:07 AM)
Hi team, after upgrading to 0.8.16 I get this problem:
```
    POST /entities?action=searchAcrossEntities - searchAcrossEntities - 200 - 0ms
    04:56:50.672 [ForkJoinPool.commonPool-worker-3] ERROR c.l.datahub.graphql.GmsGraphQLEngine - Failed to load Entities of type: DataJob, keys: [urn:li:dataJob:(urn:li:dataFlow:(airflow,ETL_30M,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),CORE_TRANS), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_30m,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_6hour,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,ETL_30M,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),MERGE_CORE_TRANS), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_30m,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_6hour,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,ETL_30M,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),MERGE_USERPAYMENT_SUBTRANSTYPE_CORE_TRANS), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_30m,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_userpayment_subtranstype_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_6hour,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_userpayment_subtranstype_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,growth_daily_crm_5h_10h_12h_15h_shared,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),CRM_RAW_DATA_CORE_TRANS)] Failed to batch load DataJobs
    04:56:50.673 [ForkJoinPool.commonPool-worker-3] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
    java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
    	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
    	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
    	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
    Caused by: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
    	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$104(GmsGraphQLEngine.java:862)
    	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    	... 5 common frames omitted
    Caused by: java.lang.RuntimeException: Failed to batch load DataJobs
    	at com.linkedin.datahub.graphql.types.datajob.DataJobType.batchLoad(DataJobType.java:106)
    	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$104(GmsGraphQLEngine.java:859)
    	... 6 common frames omitted
    Caused by: java.lang.IllegalStateException: Duplicate key com.linkedin.metadata.entity.ebean.EbeanAspectV2@5a1a7708
```
  • lively-jackal-83760 (11/03/2021, 9:44 AM)
Hi team, I tried to set a business glossary term on a dataset and on one of its columns. On the glossary term's Related Entities page I see the dataset, but not the column. Is that expected? Does the glossary relate only to datasets?
  • lively-jackal-83760 (11/03/2021, 1:26 PM)
Also, some kind of bug: I set a glossary term on several datasets, but on the glossary term's page I see only 10 of them and no pagination buttons.
  • faint-hair-91313 (11/03/2021, 4:22 PM)
Hey guys, love the new version. I like that the platforms also show up on the main screen, but it doesn't seem to catch my custom ones. E.g., along with Tableau charts I also have some Zeppelin-based charts, and I would like to see Zeppelin popping up as a platform too. The same goes for datasets: custom platforms don't show up.
  • high-hospital-85984 (11/03/2021, 5:27 PM)
Questioning my sanity here, but hopefully someone can help clear things up. I have a new DataHub deployment (version 0.8.11-ish), and did a test ingesting a dummy user. It seems to work, as I see `c.l.m.r.entity.EntityResource - INGEST urn urn:li:corpuser:test-user with system metadata {lastObserved=1635959902784}` in the GMS log, and the user shows up in search and in the UI, but I don't see it in the GMS database (`select * from metadata_aspect_v2`). The DB only contains the default `urn:li:corpuser:datahub`. Am I missing something? 😅
  • plain-farmer-27314 (11/04/2021, 7:30 PM)
Hi all, wondering if there is a way to fetch all upstreams of a given entity (not just the first "level") using either the REST or GraphQL API.
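Transitive upstreams can be assembled by walking the `UpstreamLineage` aspect level by level. A sketch against the REST entities endpoint, with the GMS address as a placeholder and the response assumed to match the snapshot format shown earlier on this page:
```python
import requests
from urllib.parse import quote

GMS = "http://localhost:8080"  # placeholder GMS address

def direct_upstreams(urn):
    """First-level upstreams from the dataset's UpstreamLineage aspect."""
    resp = requests.get(f"{GMS}/entities/{quote(urn, safe='')}")
    resp.raise_for_status()
    snapshot = resp.json()["value"]["com.linkedin.metadata.snapshot.DatasetSnapshot"]
    for aspect in snapshot["aspects"]:
        lineage = aspect.get("com.linkedin.dataset.UpstreamLineage")
        if lineage:
            return [u["dataset"] for u in lineage["upstreams"]]
    return []

def all_upstreams(urn, seen=None):
    """Recursively collect every upstream level, not just the first."""
    seen = set() if seen is None else seen
    for up in direct_upstreams(urn):
        if up not in seen:
            seen.add(up)
            all_upstreams(up, seen)
    return seen
```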