# troubleshoot
  • famous-quill-82626

    02/13/2023, 1:15 AM
    What would cause the UI to return no data on the Users & Groups page? (i.e. not even the root user itself, "datahub") I see that the page uses a GraphQL query to find Users under the covers, but it returns no data:
    query listUsers($input: ListUsersInput!) {
      listUsers(input: $input) {
        start
        count
        total
        users {
          urn
          username
          isNativeUser
          info {
            active
            displayName
            title
            firstName
            lastName
            fullName
            email
            __typename
          }
          editableProperties {
            displayName
            pictureLink
            teams
            title
            skills
            __typename
          }
          status
          roles: relationships(
            input: {types: ["IsMemberOfRole"], direction: OUTGOING, start: 0}
          ) {
            start
            count
            total
            relationships {
              entity {
                ... on DataHubRole {
                  urn
                  type
                  name
                  relationships(input: {types: ["IsMemberOfRole"], direction: INCOMING}) {
                    start
                    count
                    total
                    __typename
                  }
                  __typename
                }
                __typename
              }
              __typename
            }
            __typename
          }
          __typename
        }
        __typename
      }
    }
    ---> result:
    {
      "data": {
        "listUsers": {
          "start": 0,
          "count": 25,
          "total": 0,
          "users": [],
          "__typename": "ListUsersResult"
        }
      },
      "extensions": {}
    }
    Is this query attempting to get the user list from the DataHub database? Or, if not, what is it querying? Could database access be affecting the returned result? Thanks, Pete
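One way to narrow this down is to send the same query straight to GMS and compare. As far as I understand, listUsers serves results from the search index rather than straight from the relational DB, so a total of 0 while users exist in MySQL usually points at missing or broken indices rather than database access. A minimal sketch, assuming a default quickstart deployment (GMS on port 8080, the `/api/graphql` path) and an optional access token if metadata service auth is enabled:

```python
import json
import urllib.request

# Assumed endpoint for a default quickstart GMS; adjust for your deployment.
GMS_GRAPHQL = "http://localhost:8080/api/graphql"

def build_list_users_payload(start: int = 0, count: int = 25) -> dict:
    """Build the same shape of payload the Users & Groups page sends."""
    query = """
    query listUsers($input: ListUsersInput!) {
      listUsers(input: $input) { start count total users { urn username } }
    }
    """
    return {"query": query, "variables": {"input": {"start": start, "count": count}}}

def post_graphql(payload: dict, token: str = "") -> dict:
    """POST a GraphQL payload to GMS; pass a token if auth is enabled."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(GMS_GRAPHQL, json.dumps(payload).encode(), headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If this also returns `total: 0` while corpuser rows exist in the `metadata_aspect_v2` table, that would point at the search indices rather than database access.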
  • best-umbrella-88325

    02/13/2023, 9:55 AM
    Hello community! We are trying to change the existing flow for how glossary terms are attached to datasets, and for that reason we have been exploring the DataHub source code. What we observe is that when a dataset is loaded in the UI, a GraphQL query is issued from the browser, but no logs are visible in GMS for that request. We need that log to identify which part of the code is responsible for showing the attached glossary terms on the dataset page. Can someone point us to the request, or the class, responsible for showing the glossary terms on a dataset? Thanks in advance!
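It may help to reproduce the UI's fetch outside the browser: the dataset page's GraphQL query selects the `glossaryTerms` field of `dataset`. A minimal sketch of that payload (field names follow my reading of the public GraphQL schema; the UI's real `getDataset` query selects far more fields):

```python
# Sketch: the payload shape for fetching a dataset's attached glossary terms.
# Field names are my reading of the public GraphQL schema, not the exact
# query the UI issues.
def build_dataset_terms_payload(dataset_urn: str) -> dict:
    query = """
    query getDatasetTerms($urn: String!) {
      dataset(urn: $urn) {
        urn
        glossaryTerms {
          terms { term { urn name } }
        }
      }
    }
    """
    return {"query": query, "variables": {"urn": dataset_urn}}
```

Posting this to `/api/graphql` while watching GMS may reveal the code path; note that much of the GraphQL resolver logging is below INFO level, which could be why nothing shows in the GMS logs by default.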
  • purple-printer-15193

    02/13/2023, 11:43 AM
    Hello! I've done a `datahub delete --hard` for the dbt platform, however the homepage is displaying incorrect dbt stats. Am I missing anything else? What else do I need to delete?
  • gentle-portugal-21014

    02/13/2023, 12:09 PM
    Hi *, We've got an issue with broken search and I'm out of ideas. 😞 This post is a follow-up to https://datahubspace.slack.com/archives/C02R2NBJXD1/p1674234240711219. The symptoms: searching for strings contained in glossary term names correctly displays the matching glossary terms in the pull-down list below the search field, but selecting "View all results for ..." (the searched string) or pressing Enter in the search field gives a 'No results found for "..."' page and an HTTP 400 error in the GMS container log:
    at com.linkedin.metadata.search.client.CachingEntitySearchService.search(CachingEntitySearchService.java:54)
    	at com.linkedin.metadata.search.aggregator.AllEntitiesSearchAggregator.lambda$getSearchResultsForEachEntity$2(AllEntitiesSearchAggregator.java:161)
    	at com.linkedin.metadata.utils.ConcurrencyUtils.lambda$transformAndCollectAsync$0(ConcurrencyUtils.java:24)
    	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692)
    	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
    	Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://elasticsearch:9200], URI [/glossarytermindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
    error={"root_cause":[{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [architektura], expected [true] or [false]","index_uuid":"CDpqvb1FRB2F8rVTA53BVA","index":"glossarytermindex_v2"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"glossarytermindex_v2","node":"hYLiXc5_SCa0D2KGNC172Q","reason":{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [architektura], expected [true] or [false]","index_uuid":"CDpqvb1FRB2F8rVTA53BVA","index":"glossarytermindex_v2","caused_by":{"type":"illegal_argument_exception","reason":"Can't parse boolean value [architektura], expected [true] or [false]"}}}]} status=400
    (there are more lines in the stack trace, but I believe the most important ones are included). The string "architektura" appearing in the exception record at the bottom was my search term. The important point is that we extended the Glossary Term entity by adding additional aspects with new attributes (as discussed elsewhere), including getting those new attributes supported in searching and filtering (I can share the PDL files for the added aspects here if necessary, of course). This happens on a completely new DataHub deployment of a forked repository (our last resync/merge includes commit a164bdab) with our modifications, i.e. both the MySQL and Elasticsearch databases were created from scratch (we used the option without Neo4j) and the Glossary Terms were created anew after this deployment, i.e. there were no changes in the metamodel since the Elasticsearch index was created. I tried restoring indices using the POST method provided in the GMS API for all URNs found in the MySQL database (as discussed elsewhere, I couldn't use the datahub-upgrade image due to our metamodel extensions), but that didn't help. Any help would be much appreciated...
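The 400 above means Elasticsearch tried to parse the search string as a boolean, which typically happens when the full-text query fans out over a field whose index mapping is `boolean` (for instance a custom aspect attribute that ended up with a boolean field type but is matched as text). A stdlib-only sketch to list every boolean-mapped field in the glossary term index, so the offending custom field can be spotted; host and index name are taken from the log above:

```python
import json
import urllib.request

def boolean_fields(mapping_props: dict, prefix: str = "") -> list:
    """Recursively collect field paths whose Elasticsearch type is 'boolean'."""
    found = []
    for name, spec in mapping_props.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "boolean":
            found.append(path)
        if "properties" in spec:  # descend into object/nested fields
            found.extend(boolean_fields(spec["properties"], path + "."))
    return found

def fetch_mapping(host: str = "http://elasticsearch:9200",
                  index: str = "glossarytermindex_v2") -> dict:
    """Fetch the index mapping via the standard _mapping API."""
    with urllib.request.urlopen(f"{host}/{index}/_mapping") as resp:
        body = json.load(resp)
    return body[index]["mappings"]["properties"]
```

Any field this reports that corresponds to one of the added aspects is a likely candidate for a mismatched searchable-field type in the PDL annotations.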
  • green-hamburger-3800

    02/13/2023, 3:53 PM
    Hey folks, from what I tested, it's not possible to override the Trino platform, even though technically the platform config is exposed in the configuration here https://datahubproject.io/docs/generated/ingestion/sources/trino/, right? From what I investigated, it seems to be overwritten by the way the platform gets passed upstream here https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/trino.py#L171 I was thinking about having a `Starburst` platform just to make the data a bit more user friendly, and wondering how we could do that... We tested overriding the code to pass `starburst` as the platform parameter from the code default itself after creating the starburst platform, and it worked, but I'm not sure how we could contribute this to the datahub codebase to make it available... I'd guess the easier way is having a `StarburstSource` that extends the `TrinoSource` and does nothing but change that parameter... but I'm not sure it's a good idea! Thoughts?! c.c @best-notebook-58252
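The subclass idea seems like the smallest possible change. A sketch of the shape it could take; note the base class here is a minimal stand-in so the snippet runs standalone, not the real `datahub.ingestion.source.sql.trino.TrinoSource`:

```python
# Stand-in for datahub's real TrinoSource, just to make the sketch runnable;
# in a fork you would subclass the actual class and keep its config/ctor.
class TrinoSource:
    platform = "trino"

    def get_platform(self) -> str:
        return self.platform

class StarburstSource(TrinoSource):
    """Identical behaviour to TrinoSource, but metadata is emitted under a
    'starburst' platform urn instead of 'trino'."""
    platform = "starburst"
```

A real contribution would also need the new source registered (and a `starburst` platform entry) so recipes can use it as a source type; probably worth raising with the maintainers before investing in it.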
  • creamy-machine-95935

    02/13/2023, 5:01 PM
    We have deployed our self-managed instance of DataHub in Google Kubernetes Engine (GKE). What would be the recommended way to host the MySQL database: in a pod as a StatefulSet, or as a managed service (e.g. CloudSQL)? Thanks! 😬👀
  • colossal-easter-99672

    02/13/2023, 5:24 PM
    Hello, team. Is there any way to get the change datetime of a versionedDataset in GraphQL? For example, with the query:
    query get_ddl_history {
      versionedDataset(
        urn: "urn:li:dataset:(urn:li:dataPlatform:clickhouse,analytics.profit.debit,PROD)"
        versionStamp: "viewProperties:0"
      ) {
        viewProperties {
          logic
        }
      }
    }
  • white-sandwich-70716

    02/13/2023, 5:25 PM
    We have DataHub installed in Google Kubernetes Engine (GKE). Would it be good practice to have two DataHub environments (Dev, Production)? If so, how can I migrate between environments? 🤔 👀
  • salmon-jordan-53958

    02/13/2023, 5:56 PM
    Hi, I am trying to upgrade from version 0.8.6 to 0.10.0, using Docker containers with a Postgres DB and Elasticsearch, both hosted on AWS. I managed to run `datahub docker quickstart -f docker-compose-file.yml` and get it to work, but none of my data is showing. I have also moved to a new Elasticsearch domain. I can see the data in the database, but it is not showing in DataHub. Any thoughts?
  • flat-match-62670

    02/13/2023, 8:44 PM
    Hi, I am running into some issues trying to delete all glossary terms via the CLI. When trying to run this command:
    datahub delete --entity_type glossaryTerm --query "*" --soft -f
    I am receiving this traceback in return:
    [2023-02-13 12:37:43,117] INFO     {datahub.cli.delete_cli:326} - Filter matched  glossaryTerm entities of None. Sample: []
    No urns to delete. Maybe you want to change entity_type=glossaryTerm or platform=None to be something different?
    Took 1.477 seconds to hard delete 0 versioned rows and 0 timeseries aspect rows for 0 entities.
    It appears that the datahub-gms pod is unable to locate any of the glossary URNs to delete. I am able to ingest successfully via the CLI, so I know my environment variables are correct. Not quite sure what is happening, but I would love to be able to mass-delete all the glossary terms/nodes. Any help appreciated! Thanks
  • enough-monitor-24292

    02/14/2023, 7:22 AM
    Hi All, I'm using DataHub version 0.8.38. Is there any REST API that can delete an object by URN? Thanks
  • lively-spring-5482

    02/14/2023, 1:31 PM
    Hi, We're experiencing weird behaviour on DataHub v0.10.0 with SSO functionality enabled. It manifests as a `502 Bad Gateway` error message. Steps to reproduce (updated Chrome browser, macOS/Win): 1. Log in to DataHub using SSO - success 2. Play around for a while [optional] 3. Sign out from the UI 4. Once back on the login screen, re-login using SSO. Step 4 gets us the above-mentioned 502 error. To us, this looks like an internal error in DataHub. We've noticed the `PLAY_SESSION` cookie holding the currently logged-in user is not removed after sign-out. The browser still sends this cookie even though the session has already expired on the server. The latter refrains from responding to a request sent with an inactive cookie, leading the load balancer to respond with a 502 error. Any suggestions on how we could fix the problem? Thanks in advance!
  • ripe-tailor-61058

    02/14/2023, 4:00 PM
    Hello, is it possible to search via the UI search bar for any dataset containing a schema field called "aircraft"? I ingested a simple CSV file using the DataHub CLI and it found the schema fields, but search is not finding it. I would also be interested in a Python API for searching schema fields.
  • ripe-tailor-61058

    02/14/2023, 4:36 PM
    image.png
  • ripe-tailor-61058

    02/14/2023, 4:38 PM
    I tried `fieldPaths: entry_into_service` in the main search bar where it says 'Search Datasets, People, & more...' but get no results. I also tried it on the demo DataHub (`fieldPaths: latitude`) but it doesn't show any results there either. I am using 0.10.0.
  • witty-actor-87329

    02/14/2023, 8:33 PM
    Hello, I'm trying to run docker-compose on datahub-gms at the latest head version with the configs below, but I'm getting this error in the docker logs:
    2023-02-14 20:25:50,405 [main] WARN  c.l.metadata.entity.EntityService:798 - Unable to produce legacy MAE, entity may not have legacy Snapshot schema.
    java.lang.UnsupportedOperationException: Failed to find Typeref schema associated with Config-based Entity
    Configs used for gms:
    datahub-gms:
      container_name: datahub-gms
      environment:
        - DATAHUB_UPGRADE_HISTORY_KAFKA_CONSUMER_GROUP_ID=generic-duhe-consumer-job-client-gms
        - EBEAN_DATASOURCE_USERNAME=xyz
        - EBEAN_DATASOURCE_PASSWORD=xyz
        - EBEAN_DATASOURCE_HOST=xyz
        - EBEAN_DATASOURCE_URL=jdbc:postgresql://xyz
        - EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
        - KAFKA_BOOTSTRAP_SERVER=broker:29092
        - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
        - ELASTICSEARCH_HOST=elasticsearch
        - ELASTICSEARCH_PORT=9200
        - ES_BULK_REFRESH_POLICY=WAIT_UNTIL
        - ELASTICSEARCH_INDEX_BUILDER_SETTINGS_REINDEX=true
        - ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX=true
        - NEO4J_HOST=http://neo4j:7474
        - NEO4J_URI=bolt://neo4j
        - NEO4J_USERNAME=neo4j
        - NEO4J_PASSWORD=datahub
        - JAVA_OPTS=-Xms1g -Xmx1g
        - GRAPH_SERVICE_DIFF_MODE_ENABLED=true
        - GRAPH_SERVICE_IMPL=neo4j
        - ENTITY_REGISTRY_CONFIG_PATH=/datahub/datahub-gms/resources/entity-registry.yml
        - ENTITY_SERVICE_ENABLE_RETENTION=true
        - MAE_CONSUMER_ENABLED=true
        - MCE_CONSUMER_ENABLED=true
        - PE_CONSUMER_ENABLED=true
        - UI_INGESTION_ENABLED=true
        - METADATA_SERVICE_AUTH_ENABLED=false
      hostname: datahub-gms
      image: ${DATAHUB_GMS_IMAGE:-linkedin/datahub-gms}:${DATAHUB_VERSION:-head}
      ports:
        - ${DATAHUB_MAPPED_GMS_PORT:-8080}:8080
    Can anyone help me on this? Thanks
  • salmon-jordan-53958

    02/14/2023, 9:05 PM
    Hi, can anyone help me with the error below? This is from the datahub-upgrade docker container:
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
  • rich-policeman-92383

    02/15/2023, 4:39 AM
    DataHub version: v0.9.6. Removing a user does not remove the ownership aspect. Scenario: user A owns 10 datasets. If we remove user A, user A is still shown as the owner of those 10 datasets.
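Until deletion cascades to ownership, one workaround is a cleanup script that strips the dangling owner from each affected entity via GraphQL. A sketch of the payload builder; `removeOwner` and its `RemoveOwnerInput` (with `ownerUrn`/`resourceUrn`) reflect my reading of the GraphQL schema, so verify against GraphiQL on your version before relying on it:

```python
# Sketch: build a removeOwner mutation payload for one (owner, resource) pair.
# The mutation/input names are assumptions based on my reading of the DataHub
# GraphQL schema; confirm them in GraphiQL for your deployed version.
def build_remove_owner_payload(owner_urn: str, resource_urn: str) -> dict:
    mutation = """
    mutation removeOwner($input: RemoveOwnerInput!) {
      removeOwner(input: $input)
    }
    """
    return {"query": mutation,
            "variables": {"input": {"ownerUrn": owner_urn,
                                    "resourceUrn": resource_urn}}}
```

Looping this over the removed user's owned datasets (discoverable via search or the relationships API) would clear the stale ownership records.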
  • gray-cpu-75769

    02/15/2023, 4:49 AM
    Hi Team, I'm trying to ingest metadata for MLFeatureProperties. It seems the source urn supports only dataset as an entity type and no other entity such as mlFeature. Is it possible to add sources of MLFeatureProperties other than those of type dataset?
  • busy-analyst-35820

    02/15/2023, 6:17 AM
    Hi Team, we are facing an exception while performing a customProperties search in the DataHub UI for a few specific properties. It gives a 500 "unknown error" in the PROD env. We expected to get a list of entities for the above search, as we do for every other property search, whereas for a few properties we get an exception. Search commands we tried: `customProperties:<fieldname>` / `customProperties:<fieldname>=<value>*` Issue details: 1. customProperties search works in the STG and DEV envs for all properties 2. customProperties search throws an exception in the PROD env only for a few specific properties, and works for the rest. The first screenshot shows the exception received in PROD and the second screenshot shows the same command that worked in the STG env. Could you please guide us on resolving this? cc: @melodic-match-38516
  • alert-fall-82501

    02/15/2023, 6:35 AM
    Hi Team - can anyone advise on this front-end error? Please check the screenshot in the thread.
  • glamorous-elephant-17130

    02/15/2023, 12:34 PM
    https://aws.amazon.com/blogs/big-data/part-1-deploy-datahub-using-aws-managed-services-and-ingest-metadata-from-aws-glue-and-amazon-redshift/
    Hey guys, I followed this document to set up DataHub in my dev environment. Any clue on how to change the default password for the root user?
  • hundreds-notebook-26128

    02/15/2023, 8:53 PM
    Tried installing docker quickstart on my M1 Mac, using (as directed):
    datahub docker quickstart --arch m1
    The response I get is the following, even though I do have 'docker' running.... In my case, I am using a dockerd/moby setup via Rancher Desktop... is this maybe why DataHub is not detecting that my 'Docker' is actually running and operating normally, i.e. because it is Rancher Desktop?
    Using architecture Architectures.m1
    Docker doesn't seem to be running. Did you start it?
    I just searched this channel and found others with similar problems, and a link to: https://github.com/rancher-sandbox/rancher-desktop/issues/2534 Some of that thread SEEMS to apply to my situation, but I am not sure I can switch from dockerd/moby to containerd/nerdctl as my container runtime, because that would likely affect every other container I traditionally run.
  • powerful-cat-68806

    02/16/2023, 9:00 AM
    Hi all, I'm getting a `404` error from nginx when trying to access DataHub from a public address. I can provide the yaml file for the relevant pod - frontend/gms.
  • fierce-garage-74290

    02/16/2023, 10:40 AM
    What is the recommended way of creating domains and managing them with Git? I would like a fully automated DataHub setup (infra + ingestion, CI/CD, Git, etc.), but I couldn't find an easy way to create domains via sources (as exists for business glossaries). So, if I want to automate this with CI/CD (I'd prefer not to create domains via the UI), do I need to wrap GraphQL mutations like the one below with a script, or is there an easier workaround?
    mutation createDomain {
      createDomain(input: { id: "urn:mynewdomain", name: "My New Domain", description: "An optional description" })
    }
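One workaround is exactly such a wrapper: keep the domain definitions in a versioned file and have CI replay them through `createDomain`. A sketch, assuming the variable-based form of the mutation (`CreateDomainInput`), a bearer token, and that re-creating an already-existing id is tolerated or handled; all names and the endpoint are placeholders to adapt:

```python
import json
import urllib.request

# Domains kept under version control; in practice this would be loaded from
# a YAML/JSON file checked into the repo.
DOMAINS = [
    {"id": "marketing", "name": "Marketing", "description": "Campaign data"},
]

def build_create_domain_payload(domain: dict) -> dict:
    """Wrap one domain definition in a createDomain mutation payload."""
    mutation = """
    mutation createDomain($input: CreateDomainInput!) {
      createDomain(input: $input)
    }
    """
    return {"query": mutation, "variables": {"input": domain}}

def sync_domains(graphql_url: str, token: str) -> None:
    """Replay all versioned domains against the GraphQL API (CI step)."""
    for domain in DOMAINS:
        payload = build_create_domain_payload(domain)
        req = urllib.request.Request(
            graphql_url,
            json.dumps(payload).encode(),
            {"Content-Type": "application/json",
             "Authorization": f"Bearer {token}"},
        )
        urllib.request.urlopen(req)  # idempotency/error handling omitted
```

A production version would diff against existing domains (e.g. via `listDomains`) before creating, but the replay-from-Git shape is the core of the idea.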
  • average-dinner-25106

    02/16/2023, 11:18 AM
    Hi, I want to run the command `datahub docker quickstart` in an environment where the internet is not reachable for security reasons. The result is: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /dataub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml To solve this, a proxy server is essential, so I want to set the "http_proxy" environment variable. But where? The datahub command seems to have no proxy-server config option. How can I do this?
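The CLI's download step goes through Python's HTTP stack, which honors the standard proxy environment variables, so exporting `HTTP_PROXY`/`HTTPS_PROXY` in the shell that runs the command should be enough (no CLI flag needed). A sketch of driving it from Python with the proxy injected; the proxy URL is a placeholder. Alternatively, fetch the compose file on a connected machine and pass it locally with `datahub docker quickstart -f <file>`:

```python
import os
import subprocess

def proxied_env(proxy_url: str) -> dict:
    """Copy of the current environment with the usual proxy variables set
    (both upper- and lower-case forms, since tools vary)."""
    return dict(os.environ,
                HTTP_PROXY=proxy_url, HTTPS_PROXY=proxy_url,
                http_proxy=proxy_url, https_proxy=proxy_url)

def run_quickstart(proxy_url: str) -> None:
    """Run quickstart with the proxy applied; check=True raises on failure."""
    subprocess.run(["datahub", "docker", "quickstart"],
                   env=proxied_env(proxy_url), check=True)
```

Note the Docker daemon pulls images itself, so it may additionally need its own proxy configuration (daemon-level, not environment-variable) in a fully offline setup.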
  • ancient-guitar-60671

    02/16/2023, 4:47 PM
    Hi all, had a question on integrating Great Expectations with DataHub. I'm getting this error: 'Unable to emit metadata to DataHub GMS'. Our configuration: Great Expectations running locally, installed via pip (great-expectations==0.15.41, acryl-datahub==0.10.0); DataHub installed on an EC2 using Docker Compose:
    CONTAINER ID   IMAGE                                    STATUS                 PORTS                                                          NAMES
    a930973244ae   confluentinc/cp-schema-registry:7.2.2    Up 2 weeks             0.0.0.0:8081->8081/tcp, :::8081->8081/tcp                      schema-registry
    a088522061b3   acryldata/datahub-actions:head           Up 2 weeks                                                                            datahub-datahub-actions-1
    cb3ce66b1623   linkedin/datahub-frontend-react:head     Up 2 weeks (healthy)   0.0.0.0:9002->9002/tcp, :::9002->9002/tcp                      datahub-frontend-react
    05a5fdc42a7d   confluentinc/cp-kafka:7.2.2              Up 2 weeks             0.0.0.0:9092->9092/tcp, :::9092->9092/tcp                      broker
    67fff60507c3   linkedin/datahub-gms:head                Up 2 weeks (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp                      datahub-gms
    b98e8d9a1bf9   confluentinc/cp-zookeeper:7.2.2          Up 2 weeks             2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp  zookeeper
    1f1853725db1   mysql:5.7                                Up 2 weeks             0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp           mysql
    7088fb455740   elasticsearch:7.10.1                     Up 2 weeks (healthy)   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp            elasticsearch
    Port 8080 on the EC2 is open inbound for our IP range. Full error message:
    (.venv) PS C:\data\great_expectations\gx_tutorials> great_expectations checkpoint run test_datasource_checkpoint Using v3 (Batch Request) API Calculating Metrics:
    100%|██████████| 2/2 [00:01<00:00, 1.28it/s]
    Datasource test_datasource is not present in platform_instance_map
    ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:202)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:254)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:228)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRe46)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat 
com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.RestliHandlerServlet.service(RestliHandlerServlet.java:21)\n\tat com.linkedin.restli.server.RestliHandlerServlet.handleRequest(RestliHandlerServlet.java:26)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)\n\tat org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)\n\tat com.datahub.auth.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:98)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat 
org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: com.linkedin.restli.server.RoutingException\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:111)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:181)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:224)\n\t... 69 more\n', 'status': 404}) Validation succeeded! Suite Name Status Expectations met - test_datasource βœ”οΈ Passed 1 of 1 (100.0 %)
  • helpful-greece-26038

    02/16/2023, 7:42 PM
    Unable to start DataHub 0.10 - I have a relatively small DataHub instance that has been running fine for the last 6 months using Docker quickstart. I attempted to upgrade to version 0.10 today, ran into problems, and was hoping someone might have suggestions. The steps I took: 1. I followed the release-note advice to run "docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate". This failed with "ERROR SpringApplication Application run failed. org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade'"; 2. I tried "datahub docker nuke" and then "datahub docker quickstart". Eventually I get the message "Unable to run quickstart - the following issues were detected: datahub-gms is running but not healthy". 3. When I inspect the logs from the datahub-gms server, I see many errors like "Error creating bean with name 'siblingGraphServiceFactory'". I've tried the cycle of stopping the Docker components and restarting them, but nothing seems to work. Never mind - I did a few more cycles of nuking the components and re-running quickstart and eventually it came up correctly.
  • powerful-telephone-2424

    02/16/2023, 10:13 PM
    I’m trying to run
    docker/dev.sh
    after pulling the latest datahub code and I’m seeing this message repeatedly:
    Copy code
    datahub-actions_1         | 2023/02/16 22:10:34 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    datahub-actions_1         | 2023/02/16 22:10:35 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    datahub-actions_1         | 2023/02/16 22:10:36 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    datahub-actions_1         | 2023/02/16 22:10:37 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    datahub-actions_1         | 2023/02/16 22:10:38 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    datahub-actions_1         | 2023/02/16 22:10:39 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    datahub-actions_1         | 2023/02/16 22:10:40 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    datahub-actions_1         | 2023/02/16 22:10:41 Problem with request: Get "http://datahub-gms:8080/health": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
    Digging in further I found that the broker isn’t running and when I try to manually start its docker container, I see this error:
    Copy code
    2023-02-16 14:04:18 [2023-02-16 22:04:18,885] INFO Session establishment complete on server zookeeper/172.18.0.3:2181, session id = 0x10000152e130003, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
    2023-02-16 14:04:18 [2023-02-16 22:04:18,888] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
    2023-02-16 14:04:18 [2023-02-16 22:04:18,942] INFO [feature-zk-node-event-process-thread]: Starting (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
    2023-02-16 14:04:18 [2023-02-16 22:04:18,949] INFO Feature ZK node at path: /feature does not exist (kafka.server.FinalizedFeatureChangeListener)
    2023-02-16 14:04:18 [2023-02-16 22:04:18,949] INFO Cleared cache (kafka.server.FinalizedFeatureCache)
    2023-02-16 14:04:19 [2023-02-16 22:04:19,068] INFO Cluster ID = 9_PboVE2QOad45hS_5Tn9w (kafka.server.KafkaServer)
    2023-02-16 14:04:19 [2023-02-16 22:04:19,074] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
    2023-02-16 14:04:19 kafka.common.InconsistentClusterIdException: The Cluster ID 9_PboVE2QOad45hS_5Tn9w doesn't match stored clusterId Some(XkdbYCWoRVadmGA-i2RwKw) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
    2023-02-16 14:04:19     at kafka.server.KafkaServer.startup(KafkaServer.scala:230)
    2023-02-16 14:04:19     at kafka.Kafka$.main(Kafka.scala:109)
    2023-02-16 14:04:19     at kafka.Kafka.main(Kafka.scala)
    2023-02-16 14:04:19 [2023-02-16 22:04:19,075] INFO shutting down (kafka.server.KafkaServer)
    2023-02-16 14:04:19 [2023-02-16 22:04:19,076] INFO [feature-zk-node-event-process-thread]: Shutting down (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
    2023-02-16 14:04:19 [2023-02-16 22:04:19,077] INFO [feature-zk-node-event-process-thread]: Stopped (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
    Any pointers on how to resolve this?
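The Kafka error above is a startup consistency check: the broker compares the cluster ID it obtained from ZooKeeper with the one persisted in `meta.properties` in its log directory, and refuses to start if they differ. The sketch below is a hypothetical reconstruction of that check, not Kafka's actual code; class and method names are illustrative.

```java
// Hypothetical sketch (not Kafka's actual code) of the check behind
// kafka.common.InconsistentClusterIdException: on startup the broker
// compares the ZooKeeper-assigned cluster ID against the cluster.id
// stored in meta.properties in its log directory. A stale meta.properties
// (e.g. an old Docker volume surviving a re-created ZooKeeper) fails it.
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class ClusterIdCheck {
    public static void verify(String zkClusterId, File metaProperties) throws IOException {
        Properties props = new Properties();
        try (FileReader reader = new FileReader(metaProperties)) {
            props.load(reader);
        }
        String stored = props.getProperty("cluster.id");
        if (stored != null && !stored.equals(zkClusterId)) {
            // Mirrors the message seen in the broker log above.
            throw new IllegalStateException("The Cluster ID " + zkClusterId
                + " doesn't match stored clusterId Some(" + stored
                + ") in meta.properties. The broker is trying to join the wrong cluster.");
        }
    }
}
```

In quickstart-style setups the usual remedy is to remove the broker's stale data volume (e.g. by nuking the quickstart volumes) so that `meta.properties` is regenerated against the current ZooKeeper cluster.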

    rich-pager-68736

    02/17/2023, 6:52 AM
    Hello DataHub team! Since our latest ingestion run, a duplicate key error occurs in GMS when trying to browse our Snowflake datasets. This renders DataHub unusable, because Snowflake is our primary data store. It is somehow related to Snowflake + dbt, but I could not narrow it down beyond stating the obvious: the Java code is trying to merge two equal objects. Any help would be appreciated, because this blocks our prod env. In the frontend UI, the following error occurs:
    An unknown error occurred. (code 500)
    Failed to load results! An unexpected error occurred.
    And in the GMS logs, it looks like this:
    Copy code
    06:42:03.642 [Thread-484] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
    java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
        at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423)
        at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1144)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
        at org.dataloader.DataLoaderHelper.lambda$dispatchQueueBatch$3(DataLoaderHelper.java:272)
        at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
        at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
        at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
        at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$183(GmsGraphQLEngine.java:1588)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
        ... 1 common frames omitted
    Caused by: java.lang.RuntimeException: Failed to batch load Datasets
        at com.linkedin.datahub.graphql.types.dataset.DatasetType.batchLoad(DatasetType.java:146)
        at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$183(GmsGraphQLEngine.java:1585)
        ... 2 common frames omitted
    Caused by: java.lang.IllegalStateException: Duplicate key EntityAspectIdentifier(urn=urn:li:dataset:(urn:li:dataPlatform:dbt,XXXXXXXXXXXXX.YYYYYYYYYYYYYY.ZZZZZZZZZZZZZ,PROD), aspect=upstreamLineage, version=0) (attempted merging values com.linkedin.metadata.entity.EntityAspect@f613625b and com.linkedin.metadata.entity.EntityAspect@f613625b)
        at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
        at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
        at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
        at java.base/java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:1033)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:263)
        at com.linkedin.metadata.entity.EntityService.getEnvelopedAspects(EntityService.java:1826)
        at com.linkedin.metadata.entity.EntityService.getCorrespondingAspects(EntityService.java:379)
        at com.linkedin.metadata.entity.EntityService.getLatestEnvelopedAspects(EntityService.java:333)
        at com.linkedin.metadata.entity.EntityService.getEntitiesV2(EntityService.java:289)
        at com.linkedin.metadata.client.JavaEntityClient.batchGetV2(JavaEntityClient.java:109)
        at com.linkedin.datahub.graphql.types.dataset.DatasetType.batchLoad(DatasetType.java:130)
        ... 3 common frames omitted
    06:42:03.645 [Thread-421] ERROR c.datahub.graphql.GraphQLController:99 - Errors while executing graphQL query: "query getSearchResultsForMultiple($input: SearchAcrossEntitiesInput!) {\n  searchAcrossEntitie
    ...
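The stack trace bottoms out in `java.util.stream.Collectors`: `EbeanAspectDao.batchGet` collects database rows into a map, and `Collectors.toMap` without a merge function throws `IllegalStateException("Duplicate key ...")` as soon as two rows map to the same `EntityAspectIdentifier`. A minimal, self-contained demonstration of that failure mode (illustrative only, not DataHub's code):

```java
// Demonstrates the failure mode in the stack trace above: Collectors.toMap
// without a merge function throws IllegalStateException on duplicate keys,
// just as EbeanAspectDao.batchGet does when two identical aspect rows come
// back for the same (urn, aspect, version) identifier.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        // Two rows with the same key, standing in for the duplicated
        // upstreamLineage aspect rows in the error above.
        List<String> rows = List.of("urn:li:dataset:example", "urn:li:dataset:example");

        try {
            rows.stream().collect(Collectors.toMap(k -> k, v -> v));
        } catch (IllegalStateException e) {
            System.out.println("Throws: " + e.getMessage());
        }

        // Supplying a merge function tolerates duplicates instead of throwing.
        Map<String, String> merged =
            rows.stream().collect(Collectors.toMap(k -> k, v -> v, (a, b) -> a));
        System.out.println("With merge function: " + merged.size() + " entry");
    }
}
```

Since the code throwing here reads from the aspect store, a reasonable next step is to query the backing aspect table for the URN named in the message and check for duplicate rows (for instance, URNs differing only by case under a case-insensitive database collation).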