DataHub #troubleshoot

famous-quill-82626

02/13/2023, 1:15 AM

What would cause the UI to return no data on Users & Groups page? (i.e. not even the current root User itself - "datahub") I see that the page uses a Graphql query to find Users under the covers, but returns no data:

Copy code

query listUsers($input: ListUsersInput!) {
  listUsers(input: $input) {
    start
    count
    total
    users {
      urn
      username
      isNativeUser
      info {
        active
        displayName
        title
        firstName
        lastName
        fullName
        email
        __typename
      }
      editableProperties {
        displayName
        pictureLink
        teams
        title
        skills
        __typename
      }
      status
      roles: relationships(
        input: {types: ["IsMemberOfRole"], direction: OUTGOING, start: 0}
      ) {
        start
        count
        total
        relationships {
          entity {
            ... on DataHubRole {
              urn
              type
              name
              relationships(input: {types: ["IsMemberOfRole"], direction: INCOMING}) {
                start
                count
                total
                __typename
              }
              __typename
            }
            __typename
          }
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}

---> result:

Copy code

{
  "data": {
    "listUsers": {
      "start": 0,
      "count": 25,
      "total": 0,
      "users": [],
      "__typename": "ListUsersResult"
    }
  },
  "extensions": {}
}

Is this query attempting to get the User list from the DataHub database? ..or if not, what is it querying? Could database access be affecting the returned result? Thanks, Pete

best-umbrella-88325

02/13/2023, 9:55 AM

Hello community! We are trying to make a change in the existing flow of the way glossary terms are attached to datasets, and for that reason, have been exploring the datahub source code. What we observe is that when the dataset is loaded on the UI, a GraphQL query is hit from the browser, however, no logs are visible in GMS for that request. We need that log to identify which part of the code is responsible for showing the attached glossary terms on the dataset page. Can someone help us as to which request is responsible for showing the glossary terms on the dataset or which class is responsible for doing it. Thanks in advance!

purple-printer-15193

02/13/2023, 11:43 AM

Hello! I’ve done a

datahub delete --hard

for the dbt platform, however the homepage is displaying an incorrect dbt stats. Am I missing anything else? What else do I need to delete?

gentle-portugal-21014

02/13/2023, 12:09 PM

Hi *, We've got an issue with broken searching and I'm out of ideas. 😞 This post is a follow-up to https://datahubspace.slack.com/archives/C02R2NBJXD1/p1674234240711219. The symptoms are that searching for strings contained in glossary term names results in correct displaying of specific individual glossary terms in the pull down list below the search field, but selecting "View all results for ..." (the searched string) or pressing Enter in the search field gives 'No results found for "..."' page and a HTTP error 400 in the GMS container log:

Copy code

at com.linkedin.metadata.search.client.CachingEntitySearchService.search(CachingEntitySearchService.java:54)
	at com.linkedin.metadata.search.aggregator.AllEntitiesSearchAggregator.lambda$getSearchResultsForEachEntity$2(AllEntitiesSearchAggregator.java:161)
	at com.linkedin.metadata.utils.ConcurrencyUtils.lambda$transformAndCollectAsync$0(ConcurrencyUtils.java:24)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
	Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://elasticsearch:9200>], URI [/glossarytermindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
error={"root_cause":[{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [architektura], expected [true] or [false]","index_uuid":"CDpqvb1FRB2F8rVTA53BVA","index":"glossarytermindex_v2"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"glossarytermindex_v2","node":"hYLiXc5_SCa0D2KGNC172Q","reason":{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [architektura], expected [true] or [false]","index_uuid":"CDpqvb1FRB2F8rVTA53BVA","index":"glossarytermindex_v2","caused_by":{"type":"illegal_argument_exception","reason":"Can't parse boolean value [architektura], expected [true] or [false]"}}}]} status=400

(there are more lines in the stack trace, but I believe that the most important lines are included this way). The string "architektura" appearing in the exception record at the bottom was my searched term. The important point is that we extended the Glossary Term entity by adding additional aspects with new attributes (as discussed elsewhere) including getting those new attributes supported in searching and filtering (I can share the PDL files for the added aspects here if necessary, of course). This happens on a completely new DataHub deployment of a forked repository (our last resync/merge includes commit a164bdab) with our modifications, i.e. both MySQL and Elastic Search databases were created from scratch (we used the option without Neo4J) and Glossary Terms were created anew after this deployment, i.e. there were no changes in the metamodel since the Elastic Search index was created. I tried restoring indices using the POST method provided within the GMS API for all URNs found in the MySQL database (as discussed elsewhere, I couldn't use the datahub-upgrade image due to our metamodel extensions), but that didn't help. Any help would be very appreciated...

green-hamburger-3800

02/13/2023, 3:53 PM

Hey folks, from what I tested, it's not possible to override Trino platform even though technically they have the platform config exposed on the configuration here https://datahubproject.io/docs/generated/ingestion/sources/trino/, right? From what I investigated, it seems to be overwritten by the way the platform get's passed upstream here https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/trino.py#L171 I was thinking about having a

Starburst

platform just to have the data a bit more user friendly here and wondering how we could do that... We tested overwriting the code to pass

starburst

as the platform parameter from the code default itself after creating the starburst platform and it worked, but not sure how we could collaborate with the datahub codebase to make it available... I'd guess the easier way it's having a

StarburstSource

that would extend the

TrinoSource

and do nothing but change that parameter... but I'm not sure it's a good idea! Thoughts?! c.c @best-notebook-58252

plus1 2

creamy-machine-95935

02/13/2023, 5:01 PM

We have deployed our self-managed instance of Datahub in Google Kubernetes Engine GKE. Which would be the recommended way to host the MySQL database? In the pod as StatefulSet or as a managed service (e.g CloudSQL) Thanks ! 😬👀

plus1 1

👀 1

✅ 2

colossal-easter-99672

02/13/2023, 5:24 PM

Hello, team. Is there any way to get change datetime in versionedDataset in graphql? For example, query

Copy code

query get_ddl_history {
  versionedDataset(
    urn: "urn:li:dataset:(urn:li:dataPlatform:clickhouse,analytics.profit.debit,PROD)"
    versionStamp: "viewProperties:0"
  ) {
    viewProperties {
      logic
    }
  }
}

white-sandwich-70716

02/13/2023, 5:25 PM

We have datahub installed in google kubernetes engine GKE, could you please help me if it would be a good practice to have two DATAHUB environment (Dev, Production). if so how can I migrate an environments? 🤔 👀

plus1 1

👀 1

salmon-jordan-53958

02/13/2023, 5:56 PM

Hi, I am trying to upgrade from version 8.6 to 10.0.0 - using docker containers, postgres db and Elastics search both hosted with AWS. I managed to run datahub docker quickstart -f docker-compose-file.yml and get it to work but none of my data is showing. I have also moved to a new Elastic search domain. I can see the data in the database but it is not showing in datahub. Any thoughts?

👀 2

flat-match-62670

02/13/2023, 8:44 PM

Hi, I am running into some issues trying to delete all glossary terms via the CLI. When trying to run this command:

datahub delete --entity_type glossaryTerm --query "*" --soft -f

I am receiving this traceback in return:

Copy code

[2023-02-13 12:37:43,117] INFO     {datahub.cli.delete_cli:326} - Filter matched  glossaryTerm entities of None. Sample: []
No urns to delete. Maybe you want to change entity_type=glossaryTerm or platform=None to be something different?
Took 1.477 seconds to hard delete 0 versioned rows and 0 timeseries aspect rows for 0 entities.

It appears that the datahub-gms pod is unable to locate any of the glossary urns to delete. I am able to successfully ingest via the CLI so I know my environment variables are correct. Not quite sure what is happening but I would love to be able to mass delete all the glossary terms/nodes. Any help appreciated! Thanks

👀 1

enough-monitor-24292

02/14/2023, 7:22 AM

Hi All, I'm using datahub 0.8.38 version, is there any rest api which will help to delete any object using urn? Thanks

✅ 1

lively-spring-5482

02/14/2023, 1:31 PM

Hi, We’re experiencing weird behaviour on DataHub v. 0.10.0 with SSO functionality enabled. It manifests with

502 Bad Gateway

error message. Steps to reproduce (updated Chrome browser, MacOS/Win): 1. Login using SSO to DataHub - success 2. Play around for a while [optional] 3. Sign out from the UI 4. Once back on the login screen - re-login using SSO. Step #4 gets us the above mentioned 502 error. To us, the probable scenario looks like an internal error in DataHub. We’ve noticed

PLAY_SESSION

cookie holding the user currently logged in is not removed after the sign out. The browser still sends this cookie even though the session has already expired on the server. The latter refrains from responding to the request sent with an inactive cookie leading the load balancer to respond with an 502 error message. Any suggestions on how we could fix the problem? Thanks in advance!

ripe-tailor-61058

02/14/2023, 4:00 PM

Hello, Is it possible to search via the UI search bar for any dataset containing a schema field called aircraft? I ingested a simple csv file using the datahub CLI and it found the schema fields but search is not finding it. Also would be interested in python API for searching schema fields as well.

✅ 1

ripe-tailor-61058

02/14/2023, 4:36 PM

image.png

ripe-tailor-61058

02/14/2023, 4:38 PM

I tried fieldPaths: entry_into_service in the main search bar where it says 'Search Datasets, People, & more...' but get no results. I also tried it on the demo datahub (fieldPaths: latitude) but it doesn't show any results their either. I am using 0.10.0.

witty-actor-87329

02/14/2023, 8:33 PM

Hello, I’m trying to do docker-compose on datahub-gms with the latest head version with the below configs. But getting below error, error message from docker logs:

Copy code

2023-02-14 20:25:50,405 [main] WARN  c.l.metadata.entity.EntityService:798 - Unable to produce legacy MAE, entity may not have legacy Snapshot schema.
java.lang.UnsupportedOperationException: Failed to find Typeref schema associated with Config-based Entity

Configs used for gms:

Copy code

datahub-gms:
              container_name: datahub-gms
              environment:
                    - DATAHUB_UPGRADE_HISTORY_KAFKA_CONSUMER_GROUP_ID=generic-duhe-consumer-job-client-gms
                    - EBEAN_DATASOURCE_USERNAME=xyz
                    - EBEAN_DATASOURCE_PASSWORD=xyz
                    - EBEAN_DATASOURCE_HOST=xyz
                    - EBEAN_DATASOURCE_URL=jdbc:<postgresql://xyz>
                    - EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
                    - KAFKA_BOOTSTRAP_SERVER=broker:29092
                    - KAFKA_SCHEMAREGISTRY_URL=<http://schema-registry:8081>
                    - ELASTICSEARCH_HOST=elasticsearch
                    - ELASTICSEARCH_PORT=9200
                    - ES_BULK_REFRESH_POLICY=WAIT_UNTIL
                    - ELASTICSEARCH_INDEX_BUILDER_SETTINGS_REINDEX=true
                    - ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX=true
                    - NEO4J_HOST=<http://neo4j:7474>
                    - NEO4J_URI=<bolt://neo4j>
                    - NEO4J_USERNAME=neo4j
                    - NEO4J_PASSWORD=datahub
                    - JAVA_OPTS=-Xms1g -Xmx1g
                    - GRAPH_SERVICE_DIFF_MODE_ENABLED=true
                    - GRAPH_SERVICE_IMPL=neo4j
                    - ENTITY_REGISTRY_CONFIG_PATH=/datahub/datahub-gms/resources/entity-registry.yml
                    - ENTITY_SERVICE_ENABLE_RETENTION=true
                    - MAE_CONSUMER_ENABLED=true
                    - MCE_CONSUMER_ENABLED=true
                    - PE_CONSUMER_ENABLED=true
                    - UI_INGESTION_ENABLED=true
                    - METADATA_SERVICE_AUTH_ENABLED=false
              hostname: datahub-gms
              image: ${DATAHUB_GMS_IMAGE:-linkedin/datahub-gms}:${DATAHUB_VERSION:-head}
              ports:
              - ${DATAHUB_MAPPED_GMS_PORT:-8080}:8080

Can anyone help me on this? Thanks

salmon-jordan-53958

02/14/2023, 9:05 PM

Hi can anyone help me with the error below, this is from the datahub-update docker container: ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2

✅ 1

👀 1

rich-policeman-92383

02/15/2023, 4:39 AM

Datahub Version: v0.9.6 Removing a user does not remove the ownership aspect. Scenario: User A owns 10 datasets. If we remove user A, user A is still the owner of 10 datasets.

gray-cpu-75769

02/15/2023, 4:49 AM

Hi Team, I’m trying the ingest metadata for MLFeatureProperties, it seems like the source urn supports only dataset as an entity type and not any other entity such as mlFeature, is it possible to add the sources of MLFeatureProperties other than of type dataset.

busy-analyst-35820

02/15/2023, 6:17 AM

Hi Team, We are facing an exception while performing customProperties search in Datahub UI, for few specific properties. It gives 500, unknown error in PROD env. Expected to get list of entities for the above search , like we receive for all other property search, where as getting exception for few properties. search command we tried :- customProperties:<filedname> / customProperties:<fieldname>=<value>* Issue details 1. customProperties search works in STG and DEV env for all properties 2. customProperties search throws exception in PROD env only for few specific properties, and works for rest of the properties First screenshot shows the exception received in PROD and the second screenshot shows the same command that worked in STG env. Could you please guide us to resolve this. cc: @melodic-match-38516

✅ 1

alert-fall-82501

02/15/2023, 6:35 AM

Hi Team - Can anyone suggest on this front end error ? Please check screenshot in the thread

glamorous-elephant-17130

02/15/2023, 12:34 PM

Copy code

<https://aws.amazon.com/blogs/big-data/part-1-deploy-datahub-using-aws-managed-services-and-ingest-metadata-from-aws-glue-and-amazon-redshift/>

Hey guys, I followed this document to setup datahub in my dev environment. Any clue on how to change the default id password for the root user?

👀 1

✅ 1

hundreds-notebook-26128

02/15/2023, 8:53 PM

Tried installing docker quickstart on my M1 Mac, using (as directed):

Copy code

datahub docker quickstart --arch m1

The response I get is the following, even though I do have ‘docker’ running…. in my case, I am using a dockerd/moby setup via RancherDesktop… is this maybe why DataHub is not detecting that my ‘Docker’ is actually running and operating normally, i.e. because it is RancherDesktop ?

Copy code

Using architecture Architectures.m1
Docker doesn't seem to be running. Did you start it?

I just searched this channel and found others with similar problems, and a link to: https://github.com/rancher-sandbox/rancher-desktop/issues/2534 Some of that thread SEEMS to apply to my situation, but I am not sure I can switch over to trying to use containerd/nerdctl vs. dockerd/moby as my Container Runtime, because that would likely affect every other container I traditionally run

✅ 1

powerful-cat-68806

02/16/2023, 9:00 AM

Hi all, I’m getting a

error from nginx when trying to access from public I can provide yaml file for the relevant pod - frontend/gms

👀 1

✅ 1

fierce-garage-74290

02/16/2023, 10:40 AM

What is the recommended way of creating domains and managing them with Git? I would like to have an automated DataHub setup (infra + ingestion, CI/CD, Git, etc.), but I couldn't find an easy way to create domains via sources (like for business glossaries). So, if I would like to automate it with CI/CD somehow (I'd prefer not to create domains via UI) do I need to wrap such GraphQL mutations with some script or there is some easier workaround?

Copy code

mutation createDomain {
  createDomain(input: { id: "urn:mynewdomain", name: "My New Domain", description: "An optional description" })
}

average-dinner-25106

02/16/2023, 11:18 AM

Hi, I want to run the command "datahub docker quickstart" in the environment where internet is not connected due to security. The result is : HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url : /dataub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml To solve this, proxy server is essential. Thus, I want to add the "http_proxy" environment variable. But where to add? I think datahub command has no proxy server config option. Then, how can I do?

👀 1

✅ 1

ancient-guitar-60671

02/16/2023, 4:47 PM

Hi all, had a question on integrating Great Expectations with Datahub Getting this error: 'Unable to emit metadata to DataHub GMS' Our configuration: Great expectations running locally using pip install: great-expectations==0.15.41 acryl-datahub==0.10.0 Datahub installed on an EC2 using Docker Compose: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES a930973244ae confluentinc/cp-schema-registry:7.2.2 "/etc/confluent/dock…" 2 weeks ago Up 2 weeks 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp schema-registry a088522061b3 acryldata/datahub-actions:head "/bin/sh -c 'dockeri…" 2 weeks ago Up 2 weeks datahub-datahub-actions-1 cb3ce66b1623 linkedin/datahub-frontend-react:head "/bin/sh -c ./start.…" 2 weeks ago Up 2 weeks (healthy) 0.0.0.0:9002->9002/tcp, :::9002->9002/tcp datahub-frontend-react 05a5fdc42a7d confluentinc/cp-kafka:7.2.2 "/etc/confluent/dock…" 2 weeks ago Up 2 weeks 0.0.0.0:9092->9092/tcp, :::9092->9092/tcp broker 67fff60507c3 linkedin/datahub-gms:head "/bin/sh -c /datahub…" 2 weeks ago Up 2 weeks (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp datahub-gms b98e8d9a1bf9 confluentinc/cp-zookeeper:7.2.2 "/etc/confluent/dock…" 2 weeks ago Up 2 weeks 2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp zookeeper 1f1853725db1 mysql:5.7 "docker-entrypoint.s…" 2 weeks ago Up 2 weeks 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp mysql 7088fb455740 elasticsearch:7.10.1 "/tini -- /usr/local…" 2 weeks ago Up 2 weeks (healthy) 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp elasticsearch Port 8080 on the EC2 is open inbound for our IP range Full error message: (.venv) PS C:\data\great_expectations\gx_tutorials> great_expectations checkpoint run test_datasource_checkpoint Using v3 (Batch Request) API Calculating Metrics: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [0001<0000, 1.28it/s] Datasource test_datasource is not present in platform_instance_map ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:202)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:254)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:228)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRe46)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.RestliHandlerServlet.service(RestliHandlerServlet.java:21)\n\tat com.linkedin.restli.server.RestliHandlerServlet.handleRequest(RestliHandlerServlet.java:26)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)\n\tat org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)\n\tat com.datahub.auth.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:98)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: com.linkedin.restli.server.RoutingException\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:111)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:181)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:224)\n\t... 69 more\n', 'status': 404}) Validation succeeded! Suite Name Status Expectations met - test_datasource ✔️ Passed 1 of 1 (100.0 %)

✅ 1

helpful-greece-26038

02/16/2023, 7:42 PM

Unable to start Datahub 0.10 - I have a relatively small Datahub instance that has been running fine for the last 6 months using Docker QuickStart. I attempted to upgrade to version 0.10 today and ran into problems and was hoping someone might have suggestions. The steps that I did are: 1. I followed the release note advice to run the command "docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate. This failed with "ERROR SpringApplication Application run failed. org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade'; 2. I tried "datahub docker nuke" and then "datahub docker quickstart". Eventually I get the message "Unable to run quickstart - the following issues were detected: datahub-gms is running but not health" 3. When I inspect the logs from the datahub-gms server, I see many errors like "Error creating bean with name 'siblingGraphServiceFactory'" I've tried the cycle of stopping the Docker components and restarting them, but nothing seems to work. Never mind - I did a few more cycles of nuking the components and trying to run quickstart and eventually it came up correctly

thanks bear 1

✅ 1

powerful-telephone-2424

02/16/2023, 10:13 PM

I’m trying to run

docker/dev.sh

after pulling the latest datahub code and I’m seeing this message repeatedly:

Copy code

datahub-actions_1         | 2023/02/16 22:10:34 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1         | 2023/02/16 22:10:35 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1         | 2023/02/16 22:10:36 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1         | 2023/02/16 22:10:37 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1         | 2023/02/16 22:10:38 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1         | 2023/02/16 22:10:39 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1         | 2023/02/16 22:10:40 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1         | 2023/02/16 22:10:41 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s

Digging in further I found that the broker isn’t running and when I try to manually start its docker container, I see this error:

Copy code

2023-02-16 14:04:18 [2023-02-16 22:04:18,885] INFO Session establishment complete on server zookeeper/172.18.0.3:2181, session id = 0x10000152e130003, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
2023-02-16 14:04:18 [2023-02-16 22:04:18,888] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
2023-02-16 14:04:18 [2023-02-16 22:04:18,942] INFO [feature-zk-node-event-process-thread]: Starting (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
2023-02-16 14:04:18 [2023-02-16 22:04:18,949] INFO Feature ZK node at path: /feature does not exist (kafka.server.FinalizedFeatureChangeListener)
2023-02-16 14:04:18 [2023-02-16 22:04:18,949] INFO Cleared cache (kafka.server.FinalizedFeatureCache)
2023-02-16 14:04:19 [2023-02-16 22:04:19,068] INFO Cluster ID = 9_PboVE2QOad45hS_5Tn9w (kafka.server.KafkaServer)
2023-02-16 14:04:19 [2023-02-16 22:04:19,074] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
2023-02-16 14:04:19 kafka.common.InconsistentClusterIdException: The Cluster ID 9_PboVE2QOad45hS_5Tn9w doesn't match stored clusterId Some(XkdbYCWoRVadmGA-i2RwKw) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
2023-02-16 14:04:19     at kafka.server.KafkaServer.startup(KafkaServer.scala:230)
2023-02-16 14:04:19     at kafka.Kafka$.main(Kafka.scala:109)
2023-02-16 14:04:19     at kafka.Kafka.main(Kafka.scala)
2023-02-16 14:04:19 [2023-02-16 22:04:19,075] INFO shutting down (kafka.server.KafkaServer)
2023-02-16 14:04:19 [2023-02-16 22:04:19,076] INFO [feature-zk-node-event-process-thread]: Shutting down (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
2023-02-16 14:04:19 [2023-02-16 22:04:19,077] INFO [feature-zk-node-event-process-thread]: Stopped (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)

Any pointers on how to resolve this?

✅ 1

👀 1

rich-pager-68736

02/17/2023, 6:52 AM

Hello DataHub-Team! Since our latest ingestion run, a duplicate key error occurs in GMS when trying to browse our Snowflake datasets. This renders DataHub unusable, because Snowflake is our primary data store. This is somehow related to Snowflake + dbt, but I could not narrow it down besides stating the obvious, that the Java code is trying to merge an equal object. Any help would be appreciated, because this is blocks our prod env. The following error occurs: In the Frontend UI:

An unknown error occurred. (code 500)

Failed to load results! An unexpected error occurred.

And in the GMS logs, it looks like this:

Copy code

06:42:03.642 [Thread-484] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423)
    at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1144)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    at org.dataloader.DataLoaderHelper.lambda$dispatchQueueBatch$3(DataLoaderHelper.java:272)
    at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
    at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
    at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$183(GmsGraphQLEngine.java:1588)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    ... 1 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Datasets
    at com.linkedin.datahub.graphql.types.dataset.DatasetType.batchLoad(DatasetType.java:146)
    at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$183(GmsGraphQLEngine.java:1585)
    ... 2 common frames omitted
Caused by: java.lang.IllegalStateException: Duplicate key EntityAspectIdentifier(urn=urn:li:dataset:(urn:li:dataPlatform:dbt,XXXXXXXXXXXXX.YYYYYYYYYYYYYY.ZZZZZZZZZZZZZ,PROD), aspect=upstreamLineage, version=0) (attempted merging values com.linkedin.metadata.entity.EntityAspect@f613625b and com.linkedin.metadata.entity.EntityAspect@f613625b)
    at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
    at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
    at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.base/java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:1033)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:263)
    at com.linkedin.metadata.entity.EntityService.getEnvelopedAspects(EntityService.java:1826)
    at com.linkedin.metadata.entity.EntityService.getCorrespondingAspects(EntityService.java:379)
    at com.linkedin.metadata.entity.EntityService.getLatestEnvelopedAspects(EntityService.java:333)
    at com.linkedin.metadata.entity.EntityService.getEntitiesV2(EntityService.java:289)
    at com.linkedin.metadata.client.JavaEntityClient.batchGetV2(JavaEntityClient.java:109)
    at com.linkedin.datahub.graphql.types.dataset.DatasetType.batchLoad(DatasetType.java:130)
    ... 3 common frames omitted
06:42:03.645 [Thread-421] ERROR c.datahub.graphql.GraphQLController:99 - Errors while executing graphQL query: "query getSearchResultsForMultiple($input: SearchAcrossEntitiesInput!) {\n  searchAcrossEntitie
...