# troubleshoot
  • b

    better-fireman-33387

    05/03/2023, 6:25 AM
Hi, we set DataHub to work with our managed Elastic and then decided to move it back to the prerequisites Elastic. After deploying both prerequisites & DataHub everything works fine and I see all pods running, but after running the reindex job the UI is still empty and says “no metadata found”, though we do have entities and completed ingestions saved in the DB. We are using Helm to deploy, and the only error I see is in the Elastic logs attached in the first comment.
  • m

    miniature-lighter-59048

    05/03/2023, 7:33 AM
Hi, I did a trivy scan on the latest image acryldata/datahub-ingestion:v0.10.2.3rc4 and found 19 Critical and 250 High vulnerabilities:
Copy code
acryldata/datahub-ingestion:v0.10.2.3rc4 (debian 11.6)
======================================================
Total: 269 (HIGH: 250, CRITICAL: 19)
Will it be possible to look into these?
    scanresult.txt
  • w

    wide-ghost-47822

    05/03/2023, 7:33 AM
Hi, I’d like to fetch Validations results from DataHub. I’ve played with OpenAPI and GraphQL but couldn’t find anything related to this. Do OpenAPI and GraphQL expose this information? If they don’t, can you point me to a reference for doing this with another approach? Here is the information I want to fetch with the API:
  • w

    wide-afternoon-79955

    05/03/2023, 12:52 PM
Hi All, We are on DataHub v0.10.1 with OpenSearch 1.3, and we are facing 500s while downloading lineage data for a Snowflake table. Error stack trace is in the thread.
  • m

    magnificent-lawyer-97772

    05/03/2023, 1:47 PM
Hi, we are deploying version 0.9.6.2 with Helm charts. Is there a way to change the log level of mce-consumer from the default INFO to WARN with an environment variable or a parameter?
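For reference, the DataHub helm chart exposes an `extraEnvs` hook per component, which is the usual place for this kind of override. A sketch of a values override follows; mce-consumer is a Spring Boot app, so routing the level through Spring's logging property is a plausible approach, but the exact property the consumer honors is an assumption worth verifying:

```
# values.yaml sketch, assuming the chart's extraEnvs hook and that
# mce-consumer picks up Spring Boot logging properties via JAVA_OPTS
datahub-mce-consumer:
  extraEnvs:
    - name: JAVA_OPTS
      value: "-Dlogging.level.root=WARN"
```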
  • b

    best-umbrella-88325

    05/03/2023, 3:25 PM
Hey all. We observe a few critical vulnerabilities in the DataHub actions image and a few other images. However, we don't see any Java code in the actions project. Can someone help us identify how we can help remediate these, or can someone from the DataHub team take this up? Thanks https://artifacthub.io/packages/helm/datahub/datahub?modal=security-report
  • e

    elegant-salesmen-99143

    05/03/2023, 4:14 PM
Hi. Something's wrong with our built-in Analytics tab. Screenshot 1 is all I and other users can see in it: a couple of charts and a field to choose a domain, and that's it, nothing else shows. I've refreshed multiple times, and it's been like that for a day at least, maybe more. The privileges to view Analytics are all set. It used to work fine before, but a few days ago we upgraded from 0.9.6.1 to 0.10.1, so it might have broken after that. But we also have a stage DataHub instance, and it works fine there (screenshot 2), although it was also upgraded to 0.10.1. Any idea what I should look into to fix this? Our DataHub is deployed via Kubernetes.
  • l

    lively-dusk-19162

    05/03/2023, 5:22 PM
Hi team, I am trying to deploy DataHub using the ./gradlew quickstartDebug command. I am getting the following errors: datahub-gms is running but not healthy, and datahub-upgrade exited with an error. I am running version v0.9.2 and also tried v0.10.0 and v0.9.6. Can anyone please help me with this?
  • b

    bland-orange-13353

    05/03/2023, 6:46 PM
    This message was deleted.
  • r

    rich-state-73859

    05/03/2023, 6:55 PM
    Hi all, I’m using datahub v0.10.2 with some customized changes. In the lineage tab, the datasets show up correctly, but in the lineage visualization, some lineages are missing. Any idea what I should look into?
  • r

    ripe-oxygen-26489

    05/03/2023, 7:05 PM
    Hello. I’m having trouble connecting Mode as a source. I’m using the documentation, but I keep getting strange errors. I’m not sure what mistakes I might be making in the configuration. Is there a more complete example I could use to match?
  • f

    fast-advantage-32018

    05/03/2023, 9:07 PM
I am trying to run DataHub through Docker. I am running into the Redshift 'str' object is not callable error. I thought I had downgraded the DataHub CLI to version 0.10.1.2, but am still running into the error. Is there something I need to change in the docker-compose.yml file? Any help would be appreciated.
  • c

    creamy-ram-28134

    05/03/2023, 9:22 PM
Hi team - I am getting this error while deploying on Kubernetes
  • c

    creamy-ram-28134

    05/03/2023, 9:22 PM
    Copy code
    [root@adkube06 ~]# kubectl logs -f datahub-datahub-gms-b4c458457-qxzpb -n gopikab
2023/05/03 21:19:19 Waiting for: tcp://prerequisites-mysql:3306
2023/05/03 21:19:19 Waiting for: tcp://prerequisites-kafka:9092
2023/05/03 21:19:19 Waiting for: http://elasticsearch-master:9200
2023/05/03 21:19:19 Waiting for: http:
2023/05/03 21:19:19 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
2023/05/03 21:19:19 Connected to tcp://prerequisites-mysql:3306
2023/05/03 21:19:19 Connected to tcp://prerequisites-kafka:9092
2023/05/03 21:19:19 Received 200 from http://elasticsearch-master:9200
2023/05/03 21:19:20 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
2023/05/03 21:19:21 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
2023/05/03 21:19:22 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
2023/05/03 21:19:23 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
  • c

    creamy-ram-28134

    05/03/2023, 9:23 PM
Why is this error occurring, and why is there an empty http URL?
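The "no Host in request URL" line suggests the container's readiness check was handed a bare "http:" with no host, i.e. one of the wait-for addresses in the GMS container configuration is empty. A quick illustration (in Python, for explanation only) of why a scheme-only URL has no host:

```python
from urllib.parse import urlparse

# A bare "http:" has a scheme but no network location (host), which is
# exactly what Go's http client rejects with "no Host in request URL".
bad = urlparse("http:")
good = urlparse("http://elasticsearch-master:9200")

print(repr(bad.netloc))   # '' -> empty host
print(repr(good.netloc))  # 'elasticsearch-master:9200'
```

In practice this usually points at an unset WAIT_FOR-style environment variable in the pod spec, so the fix is in the helm values rather than in GMS itself.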
  • c

    creamy-ram-28134

    05/03/2023, 9:24 PM
Also getting this error in the prerequisites cp-schema-registry pod:
    Copy code
    pod/prerequisites-cp-schema-registry-6c69f9f665-mbq9k   1/2     CrashLoopBackOff   25 (3m7s ago)   106m
    [root@adkube06 ~]# kubectl logs -f pod/prerequisites-cp-schema-registry-6c69f9f665-mbq9k -n gopikab
    Defaulted container "prometheus-jmx-exporter" out of: prometheus-jmx-exporter, cp-schema-registry-server
    Unrecognized VM option 'UseCGroupMemoryLimitForHeap'
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
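For context: UseCGroupMemoryLimitForHeap was an experimental flag that was removed in newer JDKs, and the prometheus-jmx-exporter sidecar passes it on startup, which is why that container exits immediately. A commonly suggested workaround, assuming the prerequisites chart follows cp-helm-charts' toggle (worth checking against your chart version), is to disable the JMX exporter sidecar:

```
# prerequisites values.yaml sketch, assuming the cp-schema-registry
# subchart exposes cp-helm-charts' prometheus.jmx.enabled toggle
cp-schema-registry:
  prometheus:
    jmx:
      enabled: false
```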
  • f

    few-sunset-43876

    05/04/2023, 4:18 AM
    Hi folks, I'm trying to ingest the metadata using BQ connector. My datahub version is v0.10.0 I got the error:
    Copy code
    "Cannot handle <project-id>.<dataset>.<table-id>$__PARTITIONS_SUMMARY__ - poorly formatted table name, contains ['$']"
    I see the following commit has fixed it. But it still happens to me. Can anyone help? Thanks! https://github.com/datahub-project/datahub/pull/3842
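The table id above ends in $__PARTITIONS_SUMMARY__, a BigQuery pseudo-table suffix. A minimal sketch of the kind of guard a connector can apply, illustrative only and not the actual DataHub code, is to detect the $ marker and strip everything after it:

```python
def is_pseudo_or_partition_table(table_id: str) -> bool:
    """BigQuery pseudo-tables (e.g. foo$__PARTITIONS_SUMMARY__) and
    partition decorators (e.g. foo$20230101) carry a '$' in the id."""
    return "$" in table_id

def base_table_name(table_id: str) -> str:
    """Strip the partition/pseudo-table suffix so only the base table is kept."""
    return table_id.split("$", 1)[0]

print(base_table_name("mytable$__PARTITIONS_SUMMARY__"))  # mytable
print(is_pseudo_or_partition_table("plain_table"))        # False
```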
  • a

    adorable-megabyte-63781

    05/04/2023, 6:10 AM
Hi All, I was trying to build the DataHub frontend but am getting the failure below:
Copy code
./gradlew :datahub-frontend:dist
Configuration on demand is an incubating feature.
FAILURE: Build failed with an exception.
* What went wrong:
A problem occurred configuring root project 'datahub'.
> Could not resolve all artifacts for configuration ':classpath'.
   > Could not resolve com.linkedin.pegasus:gradle-plugins:29.22.16.
     Required by: project :
      > Could not get resource 'https://packages.confluent.io/maven/com/linkedin/pegasus/gradle-plugins/29.22.16/gradle-plugins-29.22.16.pom'.
         > Could not GET 'https://packages.confluent.io/maven/com/linkedin/pegasus/gradle-plugins/29.22.16/gradle-plugins-29.22.16.pom'.
            > Connection reset
      > Could not get resource 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/gradle-plugins/29.22.16/gradle-plugins-29.22.16.pom'.
         > Could not GET 'https://linkedin.jfrog.io/artifactory/open-source/com/linkedin/pegasus/gradle-plugins/29.22.16/gradle-plugins-29.22.16.pom'.
            > Connection reset
  • n

    nice-helmet-40615

    05/04/2023, 11:42 AM
Hi all. I have a problem restoring Search and Graph Indices from the local database. Steps are: - install DataHub (0.10.2) via its helm chart - restore Postgres data (about 6 million rows) - run datahub-upgrade/datahub-upgrade.sh -u RestoreIndices (it took several hours, with no errors in the logs). After that, I see some data in the UI, but not all of it, and it keeps growing slowly. I also see a big lag (about 3 million items) for generic-mae-consumer-job-client on the MetadataChangeLog_Versioned Kafka topic. The mae-consumer logs contain no errors (elasticsearch too); it just looks like slow processing. What could be the problem? Thanks!
  • m

    many-glass-1784

    05/04/2023, 3:03 PM
    Hi all, After changing the key of the Dataset entity I am running into issues. After deleting all datasets, deploying the changes and inserting the datasets again with the new key format I get some errors on the home screen of the application. Even when I don't insert new datasets I get the same errors. The same happens for a custom entity we made, dataproduct:
    Copy code
    Caused by: java.lang.RuntimeException: Failed to batch load data products
    	at com.linkedin.datahub.graphql.types.dataproduct.DataProductType.batchLoad(DataProductType.java:81)
    	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$194(GmsGraphQLEngine.java:1684)
    	... 2 common frames omitted
    Caused by: java.lang.IllegalArgumentException: Failed to convert urn to entity key: urns parts and key fields do not have same length for urn:li:dataProduct:be.publiq.vrijetijdsparticipatie-publiq-uit-locaties
    	at com.linkedin.metadata.utils.EntityKeyUtils.convertUrnToEntityKey(EntityKeyUtils.java:95)
    	at com.linkedin.metadata.entity.EntityService.getKeyEnvelopedAspect(EntityService.java:1891)
    	at com.linkedin.metadata.entity.EntityService.getCorrespondingAspects(EntityService.java:393)
    	at com.linkedin.metadata.entity.EntityService.getLatestEnvelopedAspects(EntityService.java:336)
    	at com.linkedin.metadata.entity.EntityService.getEntitiesV2(EntityService.java:292)
    	at com.linkedin.metadata.client.JavaEntityClient.batchGetV2(JavaEntityClient.java:111)
    	at com.linkedin.datahub.graphql.types.dataproduct.DataProductType.batchLoad(DataProductType.java:63)
    	... 3 common frames omitted
    After some looking around I found that there was some data in a hidden ElasticSearch index,
    .ds-datahub_usage_event-000001
, that still seemed to contain rows with URNs referring to data that was deleted. It looks like the recommendations/suggestions section on the main page uses this index, and as such gets data containing entity URNs in the old, invalid format, which causes the exceptions. When I use the Elasticsearch APIs to clear this index, the error goes away, and when clicking around in the application it gets filled again, this time with valid URNs. Is it normal that this index isn't kept up to date by the normal entity deletes (the database and non-hidden Elasticsearch indices are actually cleared)? If it is normal, is there a better way to clear the indices? Locally clearing the hidden index works, but on our actual deployed environment it only clears some of the errors; even with the deletes and the cleared index it still seems to find some invalid data somewhere.
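For anyone hitting the same "urns parts and key fields do not have same length" message: the exception fires when the number of tuple parts in the urn does not match the number of fields in the entity's key aspect. A rough illustration of the check, not the actual EntityKeyUtils code and with the key schema assumed for the sketch:

```python
def split_urn_parts(urn: str) -> list:
    """Naive split of a DataHub urn's entity-specific part, for illustration.
    Tuple urns like urn:li:dataset:(a,b,c) have several parts; simple urns
    like urn:li:dataProduct:<id> have exactly one."""
    _, _, rest = urn.partition(":li:")          # drop "urn:li:"
    entity_type, _, id_part = rest.partition(":")
    if id_part.startswith("(") and id_part.endswith(")"):
        return id_part[1:-1].split(",")
    return [id_part]

# A key aspect with one field matches a one-part urn; changing the key
# to two fields without migrating existing urns reproduces the error.
key_fields = ["id"]  # assumed shape of the custom key, for the sketch
parts = split_urn_parts("urn:li:dataProduct:be.publiq.some-id")
print(len(parts) == len(key_fields))  # True: lengths match, no exception
```

This is why changing an entity's key aspect requires migrating (or deleting and re-ingesting) every stored urn, including copies lingering in indices like the usage-event one described above.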
  • b

    big-ocean-9800

    05/04/2023, 6:15 PM
Hey folks! My team created a custom ingestion source for CockroachDB (we are planning on contributing this change back to the community in the next couple of weeks). This source is almost exactly the same as the Postgres source; here's a link to the custom source in a gist. This was working for us up until we upgraded from v0.8.41 to v0.9.5. The main issue we are seeing is that browse paths are not working as expected. Before, the browse path was properly populated based on instance name and database name; now it's truncated to just the platform name after we run an ingest. Here's an example of two different versions of the browsePaths aspect for the same URN before and after the upgrade:
    Copy code
-[ RECORD 1 ]--+--------------------------------------------------------------------------------------------------------------------------------------------------------
urn            | urn:li:dataset:(urn:li:dataPlatform:CockroachDB,test-instance.test-database.test-schema.test-table,PROD)
aspect         | browsePaths
version        | 0
metadata       | {"paths":["/prod/cockroachdb"]}
systemmetadata | {}
createdon      | 2023-01-05 17:35:19.596
createdby      | urn:li:corpuser:__datahub_system
createdfor     |
-[ RECORD 2 ]--+--------------------------------------------------------------------------------------------------------------------------------------------------------
urn            | urn:li:dataset:(urn:li:dataPlatform:CockroachDB,test-instance.test-database.test-schema.test-table,PROD)
aspect         | browsePaths
version        | 1
metadata       | {"paths":["/prod/cockroachdb/test-instance/test-database/test-schema/test-table"]}
systemmetadata | {}
createdon      | 2022-07-29 22:08:16.392
createdby      | urn:li:corpuser:__datahub_system
createdfor     |
    Here you can see that the browse path is no longer being populated the same way, even though the ingestion source code has not changed. Does anyone know if we are missing something around this browsePaths issue? My understanding is that nothing should have changed around this after the version upgrade. I’ve been looking through the ingestion source code but haven’t found any smoking guns yet.
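For reference, the pre-upgrade value in RECORD 2 amounts to building the path from the environment, the platform, and the dot-separated dataset name. A sketch of that legacy behavior follows; treating the change as a shift in default browse-path generation between those versions is an assumption based on the records shown, not a confirmed root cause:

```python
def legacy_browse_path(platform: str, name: str, env: str = "PROD") -> str:
    """Old-style default browse path: /<env>/<platform>/<name parts>,
    with the dataset name split on dots (illustrative sketch)."""
    parts = name.split(".")
    return "/" + "/".join([env.lower(), platform.lower(), *parts])

print(legacy_browse_path(
    "CockroachDB",
    "test-instance.test-database.test-schema.test-table",
))
# /prod/cockroachdb/test-instance/test-database/test-schema/test-table
```

Comparing that output against RECORD 1's truncated "/prod/cockroachdb" makes the regression easy to spot in stored aspects.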
  • c

    creamy-ram-28134

    05/04/2023, 7:04 PM
Hi all - We are trying to deploy DataHub on k8s. The gms component is failing with this error:
    Copy code
    2023-05-04 17:43:27,138 [pool-19-thread-1] ERROR c.l.d.g.a.service.AnalyticsService:264 - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
My question is: who or what needs to create those indices? Is it one of the automatic jobs?
  • l

    lively-dusk-19162

    05/04/2023, 7:47 PM
Hi all, the following are the steps I followed to deploy DataHub locally during development, on a Mac M1 Pro, trying to deploy DataHub v0.10.2: 1. Cloned the DataHub GitHub code. 2. Added certificates to the Dockerfiles because of corp proxies and Zscaler. 3. Ran the ./gradlew quickstartDebug command to deploy locally. 4. Got the following error inside elasticsearch-setup: Get http://elasticsearch:9200 EOF sleeping 1s. Things I tried to resolve it: 1. Pinged elasticsearch from elasticsearch-setup and vice versa; pings succeed in both directions. 2. Nuked all containers and images and ran it again. Could anyone please help me resolve this?
  • e

    early-hydrogen-27542

    05/04/2023, 8:15 PM
👋 folks. Any idea why the Platform card on the homepage would show entities under the dbt platform (screenshot) when there aren't any under search (screenshot)? When I inspect the homepage, it's actually pulling no dbt entities (screenshot). Why would the Platform card still appear?
  • b

    bland-orange-13353

    05/04/2023, 11:23 PM
    This message was deleted.
  • i

    important-intern-48298

    05/05/2023, 2:25 AM
Hi folks, I am trying to build the DataHub repo by following the instructions on the DataHub website. The command I executed:
    Copy code
    ./gradlew build
Getting the following error:
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':datahub-web-react:yarnGenerate'.
> Process 'command '/home/Documents/Repositories/datahub/datahub-web-react/.gradle/yarn/yarn-v1.22.0/bin/yarn'' finished with non-zero exit value 1
    So far I have tried clearing the yarn cache but still getting the same error. I would really appreciate it if someone can assist. Thanks!
  • b

    bland-orange-13353

    05/05/2023, 2:27 AM
    This message was deleted.
  • a

    adamant-furniture-37835

    05/05/2023, 7:31 AM
Hi, I am trying the My Views functionality on version 0.10.2. I have created a View with a filter to include 2 platforms, saved it, and made it the default. Now when I browse the datasets, I can see that the View is selected in the dropdown in the top header, but it has no effect on the displayed results, i.e. the results aren't filtered. When I inspect the backend query sent by the UI, it doesn't include anything specific to this view or filter. Am I missing something here?
  • b

    blue-microphone-24514

    05/05/2023, 7:42 AM
Hi. When I enable metadata_service_authentication, all auth (user/pwd, SSO) stops working, with Provided credentials do not match known system client id & client secret in the logs.
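That error usually means the frontend and GMS disagree on the shared system client credentials. In the helm chart the relevant block looks roughly like the sketch below (key names per the acryldata/datahub-helm values as I recall them; double-check against your chart version), and every component must resolve the same id and secret:

```
# values.yaml sketch: enable token auth and pin one shared system client
global:
  datahub:
    metadata_service_authentication:
      enabled: true
      systemClientId: "__datahub_system"
      systemClientSecret:
        secretRef: "datahub-auth-secrets"
        secretKey: "system_client_secret"
```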
  • r

    rapid-forest-41223

    05/05/2023, 8:27 AM
Hi! I am having a hard time trying to enable autocomplete search for a new entity. searchAcrossEntities does work, and "enableAutocomplete": true is set for the field. But I do not see the log line Autocompleting query entityName for the new entity in gms.debug.log after the autoCompleteForMultiple query is fired.