famous-quill-82626
02/13/2023, 1:15 AM
query listUsers($input: ListUsersInput!) {
  listUsers(input: $input) {
    start
    count
    total
    users {
      urn
      username
      isNativeUser
      info {
        active
        displayName
        title
        firstName
        lastName
        fullName
        email
        __typename
      }
      editableProperties {
        displayName
        pictureLink
        teams
        title
        skills
        __typename
      }
      status
      roles: relationships(
        input: {types: ["IsMemberOfRole"], direction: OUTGOING, start: 0}
      ) {
        start
        count
        total
        relationships {
          entity {
            ... on DataHubRole {
              urn
              type
              name
              relationships(input: {types: ["IsMemberOfRole"], direction: INCOMING}) {
                start
                count
                total
                __typename
              }
              __typename
            }
            __typename
          }
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}
---> result:
{
  "data": {
    "listUsers": {
      "start": 0,
      "count": 25,
      "total": 0,
      "users": [],
      "__typename": "ListUsersResult"
    }
  },
  "extensions": {}
}
Is this query attempting to get the User list from the DataHub database?
...or if not, what is it querying?
Could database access be affecting the returned result?
Thanks,
Pete
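For reference, listUsers is served by the GMS GraphQL API and, as far as I know, is backed by the CorpUser search index, so "total": 0 with an empty users list usually means no CorpUser entities have been ingested/indexed rather than a database-access problem. A minimal sketch for reproducing the same call outside the UI, assuming a local GMS on port 8080 (the endpoint and token handling here are assumptions, adjust to your deployment):
# rough sketch, not official docs: POST the same query straight to GMS
import os
import requests

query = """
query listUsers($input: ListUsersInput!) {
  listUsers(input: $input) { start count total users { urn username } }
}
"""

resp = requests.post(
    "http://localhost:8080/api/graphql",  # assumed GMS address
    # only needed if metadata service auth is enabled
    headers={"Authorization": f"Bearer {os.environ.get('DATAHUB_TOKEN', '')}"},
    json={"query": query, "variables": {"input": {"start": 0, "count": 25}}},
)
resp.raise_for_status()
# 0 here means no CorpUser entities were found in the index
print(resp.json()["data"]["listUsers"]["total"])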
best-umbrella-88325
02/13/2023, 9:55 AM
purple-printer-15193
02/13/2023, 11:43 AM
datahub delete --hard
for the dbt platform, however the homepage is still displaying incorrect dbt stats. Am I missing anything else? What else do I need to delete?
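A hedged aside: as far as I understand, the homepage counts come from the search indices, so they only update once all dbt documents are actually gone. If the goal is to wipe everything under the dbt platform (not a single urn), the platform-scoped form of the delete command may be what's missing, assuming your CLI version supports these flags:
# preview what would match first
datahub delete --platform dbt --hard --dry-run
# then hard delete all dbt entities (removes them from the DB and the indices)
datahub delete --platform dbt --hard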
gentle-portugal-21014
02/13/2023, 12:09 PM
at com.linkedin.metadata.search.client.CachingEntitySearchService.search(CachingEntitySearchService.java:54)
at com.linkedin.metadata.search.aggregator.AllEntitiesSearchAggregator.lambda$getSearchResultsForEachEntity$2(AllEntitiesSearchAggregator.java:161)
at com.linkedin.metadata.utils.ConcurrencyUtils.lambda$transformAndCollectAsync$0(ConcurrencyUtils.java:24)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://elasticsearch:9200>], URI [/glossarytermindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
error={"root_cause":[{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [architektura], expected [true] or [false]","index_uuid":"CDpqvb1FRB2F8rVTA53BVA","index":"glossarytermindex_v2"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"glossarytermindex_v2","node":"hYLiXc5_SCa0D2KGNC172Q","reason":{"type":"query_shard_exception","reason":"failed to create query: Can't parse boolean value [architektura], expected [true] or [false]","index_uuid":"CDpqvb1FRB2F8rVTA53BVA","index":"glossarytermindex_v2","caused_by":{"type":"illegal_argument_exception","reason":"Can't parse boolean value [architektura], expected [true] or [false]"}}}]} status=400
(there are more lines in the stack trace, but I believe the most important lines are included this way). The string "architektura" appearing in the exception record at the bottom was my search term. The important point is that we extended the Glossary Term entity by adding additional aspects with new attributes (as discussed elsewhere), including getting those new attributes supported in searching and filtering (I can share the PDL files for the added aspects here if necessary, of course). This happens on a completely new DataHub deployment of a forked repository (our last resync/merge includes commit a164bdab) with our modifications, i.e. both the MySQL and Elasticsearch databases were created from scratch (we used the option without Neo4j) and Glossary Terms were created anew after this deployment, i.e. there were no changes in the metamodel since the Elasticsearch index was created. I tried restoring indices using the POST method provided within the GMS API for all URNs found in the MySQL database (as discussed elsewhere, I couldn't use the datahub-upgrade image due to our metamodel extensions), but that didn't help. Any help would be very much appreciated...
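Not a definitive diagnosis, but a "Can't parse boolean value [architektura]" failure usually means the full-text query is being matched against a field that Elasticsearch has mapped as boolean, e.g. a custom aspect attribute annotated (or typed) as boolean that is also included in the default search. A hedged sketch of the distinction in a custom aspect (the record and field names below are made up for illustration):
record GlossaryTermExtraInfo {
  // free-text searchable, a safe target for query terms
  @Searchable = {
    "fieldType": "TEXT_PARTIAL"
  }
  area: optional string

  // boolean fields are best kept out of the default full-text query
  @Searchable = {
    "fieldType": "BOOLEAN",
    "queryByDefault": false
  }
  approved: optional boolean
}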
green-hamburger-3800
02/13/2023, 3:53 PM
Starburst
platform just to have the data a bit more user friendly here and wondering how we could do that... We tested overwriting the code to pass starburst
as the platform parameter from the code default itself after creating the starburst platform and it worked, but we're not sure how we could contribute this back to the datahub codebase to make it available...
I'd guess the easier way is having a StarburstSource
that would extend the TrinoSource
and do nothing but change that parameter... but I'm not sure it's a good idea! Thoughts?!
cc @best-notebook-58252
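For what it's worth, the subclass idea would look roughly like the sketch below. This is purely illustrative, not a tested implementation: it assumes the SQLAlchemy-based sources keep the platform name on the instance and use it when building urns, and the new source would still need to be registered as an ingestion plugin.
# illustrative sketch only
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.source.sql.trino import TrinoConfig, TrinoSource


class StarburstSource(TrinoSource):
    """Same behaviour as the Trino source, but reports 'starburst' as the platform."""

    def __init__(self, config: TrinoConfig, ctx: PipelineContext):
        super().__init__(config, ctx)
        # Assumption: the SQLAlchemy base source stores the platform name here
        # and uses it when emitting dataset urns.
        self.platform = "starburst"

    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "StarburstSource":
        config = TrinoConfig.parse_obj(config_dict)
        return cls(config, ctx)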
creamy-machine-95935
02/13/2023, 5:01 PM
colossal-easter-99672
02/13/2023, 5:24 PM
query get_ddl_history {
  versionedDataset(
    urn: "urn:li:dataset:(urn:li:dataPlatform:clickhouse,analytics.profit.debit,PROD)"
    versionStamp: "viewProperties:0"
  ) {
    viewProperties {
      logic
    }
  }
}
white-sandwich-70716
02/13/2023, 5:25 PM
salmon-jordan-53958
02/13/2023, 5:56 PM
flat-match-62670
02/13/2023, 8:44 PM
datahub delete --entity_type glossaryTerm --query "*" --soft -f
I am receiving this traceback in return:
[2023-02-13 12:37:43,117] INFO {datahub.cli.delete_cli:326} - Filter matched glossaryTerm entities of None. Sample: []
No urns to delete. Maybe you want to change entity_type=glossaryTerm or platform=None to be something different?
Took 1.477 seconds to hard delete 0 versioned rows and 0 timeseries aspect rows for 0 entities.
It appears that the datahub-gms pod is unable to locate any of the glossary urns to delete. I am able to successfully ingest via the CLI, so I know my environment variables are correct. Not quite sure what is happening, but I would love to be able to mass delete all the glossary terms/nodes. Any help appreciated! Thanks
enough-monitor-24292
02/14/2023, 7:22 AM
lively-spring-5482
02/14/2023, 1:31 PM
502 Bad Gateway
error message.
Steps to reproduce (updated Chrome browser, MacOS/Win):
1. Login using SSO to DataHub - success
2. Play around for a while [optional]
3. Sign out from the UI
4. Once back on the login screen - re-login using SSO.
Step #4 gets us the above-mentioned 502 error.
To us, the probable scenario looks like an internal error in DataHub. We've noticed the PLAY_SESSION
cookie holding the currently logged-in user is not removed after sign-out. The browser still sends this cookie even though the session has already expired on the server. The latter refrains from responding to the request sent with an inactive cookie, leading the load balancer to respond with a 502 error message.
Any suggestions on how we could fix the problem?
Thanks in advance!
ripe-tailor-61058
02/14/2023, 4:00 PM
ripe-tailor-61058
02/14/2023, 4:36 PM
ripe-tailor-61058
02/14/2023, 4:38 PM
witty-actor-87329
02/14/2023, 8:33 PM
2023-02-14 20:25:50,405 [main] WARN c.l.metadata.entity.EntityService:798 - Unable to produce legacy MAE, entity may not have legacy Snapshot schema.
java.lang.UnsupportedOperationException: Failed to find Typeref schema associated with Config-based Entity
Configs used for gms:
datahub-gms:
  container_name: datahub-gms
  environment:
    - DATAHUB_UPGRADE_HISTORY_KAFKA_CONSUMER_GROUP_ID=generic-duhe-consumer-job-client-gms
    - EBEAN_DATASOURCE_USERNAME=xyz
    - EBEAN_DATASOURCE_PASSWORD=xyz
    - EBEAN_DATASOURCE_HOST=xyz
    - EBEAN_DATASOURCE_URL=jdbc:postgresql://xyz
    - EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
    - KAFKA_BOOTSTRAP_SERVER=broker:29092
    - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
    - ELASTICSEARCH_HOST=elasticsearch
    - ELASTICSEARCH_PORT=9200
    - ES_BULK_REFRESH_POLICY=WAIT_UNTIL
    - ELASTICSEARCH_INDEX_BUILDER_SETTINGS_REINDEX=true
    - ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX=true
    - NEO4J_HOST=http://neo4j:7474
    - NEO4J_URI=bolt://neo4j
    - NEO4J_USERNAME=neo4j
    - NEO4J_PASSWORD=datahub
    - JAVA_OPTS=-Xms1g -Xmx1g
    - GRAPH_SERVICE_DIFF_MODE_ENABLED=true
    - GRAPH_SERVICE_IMPL=neo4j
    - ENTITY_REGISTRY_CONFIG_PATH=/datahub/datahub-gms/resources/entity-registry.yml
    - ENTITY_SERVICE_ENABLE_RETENTION=true
    - MAE_CONSUMER_ENABLED=true
    - MCE_CONSUMER_ENABLED=true
    - PE_CONSUMER_ENABLED=true
    - UI_INGESTION_ENABLED=true
    - METADATA_SERVICE_AUTH_ENABLED=false
  hostname: datahub-gms
  image: ${DATAHUB_GMS_IMAGE:-linkedin/datahub-gms}:${DATAHUB_VERSION:-head}
  ports:
    - ${DATAHUB_MAPPED_GMS_PORT:-8080}:8080
Can anyone help me on this? Thanks
salmon-jordan-53958
02/14/2023, 9:05 PM
rich-policeman-92383
02/15/2023, 4:39 AM
gray-cpu-75769
02/15/2023, 4:49 AM
busy-analyst-35820
02/15/2023, 6:17 AM
alert-fall-82501
02/15/2023, 6:35 AM
glamorous-elephant-17130
02/15/2023, 12:34 PM
https://aws.amazon.com/blogs/big-data/part-1-deploy-datahub-using-aws-managed-services-and-ingest-metadata-from-aws-glue-and-amazon-redshift/
Hey guys, I followed this document to setup datahub in my dev environment.
Any clue on how to change the default id/password for the root user?
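In case it helps: the root datahub user's credentials normally come from the user.props file read by datahub-frontend, so the usual approach is to mount your own copy of that file rather than change anything in the database. A hedged example for a docker-compose style deployment (the mount path and service name match the default quickstart layout as far as I know; for the Kubernetes/EKS setup from that blog post you would mount it into the frontend pod instead):
# user.props - one <username>:<password> per line; this replaces the default datahub:datahub
datahub:MyNewStrongPassword

# docker-compose override for the frontend container
datahub-frontend-react:
  volumes:
    - ./user.props:/datahub-frontend/conf/user.props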
hundreds-notebook-26128
02/15/2023, 8:53 PM
datahub docker quickstart --arch m1
The response I get is the following, even though I do have "docker" running... in my case, I am using a dockerd/moby setup via RancherDesktop... is this maybe why DataHub is not detecting that my "Docker" is actually running and operating normally, i.e. because it is RancherDesktop?
Using architecture Architectures.m1
Docker doesn't seem to be running. Did you start it?
I just searched this channel and found others with similar problems, and a link to:
https://github.com/rancher-sandbox/rancher-desktop/issues/2534
Some of that thread SEEMS to apply to my situation, but I am not sure I can switch over to trying to use containerd/nerdctl vs. dockerd/moby as my Container Runtime, because that would likely affect every other container I traditionally run.
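One thing that may be worth trying before switching runtimes: the quickstart detects Docker through the standard Docker socket/environment, so pointing DOCKER_HOST at Rancher Desktop's dockerd socket (which it exposes under ~/.rd by default, as far as I know) is sometimes enough:
# assumption: Rancher Desktop in dockerd/moby mode exposes its socket here
export DOCKER_HOST=unix://$HOME/.rd/docker.sock
datahub docker quickstart --arch m1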
powerful-cat-68806
02/16/2023, 9:00 AM
404
error from nginx when trying to access from public
I can provide the yaml files for the relevant pods - frontend/gms
fierce-garage-74290
02/16/2023, 10:40 AM
mutation createDomain {
  createDomain(input: { id: "urn:mynewdomain", name: "My New Domain", description: "An optional description" })
}
average-dinner-25106
02/16/2023, 11:18 AM
ancient-guitar-60671
02/16/2023, 4:47 PM
helpful-greece-26038
02/16/2023, 7:42 PM
powerful-telephone-2424
02/16/2023, 10:13 PM
docker/dev.sh
after pulling the latest datahub code and I'm seeing this message repeatedly:
datahub-actions_1 | 2023/02/16 22:10:34 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1 | 2023/02/16 22:10:35 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1 | 2023/02/16 22:10:36 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1 | 2023/02/16 22:10:37 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1 | 2023/02/16 22:10:38 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1 | 2023/02/16 22:10:39 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1 | 2023/02/16 22:10:40 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
datahub-actions_1 | 2023/02/16 22:10:41 Problem with request: Get "<http://datahub-gms:8080/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: no such host. Sleeping 1s
Digging in further, I found that the broker isn't running, and when I try to manually start its docker container, I see this error:
2023-02-16 14:04:18 [2023-02-16 22:04:18,885] INFO Session establishment complete on server zookeeper/172.18.0.3:2181, session id = 0x10000152e130003, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
2023-02-16 14:04:18 [2023-02-16 22:04:18,888] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
2023-02-16 14:04:18 [2023-02-16 22:04:18,942] INFO [feature-zk-node-event-process-thread]: Starting (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
2023-02-16 14:04:18 [2023-02-16 22:04:18,949] INFO Feature ZK node at path: /feature does not exist (kafka.server.FinalizedFeatureChangeListener)
2023-02-16 14:04:18 [2023-02-16 22:04:18,949] INFO Cleared cache (kafka.server.FinalizedFeatureCache)
2023-02-16 14:04:19 [2023-02-16 22:04:19,068] INFO Cluster ID = 9_PboVE2QOad45hS_5Tn9w (kafka.server.KafkaServer)
2023-02-16 14:04:19 [2023-02-16 22:04:19,074] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
2023-02-16 14:04:19 kafka.common.InconsistentClusterIdException: The Cluster ID 9_PboVE2QOad45hS_5Tn9w doesn't match stored clusterId Some(XkdbYCWoRVadmGA-i2RwKw) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
2023-02-16 14:04:19 at kafka.server.KafkaServer.startup(KafkaServer.scala:230)
2023-02-16 14:04:19 at kafka.Kafka$.main(Kafka.scala:109)
2023-02-16 14:04:19 at kafka.Kafka.main(Kafka.scala)
2023-02-16 14:04:19 [2023-02-16 22:04:19,075] INFO shutting down (kafka.server.KafkaServer)
2023-02-16 14:04:19 [2023-02-16 22:04:19,076] INFO [feature-zk-node-event-process-thread]: Shutting down (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
2023-02-16 14:04:19 [2023-02-16 22:04:19,077] INFO [feature-zk-node-event-process-thread]: Stopped (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
Any pointers on how to resolve this?
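In case it's useful context: an InconsistentClusterIdException like this usually means the broker's data volume still holds a meta.properties from an earlier run while ZooKeeper's data was recreated (or vice versa), so the two disagree on the cluster ID. Clearing both volumes and letting them re-initialise together normally resolves it; something along these lines, with the caveat that the exact volume names are a guess (check docker volume ls first):
# find and remove the stale broker/zookeeper volumes so they re-initialise together
docker volume ls | grep -E 'broker|zkdata|zookeeper'
docker volume rm <the broker and zookeeper volumes listed above>
# for the quickstart images, datahub docker nuke wipes DataHub containers and volumes in one go
datahub docker nuke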
rich-pager-68736
02/17/2023, 6:52 AM
An unknown error occurred. (code 500)
Failed to load results! An unexpected error occurred.
And in the GMS logs, it looks like this:
06:42:03.642 [Thread-484] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423)
at java.base/java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1144)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.dataloader.DataLoaderHelper.lambda$dispatchQueueBatch$3(DataLoaderHelper.java:272)
at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$183(GmsGraphQLEngine.java:1588)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
... 1 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Datasets
at com.linkedin.datahub.graphql.types.dataset.DatasetType.batchLoad(DatasetType.java:146)
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$createDataLoader$183(GmsGraphQLEngine.java:1585)
... 2 common frames omitted
Caused by: java.lang.IllegalStateException: Duplicate key EntityAspectIdentifier(urn=urn:li:dataset:(urn:li:dataPlatform:dbt,XXXXXXXXXXXXX.YYYYYYYYYYYYYY.ZZZZZZZZZZZZZ,PROD), aspect=upstreamLineage, version=0) (attempted merging values com.linkedin.metadata.entity.EntityAspect@f613625b and com.linkedin.metadata.entity.EntityAspect@f613625b)
at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.base/java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:1033)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:263)
at com.linkedin.metadata.entity.EntityService.getEnvelopedAspects(EntityService.java:1826)
at com.linkedin.metadata.entity.EntityService.getCorrespondingAspects(EntityService.java:379)
at com.linkedin.metadata.entity.EntityService.getLatestEnvelopedAspects(EntityService.java:333)
at com.linkedin.metadata.entity.EntityService.getEntitiesV2(EntityService.java:289)
at com.linkedin.metadata.client.JavaEntityClient.batchGetV2(JavaEntityClient.java:109)
at com.linkedin.datahub.graphql.types.dataset.DatasetType.batchLoad(DatasetType.java:130)
... 3 common frames omitted
06:42:03.645 [Thread-421] ERROR c.datahub.graphql.GraphQLController:99 - Errors while executing graphQL query: "query getSearchResultsForMultiple($input: SearchAcrossEntitiesInput!) {\n searchAcrossEntitie
...