gray-ghost-82678
02/09/2023, 2:54 AM
thousands-bird-50049
02/09/2023, 7:19 AM
best-wire-59738
02/09/2023, 9:16 AM
magnificent-lock-58916
02/09/2023, 9:32 AM
rhythmic-quill-75064
02/09/2023, 10:10 AM
datahub-datahub-upgrade-job
:
APPLICATION FAILED TO START

Description:
Field kafkaHealthChecker in com.linkedin.gms.factory.kafka.DataHubKafkaEventProducerFactory required a bean of type 'com.linkedin.metadata.dao.producer.KafkaHealthChecker' that could not be found.
The injection point has the following annotations:
@javax.inject.Inject() @javax.inject.Named(value="noCodeUpgrade")

Action:
Consider defining a bean of type 'com.linkedin.metadata.dao.producer.KafkaHealthChecker' in your configuration.
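A minimal sketch of what the suggested Action could look like, assuming the upgrade job's Spring context is simply missing the bean; the wiring below is illustrative only (DataHub's own factories normally provide this when the right components are scanned), not the project's actual configuration code:

// Hypothetical configuration class, for illustration of the error's "Action" hint.
// Assumes KafkaHealthChecker has a no-argument constructor; check the class in your
// DataHub version before relying on this.
import com.linkedin.metadata.dao.producer.KafkaHealthChecker;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KafkaHealthCheckerConfig {

  // Expose a KafkaHealthChecker so DataHubKafkaEventProducerFactory can inject it.
  @Bean
  public KafkaHealthChecker kafkaHealthChecker() {
    return new KafkaHealthChecker();
  }
}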
chilly-potato-57465
02/09/2023, 10:36 AM
{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.active_customer_ltv,PROD)") {
    properties { name }
    editableProperties { description }
    type
    platform { name }
    ownership { owners { associatedUrn } }
    tags { tags { tag { urn } } }
    glossaryTerms { terms { term { urn } } }
    domain { domain { urn } }
  }
}
strong-kite-83354
02/09/2023, 11:33 AM
query my_query($query_string: String!) {
  search(input: { type: DATASET, query: $query_string, start: 0, count: 100 }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
        ... on Dataset {
          name
          properties {
            customProperties {
              key
              value
            }
          }
        }
      }
    }
  }
}
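For reference, a query like this is POSTed to the GraphQL endpoint with the variable supplied alongside it in the request body. A minimal sketch follows; the endpoint URL and the access token are assumptions for a typical deployment, not values from this thread:

// Minimal sketch: POST the search query above to DataHub's GraphQL endpoint.
// Endpoint URL and token are placeholders, not taken from the thread.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphQlSearchExample {
  public static void main(String[] args) throws Exception {
    String query = "query my_query($query_string: String!) { "
        + "search(input: { type: DATASET, query: $query_string, start: 0, count: 100 }) "
        + "{ start count total searchResults { entity { urn type } } } }";
    // GraphQL variables travel in the same JSON body as the query text.
    String body = "{\"query\": \"" + query.replace("\"", "\\\"") + "\", "
        + "\"variables\": {\"query_string\": \"*\"}}";

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/api/graphql")) // assumed GMS address
        .header("Content-Type", "application/json")
        .header("Authorization", "Bearer <personal-access-token>") // placeholder
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}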
bumpy-pharmacist-66525
02/09/2023, 2:58 PM
On the search results page (reached via the Explore All button on the homepage), if I try to use the advanced filter, some of the options, such as filtering on tags, do not work. It seems that the API call which is supposed to populate the dropdown is not returning anything. However, basic filtering does seem to work (the issue is related to the API call that populates the dropdown menus).
Does anyone have an idea of what is happening and/or how to fix it?
Here is a step-by-step set of instructions to reproduce the issue (start on the DataHub homepage):
Here is a step by step set of instructions on how to reproduce the issue (start on the homepage of DataHub):
1. Select Explore All
2. Under the 'Filter' column, select Advanced
3. Select Add Filter
4. Select Tag
5. Search for a tag which exists in your DataHub instance (in my case, I always get 'no data')
faint-actor-78390
02/09/2023, 3:32 PM
stocky-apple-7404
02/09/2023, 6:29 PM
mysterious-motorcycle-80650
02/09/2023, 6:52 PM
wooden-hamburger-59537
02/09/2023, 7:04 PM
kubectl get pods -n my-datahub
NAME READY STATUS RESTARTS AGE
my-acryl-datahub-actions-7f7dbcb7cb-jwlm7 0/1 CrashLoopBackOff 126 (2m39s ago) 18h
my-cp-schema-registry-5cbf4478f-2xgnt 2/2 Running 0 51m
my-datahub-frontend-795fb7dd7d-qj7p9 1/1 Running 0 52m
my-datahub-gms-7466d54b7-5hwxz 0/1 CrashLoopBackOff 194 (4m55s ago) 17h
kubectl logs my-datahub-gms-7466d54b7-5hwxz -n my-datahub
+ echo
+ grep -q ://
+ NEO4J_HOST=http://
+ [[ ! -z datahubes ]]
+ [[ -z '' ]]
++ base64 --wrap 0
++ echo -ne 'username:password'
+ AUTH_TOKEN=username:password
+ ELASTICSEARCH_AUTH_HEADER='Authorization:Basic username:password'
+ [[ -z Authorization:Basic username:password ]]
+ [[ true == true ]]
+ ELASTICSEARCH_PROTOCOL=https
+ WAIT_FOR_EBEAN=
+ [[ '' != true ]]
+ [[ '' == ebean ]]
+ [[ -z '' ]]
+ WAIT_FOR_EBEAN=' -wait <tcp://my-datahub-dev-ue1.cluster-xxxxxxxxx.us-east-1.rds.amazonaws.com:3306> '
+ WAIT_FOR_CASSANDRA=
+ [[ '' == cassandra ]]
+ WAIT_FOR_KAFKA=
+ [[ '' != true ]]
++ echo <http://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096|b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
++ sed 's/,/ -wait tcp:\/\//g'
+ WAIT_FOR_KAFKA=' -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> '
+ WAIT_FOR_NEO4J=
+ [[ elasticsearch != elasticsearch ]]
+ OTEL_AGENT=
+ [[ '' == true ]]
+ PROMETHEUS_AGENT=
+ [[ true == true ]]
+ PROMETHEUS_AGENT='-javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml '
+ auth_resource_dir=/etc/datahub/plugins/auth/resources
+ CLASSES_DIR=
+ [[ '' == true ]]
+ COMMON='
-wait <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -timeout 240s java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war'
+ [[ false != true ]]
+ exec dockerize -wait <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443> -wait-http-header 'Authorization:Basic ZGF0YWh1YmVzOlhjTyEyNlNPKmI6VThQOmxyLTJnTjZOZDYwNXQ3PU0rK2l7PA==' -wait <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -timeout 240s java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
2023/02/09 18:31:19 Waiting for: <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443>
2023/02/09 18:31:19 Waiting for: <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306>
2023/02/09 18:31:19 Waiting for: <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Waiting for: <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Waiting for: <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:31:19 Connected to <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Connected to <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306>
2023/02/09 18:31:19 Connected to <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:20 Received 200 from <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443>
2023/02/09 18:31:20 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:31:21 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
....... many more log lines like this .................
2023/02/09 18:33:27 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:28 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:29 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:30 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:31 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:32 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:35:18 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:35:19 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:35:19 Timeout after 4m0s waiting on dependencies to become available: [<https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443> <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>]
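The repeated "connection refused" lines and the final timeout all point at broker b-1 (10.194.49.47:9096), while b-2, b-3, RDS and OpenSearch connect fine. A minimal sketch of the same plain TCP dial that dockerize's -wait tcp:// check performs, useful for probing b-1 from another pod or host in the VPC, could look like this (host and port copied from the log; the class itself is purely illustrative):

// Illustrative only: roughly what dockerize's "-wait tcp://..." check does.
import java.net.InetSocketAddress;
import java.net.Socket;

public class BrokerReachabilityCheck {
  public static void main(String[] args) {
    String host = "b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com";
    int port = 9096;
    try (Socket socket = new Socket()) {
      // Same kind of plain TCP dial that keeps failing with "connection refused" above.
      socket.connect(new InetSocketAddress(host, port), 5_000);
      System.out.println("Connected to " + host + ":" + port);
    } catch (Exception e) {
      System.out.println("Failed to reach " + host + ":" + port + ": " + e.getMessage());
    }
  }
}

Since the other two brokers accept connections on 9096, a problem specific to that broker instance (listener or security-group configuration, or the broker being down) seems more likely than a general networking issue, though that is only a guess from the log.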
So what is wrong with it?
glamorous-elephant-17130
02/09/2023, 7:27 PM
glamorous-elephant-17130
02/09/2023, 7:34 PM
glamorous-elephant-17130
02/09/2023, 7:34 PM
quaint-barista-82836
02/09/2023, 7:34 PM
quaint-barista-82836
02/09/2023, 11:03 PM
"Anchor_Item"
) FROM project_name-thd
.FBT.FBT_Diff
] (Background on this error at: https://sqlalche.me/e/14/4xp6)
average-dinner-25106
02/10/2023, 4:34 AM
astonishing-cartoon-6079
02/10/2023, 5:59 AM
searchAcrossEntities
We looked through the related code and found that com.linkedin.metadata.search.cache.CacheableSearcher#getSearchResults
is the root cause.
As its comment says, it walks through the index in batches starting from the beginning, even when the request asks for a large page number. We are confused about this logic and are not sure what the side effects would be of using Elasticsearch's paging interface directly.
/**
 * Get search results corresponding to the input "from" and "size"
 * It goes through batches, starting from the beginning, until we get enough results to return
 * This let's us have batches that return a variable number of results (we have no idea which batch the "from" "size" page corresponds to)
 */
public SearchResult getSearchResults(int from, int size) {
  try (Timer.Context ignored = MetricUtils.timer(this.getClass(), "getSearchResults").time()) {
    int resultsSoFar = 0;
    int batchId = 0;
    boolean foundStart = false;
    List<SearchEntity> resultEntities = new ArrayList<>();
    SearchResult batchedResult;
    // Use do-while to make sure we run at least one batch to fetch metadata
    do {
      batchedResult = getBatch(batchId);
      int currentBatchSize = batchedResult.getEntities().size();
      // If the number of results in this batch is 0, no need to continue
      if (currentBatchSize == 0) {
        break;
      }
      if (resultsSoFar + currentBatchSize > from) {
        int startInBatch = foundStart ? 0 : from - resultsSoFar;
        int endInBatch = Math.min(currentBatchSize, startInBatch + size - resultEntities.size());
        resultEntities.addAll(batchedResult.getEntities().subList(startInBatch, endInBatch));
        foundStart = true;
      }
      // If current batch is smaller than the requested batch size, the next batch will return empty.
      if (currentBatchSize < batchSize) {
        break;
      }
      resultsSoFar += currentBatchSize;
      batchId++;
    } while (resultsSoFar < from + size);
    return new SearchResult().setEntities(new SearchEntityArray(resultEntities))
        .setMetadata(batchedResult.getMetadata())
        .setFrom(from)
        .setPageSize(size)
        .setNumEntities(batchedResult.getNumEntities());
  }
}
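For comparison, "using the Elasticsearch paging interface directly" could look roughly like the sketch below, where from/size are handed straight to the index rather than walking batches from the beginning. The index name, client wiring and query type are assumptions for illustration, not DataHub's actual search code:

// Illustrative only: direct from/size paging against an Elasticsearch index.
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class DirectPagingSketch {
  public static SearchResponse search(RestHighLevelClient client, String query,
                                      int from, int size) throws Exception {
    SearchSourceBuilder source = new SearchSourceBuilder()
        .query(QueryBuilders.queryStringQuery(query))
        .from(from)   // offset handled by Elasticsearch itself
        .size(size);  // page size handled by Elasticsearch itself
    SearchRequest request = new SearchRequest("datasetindex_v2").source(source); // assumed index name
    return client.search(request, RequestOptions.DEFAULT);
  }
}

One caveat: Elasticsearch rejects requests where from + size exceeds index.max_result_window (10,000 by default), so very deep pages still need a different mechanism such as search_after or scroll, which may be part of why the batch-walking approach exists.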
bland-orange-13353
02/10/2023, 7:14 AM
elegant-article-21703
02/10/2023, 11:25 AM
I don't know if it is because of v0.10.0 or something else, but when I updated the version using the repo I realised that DataHub is not loading the glossary terms; only the root nodes are shown. The children property of the glossary nodes is always empty.
{
  "status": "DB Updated. 1290 Glossary Terms created or updated. 2 Dashboard created or updated. 1 IA Models created or updated."
}
Has someone had the same issue? Additionally, I downgraded to the previous version (v0.9.6.1) but I am still facing the same problem.
Thank you all in advance!!
best-wire-59738
02/10/2023, 1:19 PM
A searchAcrossLineage
GraphQL query was made and it is giving me a 503 error.
Also found the same issue reported a few days back in the channel: https://datahubspace.slack.com/archives/C029A3M079U/p1673535713529589
We are currently on datahub v0.9.6.1
green-hamburger-3800
02/10/2023, 2:49 PM
The documentation says: "Create a new GlossaryNode. Returns the urn of the newly created GlossaryNode. If a node with the provided ID already exists, it will be overwritten."
That is stated for both the createGlossaryNode
and createGlossaryTerm
mutations.
But when I actually try to overwrite one, I get the following error:
14:42:31.364 [ForkJoinPool.commonPool-worker-49] ERROR c.l.d.g.r.g.CreateGlossaryNodeResolver:71 - Failed to create GlossaryNode with id: b4940ce5-ef8b-409f-a9c8-00588fda73a8, name: Status: This Glossary Node already exists!
silly-dog-87292
02/10/2023, 8:08 PM
powerful-memory-77948
02/10/2023, 8:33 PM
./gradlew :metadata-jobs:mae-consumer-job:bootRun
Error:
> Task :metadata-jobs:mae-consumer-job:bootRun FAILED
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.5.12)
2023-02-10 12:15:53,225 [main] INFO org.eclipse.jetty.util.log - Logging initialized @1412ms to org.eclipse.jetty.util.log.Slf4jLog
2023-02-10 12:15:53,306 [main] INFO org.eclipse.jetty.server.Server - jetty-9.4.45.v20220203; built: 2022-02-03T09:14:34.105Z; git: 4a0c91c0be53805e3fcffdcdcc9587d5301863db; jvm 11.0.18+0
2023-02-10 12:15:53,319 [main] INFO o.e.j.s.h.ContextHandler.application - Initializing Spring embedded WebApplicationContext
2023-02-10 12:15:53,411 [main] INFO org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
2023-02-10 12:15:53,411 [main] INFO org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
2023-02-10 12:15:53,412 [main] INFO org.eclipse.jetty.server.session - node0 Scavenging every 660000ms
2023-02-10 12:15:53,415 [main] INFO o.e.j.server.handler.ContextHandler - Started o.s.b.w.e.j.JettyEmbeddedWebAppContext@16c8e9b8{application,/,[file:///private/var/folders/gm/8g6pqmz169j1p9mzzfkz9k1w0000gn/T/jetty-docbase.9091.3795745778604343479/],AVAILABLE}
2023-02-10 12:15:53,415 [main] INFO org.eclipse.jetty.server.Server - Started @1603ms
2023-02-10 12:15:53,482 [main] INFO org.eclipse.jetty.server.session - node0 Stopped scavenging
2023-02-10 12:15:53,483 [main] INFO o.e.j.server.handler.ContextHandler - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@16c8e9b8{application,/,[file:///private/var/folders/gm/8g6pqmz169j1p9mzzfkz9k1w0000gn/T/jetty-docbase.9091.3795745778604343479/],STOPPED}
ERROR LoggingFailureAnalysisReporter
***************************
APPLICATION FAILED TO START
***************************
Description:
Parameter 0 of constructor in com.linkedin.metadata.kafka.boot.DataHubUpgradeKafkaListener required a bean of type 'org.springframework.kafka.core.DefaultKafkaConsumerFactory' that could not be found.
Action:
Consider defining a bean of type 'org.springframework.kafka.core.DefaultKafkaConsumerFactory' in your configuration.
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':metadata-jobs:mae-consumer-job:bootRun'.
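If the consumer factory really is missing from the standalone bootRun context, a minimal sketch of the bean Spring is asking for might look like the following. Bootstrap servers and deserializers are placeholders, and DataHub's own Kafka factory classes normally provide this from configuration, so this only illustrates what the error message means:

// Hypothetical configuration, for illustration of the missing-bean error only.
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class MaeConsumerKafkaConfig {

  // Expose a DefaultKafkaConsumerFactory so DataHubUpgradeKafkaListener can be constructed.
  @Bean
  public DefaultKafkaConsumerFactory<String, String> kafkaConsumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
  }
}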
silly-angle-91497
02/10/2023, 10:02 PM
2023-02-10 21:53:46,052 [qtp447981768-17] WARN c.d.a.a.AuthenticatorChain:80 - Authentication chain failed to resolve a valid authentication. Errors: [(com.datahub.authentication.authenticator.DataHubSystemAuthenticator,Failed to authenticate inbound request: Authorization header is missing Authorization header.), (com.datahub.authentication.authenticator.DataHubTokenAuthenticator,Failed to authenticate inbound request: Request is missing 'Authorization' header.)]
The DataHub site seems to be up and running, but when trying to search for entities I keep seeing the notification "An unknown error occurred. (Code 500)". All the pods seem to be running and this warning is all I keep seeing.
chilly-ability-77706
02/11/2023, 1:38 AM
chilly-ability-77706
02/11/2023, 1:38 AM
powerful-cat-68806
02/12/2023, 10:34 AM
I'm having an issue connecting to my database (pgSQL) from the DH deployment.
I'm able to connect from my local machine & from a connector (EC2) in the same VPC. I've also double-checked the password from the k8s secret I'm using.
Following is the error from the pod:
Failed to pull image "acryldata/datahub-postgres-setup:v0.9.6.1": rpc error: code = Unknown desc = Error response from daemon: manifest for acryldata/datahub-postgres-setup:v0.9.6.1 not found: manifest unknown: manifest unknown
I've tried to set the version to v0.9.6.1rc4
as described here. No change.
Any idea?
kind-sunset-55628
02/12/2023, 2:03 PM