# ui
g
Hi We see an intermittent exception in the UI.
Exception while fetching data (/dataset/downstreamLineage) : java.lang.RuntimeException: Failed to retrieve entities of type DownstreamEntityRelationships
Exception while fetching data (/dataset/upstreamLineage) : java.lang.RuntimeException: Failed to retrieve entities of type UpstreamEntityRelationships
When browsing to a dataset, the exception sometimes pops up. After refreshing the page, the exception is gone and the dataset is shown. The exception originates from GMS, which suddenly can’t connect to Neo4j.
Response status 500, serviceErrorMessage: org.neo4j.driver.exceptions.ServiceUnavailableException: Connection to the database failed
The default connection properties seem reasonable, so we are a bit lost here. Has anyone seen this error, or can anyone give me a lead?
The DataHub version is 0.7.1, and the instances are running in AWS ECS.
GMS is rather quiet:
WARNING: [0xf13cc4a2][bolt-318471] Fatal error occurred in the pipeline
Frontend:
Exception while fetching data (/dataset/downstreamLineage) : java.lang.RuntimeException: Failed to retrieve entities of type DownstreamEntityRelationships","context":"default","exception":"java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type DownstreamEntityRelationships
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type DownstreamEntityRelationships
	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$59(GmsGraphQLEngine.java:459)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	... 1 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Datasets
	at com.linkedin.datahub.graphql.types.lineage.DownstreamLineageType.batchLoad(DownstreamLineageType.java:43)
	at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$59(GmsGraphQLEngine.java:457)
	... 2 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load DownstreamLineage for entity urn:li:dataset:(urn:li:dataPlatform:<removed>))
	at com.linkedin.datahub.graphql.types.lineage.DownstreamLineageType.lambda$batchLoad$0(DownstreamLineageType.java:39)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
	at com.linkedin.datahub.graphql.types.lineage.DownstreamLineageType.batchLoad(DownstreamLineageType.java:41)
	... 3 common frames omitted
Caused by: com.linkedin.restli.client.RestLiResponseException: com.linkedin.restli.client.RestLiResponseException: Response status 500, serviceErrorMessage: org.neo4j.driver.exceptions.ServiceUnavailableException: Connection to the database failed
	at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:130)
	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseEntity(ResponseFutureImpl.java:173)
	at com.linkedin.lineage.client.Lineages.getLineage(Lineages.java:33)
	at com.linkedin.datahub.graphql.types.lineage.DownstreamLineageType.lambda$batchLoad$0(DownstreamLineageType.java:36)
	... 11 common frames omitted
Caused by: com.linkedin.restli.client.RestLiResponseException: RestException{_response=RestResponse[headers={Connection=keep-alive, content-length=13096, Date=Wed, 12 May 2021 12:35:21 GMT, server=envoy, x-envoy-upstream-service-time=92, x-restli-error-response=true, x-restli-protocol-version=2.0.0},cookies=[],status=500,entityLength=13096]} 
	at com.linkedin.restli.internal.client.ExceptionUtil.exceptionForThrowable(ExceptionUtil.java:102)
	at com.linkedin.restli.client.RestLiCallbackAdapter.convertError(RestLiCallbackAdapter.java:66)
...
	... 1 common frames omitted
Caused by: com.linkedin.r2.message.rest.RestException: Received error 500 from server for URI <https://gms:443/lineage>
	at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:76)
	... 4 common frames omitted
g
Hey @gifted-art-69474, how are you running your neo and gms instances? Are they running via quickstart or are they hosted in a cloud provider?
g
They are hosted in a cloud provider: aws ecs
g
When there are issues, are they resolved immediately on next refresh? or does it take some time to get back online?
g
yes, immediately
g
got it, sounds like the Neo4j connection may just be timing out too quickly or not retrying properly.
let me file an issue for this
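To make "retrying properly" concrete, here is a minimal sketch of retrying a flaky call with exponential backoff. It is only an illustration: `TransientError`, `call_with_retry`, and `flaky_lineage_lookup` are hypothetical names, and this is not how GMS or the Neo4j driver actually handles dropped connections.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a dropped graph-database connection."""


def call_with_retry(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# Simulated lookup that fails once and then succeeds, like the
# "gone after a refresh" behaviour described in this thread.
calls = {"n": 0}


def flaky_lineage_lookup():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("Connection to the database failed")
    return "lineage"


delays = []  # record the backoff delays instead of really sleeping
result = call_with_retry(flaky_lineage_lookup, sleep=delays.append)
print(result, calls["n"], delays)
```

Since the error here clears on the very next request, even a single retry on the server side would likely hide it from the UI.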
g
indeed. Thanks a lot
also, these are clusters without any active users. There is some data in them, so I assume it’s not a load issue
g
here's the issue:
could you add your stacktrace & any additional details about your setup that may help debugging?
g
sure, will do
g
thanks 👍
g
thank you 🙂
I’ve been trying to call the GMS directly on the ingest-endpoint, but I have a hard time creating the query
g
ah, you mean POSTing data directly?
g
nah, reproducing the behaviour without using the frontend. I assumed it’s GET
g
ah i see
i can give you a curl, one moment
g
lovely
g
curl --location --request GET 'http://localhost:8080/lineage?direction=OUTGOING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29' \
--header 'X-RestLi-Protocol-Version: 2.0.0'
something like this should work ^
you'll just want to replace the urn parameter with an urn you use
and make sure to url encode the urn
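If encoding the urn by hand is error-prone, a small script can build the URL. This is a sketch using only the Python standard library; the helper name `lineage_url` and the `http://localhost:8080` base are assumptions taken from the curl above, so adjust them for your GMS host.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def lineage_url(urn, direction="OUTGOING", base="http://localhost:8080"):
    """Build the /lineage URL with the urn fully percent-encoded."""
    # quote with safe="" also encodes ':', '(', ')' and ',' inside the urn
    query = urlencode({"direction": direction, "urn": urn},
                      quote_via=quote, safe="")
    return f"{base}/lineage?{query}"


url = lineage_url(
    "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
)

# Prepare (but do not send) the request with the rest.li protocol header.
request = Request(url, headers={"X-RestLi-Protocol-Version": "2.0.0"},
                  method="GET")
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the snippet runs without a GMS instance.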
g
it worked. Thanks
I didn’t supply the `direction` parameter, so it threw a 500. Not a very helpful error 🙂
g
good point- that param should be marked as required in the rest.li endpoint. I'll add another issue to throw a 400
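Roughly the behaviour I have in mind, as a language-neutral sketch rather than the actual rest.li change: check the required query parameters up front and answer 400 with a readable message instead of letting the request fail later with a 500. The function name `check_lineage_params` is hypothetical.

```python
from typing import Dict, Tuple


def check_lineage_params(params: Dict[str, str]) -> Tuple[int, str]:
    """Return (status, message) for a /lineage request's query parameters."""
    # Both parameters appear in the working curl; treat them as required.
    missing = [name for name in ("urn", "direction") if not params.get(name)]
    if missing:
        return 400, "Missing required query parameter(s): " + ", ".join(missing)
    return 200, "ok"


# A request without `direction`, like the one that produced the 500.
status, message = check_lineage_params({"urn": "urn:li:dataset:example"})
print(status, message)
```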
h
@gifted-art-69474 - any documentation you followed to install DataHub on an ECS cluster would be helpful.