high-hospital-85984  06/06/2022, 10:58 AM

high-hospital-85984  06/06/2022, 10:59 AM

high-hospital-85984  06/06/2022, 11:02 AM
I’m seeing a DataFetchingException. Any idea what could be the cause?

high-hospital-85984  06/06/2022, 11:09 AM
10:30:10.120 [Thread-1180287] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$153(GmsGraphQLEngine.java:1376)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 1 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Data Jobs
at com.linkedin.datahub.graphql.types.datajob.DataJobType.batchLoad(DataJobType.java:118)
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$153(GmsGraphQLEngine.java:1373)
... 2 common frames omitted
Caused by: com.datahub.util.exception.ModelConversionException: Failed to deserialize DataMap: {"inputDatajobs":[],"inputDatasets":["urn:li:dataset:(urn:li:dataPlatform:snowflake,MYTABLE,PROD)",
which continues as a long list of elements and ends as ,"urn:li:dataset. I’m not sure if this is an artefact of the error logging or if the actual payload is cut short.
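
For what it's worth, the DataMap fragment quoted in the log already fails plain JSON parsing (shortened here for illustration), which matches the ModelConversionException; the log alone cannot tell whether the logger or the stored payload did the truncating:

import json

# Shortened stand-in for the logged DataMap tail: it stops mid-string, so it cannot
# be deserialized, which is exactly what the ModelConversionException reports.
fragment = (
    '{"inputDatajobs":[],"inputDatasets":'
    '["urn:li:dataset:(urn:li:dataPlatform:snowflake,MYTABLE,PROD)","urn:li:dataset'
)
try:
    json.loads(fragment)
except json.JSONDecodeError as e:
    print(f"JSON parsing fails at position {e.pos} of {len(fragment)}: {e.msg}")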

high-hospital-85984  06/06/2022, 3:16 PM
The response has both a data field and an errors field. The data field contains stuff that looks correct (as far as I can tell). The errors field contains a long list of errors like this:
{
"message": "An unknown error occurred.",
"locations": [
{
"line": 185,
"column": 3
}
],
"path": [
"dataset",
"schemaMetadata"
],
"extensions": {
"code": 500,
"type": "SERVER_ERROR",
"classification": "DataFetchingException"
}
},
{
"message": "An unknown error occurred.",
"locations": [
{
"line": 189,
"column": 3
}
],
"path": [
"dataset",
"previousSchemaMetadata"
],
"extensions": {
"code": 500,
"type": "SERVER_ERROR",
"classification": "DataFetchingException"
}
},
{
"message": "An unknown error occurred.",
"locations": [
{
"line": 432,
"column": 5
}
],
"path": [
"dataset",
"incoming",
"relationships",
0,
"entity"
],
"extensions": {
"code": 500,
"type": "SERVER_ERROR",
"classification": "DataFetchingException"
}
},

high-hospital-85984  06/06/2022, 3:17 PM

high-hospital-85984  06/06/2022, 4:15 PM
"path": [
"dataset",
"incoming",
"relationships",
0,
"entity"
],
And checking the data field at that path, I see:
"incoming": {
"start": 0,
"count": 100,
"total": 288,
"relationships": [
{
"type": "Produces",
"direction": "INCOMING",
"entity": null,
"__typename": "EntityRelationship"
},
...
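
One way to line the errors up with the data payload is to walk an error's path (a mix of field names and list indices) into the data object; a minimal sketch with a stand-in response:

from functools import reduce

# Follow a GraphQL error "path" (field names and list indices) into the "data" payload.
def value_at(data, path):
    return reduce(lambda node, key: node[key], path, data)

# Tiny stand-in for the real response, just to show the lookup:
response = {"data": {"dataset": {"incoming": {"relationships": [{"entity": None}]}}}}
print(value_at(response["data"], ["dataset", "incoming", "relationships", 0, "entity"]))  # prints None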

high-hospital-85984  06/06/2022, 4:15 PM
The "entity": null, part is of course quite interesting 🤔

high-hospital-85984  06/06/2022, 4:18 PM

high-hospital-85984  06/06/2022, 4:39 PM

high-hospital-85984  06/06/2022, 4:44 PM
"path": [
"dataset",
"incoming",
"relationships",
9,
"entity",
"inputOutput",
"outputDatasets",
0,
"schemaMetadata"
],

high-hospital-85984  06/06/2022, 4:47 PM

high-hospital-85984  06/06/2022, 5:15 PM

big-carpet-38439  06/06/2022, 6:05 PM

big-carpet-38439  06/06/2022, 6:05 PM

big-carpet-38439  06/06/2022, 6:06 PM

high-hospital-85984  06/06/2022, 6:10 PM

high-hospital-85984  06/06/2022, 6:16 PM
incoming: relationships(
input: {types: ["DownstreamOf", "Consumes", "Produces"], direction: INCOMING, start: 0, count: 100}
I noticed that if I set the count to ~80 it went through, but not with 100. I then changed the start to 80 and count to 30, and it failed. Setting the start to 100 and count to 30 worked fine. So this seems to support your theory that it's not a size problem, but rather that there might be a bad entry between 80 and 100 in the list.
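
A minimal sketch of automating that narrowing, assuming the GraphQL endpoint at /api/graphql and anonymous access (add an Authorization header if your deployment requires one); DATASET_URN is a placeholder for the affected dataset:

import requests

GMS = "http://localhost:8080/api/graphql"  # assumption: GMS GraphQL endpoint
DATASET_URN = "urn:li:dataset:(...)"       # placeholder

QUERY = """
query probe($urn: String!, $start: Int!, $count: Int!) {
  dataset(urn: $urn) {
    incoming: relationships(
      input: {types: ["DownstreamOf", "Consumes", "Produces"], direction: INCOMING, start: $start, count: $count}
    ) {
      relationships { entity { urn } }
    }
  }
}
"""

# Probe the suspicious window one entry at a time to find the bad index.
for start in range(80, 100):
    body = {"query": QUERY, "variables": {"urn": DATASET_URN, "start": start, "count": 1}}
    result = requests.post(GMS, json=body).json()
    if result.get("errors"):
        print(f"index {start} fails: {result['errors'][0]['message']}")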

big-carpet-38439  06/06/2022, 6:17 PM

big-carpet-38439  06/06/2022, 6:17 PM

big-carpet-38439  06/06/2022, 6:17 PM

high-hospital-85984  06/06/2022, 6:20 PM

high-hospital-85984  06/06/2022, 6:22 PM

big-carpet-38439  06/06/2022, 6:22 PM

high-hospital-85984  06/06/2022, 6:23 PM

high-hospital-85984  06/06/2022, 6:25 PM
schemaMetadata(version: 0) {
name
__typename
}
also throws an error.

big-carpet-38439  06/06/2022, 6:26 PM

big-carpet-38439  06/06/2022, 6:26 PM

high-hospital-85984  06/06/2022, 6:28 PM

high-hospital-85984  06/07/2022, 6:31 PM
The stored aspect (the metadata field) is not valid JSON. It ends with something like ..., "type":{"type":{"com.linkedin.schema.NumberType":{}}},". Like I said, it's a wide table, but I checked and the content is around 16380 characters long, which should fit into the TEXT field (postgres).
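
A sketch of how one might scan for other broken aspects directly in the database, assuming DataHub's default metadata_aspect_v2 table and local Postgres credentials (adjust both to your deployment):

import json
import psycopg2  # assumption: psycopg2 is installed and the DataHub Postgres is reachable

# Flag stored aspect values that no longer parse as JSON.
conn = psycopg2.connect("dbname=datahub user=datahub password=datahub host=localhost")
cur = conn.cursor()
cur.execute("SELECT urn, aspect, version, metadata FROM metadata_aspect_v2")
for urn, aspect, version, metadata in cur:
    try:
        json.loads(metadata)
    except (TypeError, json.JSONDecodeError):
        print(f"broken aspect: {urn} {aspect} v{version} ({len(metadata or '')} chars)")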

high-hospital-85984  06/07/2022, 6:34 PM

high-hospital-85984  06/07/2022, 6:58 PM
Found another broken aspect, this time with outputDatasets being very long, and the stored string actually has the exact same length as the problematic schemaMetadata above, 16384 characters long.
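
One detail worth noting: 16384 is exactly 2^14 (16 KiB), and a Postgres TEXT column is not capped anywhere near that, so two independent values stopping at the same power-of-two length points at a fixed-size truncation upstream of the database rather than at the column type:

# Both broken values stop at the same length, and that length is a power of two,
# which is typical of a fixed-size buffer or field-length limit, not of Postgres TEXT.
cutoff = 16384
print(cutoff == 2 ** 14, f"{cutoff // 1024} KiB")  # True 16 KiB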

high-hospital-85984  06/07/2022, 7:05 PM
• The stored data is cut short from the metadata field onwards. (How does the MAE not capture this, btw?)
• The datajobInputOutput aspect does not clean up duplicates in the input (this can be debated if it's a bug)
• Graphql does not handle erroneous data/nulls gracefully.

high-hospital-85984  06/07/2022, 7:22 PM

high-hospital-85984  06/08/2022, 9:47 AM

big-carpet-38439  06/08/2022, 6:35 PM

big-carpet-38439  06/08/2022, 6:35 PM

big-carpet-38439  06/08/2022, 6:35 PM

big-carpet-38439  06/08/2022, 6:36 PM
How are you creating datajobInputOutput? If you are using "ingest" then the contract is basically what you put in is what you get out. The GMS layer does not have a strong opinion on the semantic meaning of the fields and thus cannot do the de-dupe.
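
Given that contract, one option is to de-duplicate on the producer side before emitting the aspect; a minimal sketch (helper name and sample urns are illustrative only):

# De-duplicate dataset urns client-side before building datajobInputOutput,
# since GMS stores whatever list it is given.
def dedupe_urns(urns):
    # dict.fromkeys keeps the first occurrence and preserves order (Python 3.7+)
    return list(dict.fromkeys(urns))

raw_inputs = [
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,MYTABLE,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,MYTABLE,PROD)",  # duplicate
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,OTHER_TABLE,PROD)",
]
print(dedupe_urns(raw_inputs))  # the duplicate MYTABLE entry is dropped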