magnificent-camera-71872
08/19/2021, 5:44 AM
magnificent-camera-71872
08/19/2021, 5:45 AM
magnificent-camera-71872
08/19/2021, 5:45 AM
05:42:06 [Thread-14773] ERROR c.l.datahub.graphql.GmsGraphQLEngine - Failed to load Entities of type: Dataset, keys: [urn:li:dataset:(urn:li:dataPlatform:redshift,datalake.dms_fxcu_arbor_fxcu_arbor.cmf_dms,DEV)] Failed to batch load Datasets
05:42:06 [Thread-14773] WARN n.g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/browse/entities) : java.lang.RuntimeException: Failed to retrieve entities of type Dataset
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$102(GmsGraphQLEngine.java:719)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 1 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load Datasets
at com.linkedin.datahub.graphql.types.dataset.DatasetType.batchLoad(DatasetType.java:102)
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$102(GmsGraphQLEngine.java:716)
... 2 common frames omitted
Caused by: java.lang.NullPointerException: null
05:42:06 [application-akka.actor.default-dispatcher-97029] ERROR react.controllers.GraphQLController - Errors while executing graphQL query: "query getBrowseResults($input: BrowseInput!) {\n browse(input: $input) {\n entities {\n urn\n type\n ... on Dataset {\n name\n origin\n description\n platform {\n name\n info {\n logoUrl\n __typename\n }\n __typename\n }\n tags\n ownership {\n ...ownershipFields\n __typename\n }\n globalTags {\n ...globalTagsFields\n __typename\n }\n __typename\n }\n ... on Dashboard {\n urn\n type\n tool\n dashboardId\n info {\n name\n description\n externalUrl\n access\n lastModified {\n time\n __typename\n }\n __typename\n }\n ownership {\n ...ownershipFields\n __typename\n }\n globalTags {\n ...globalTagsFields\n __typename\n }\n __typename\n }\n ... on GlossaryTerm {\n name\n ownership {\n ...ownershipFields\n __typename\n }\n glossaryTermInfo {\n definition\n termSource\n sourceRef\n sourceUrl\n customProperties {\n key\n value\n __typename\n }\n __typename\n }\n __typename\n }\n ... on Chart {\n urn\n type\n tool\n chartId\n info {\n name\n description\n externalUrl\n type\n access\n lastModified {\n time\n __typename\n }\n __typename\n }\n ownership {\n ...ownershipFields\n __typename\n }\n globalTags {\n ...globalTagsFields\n __typename\n }\n __typename\n }\n ... on DataFlow {\n urn\n type\n orchestrator\n flowId\n cluster\n info {\n name\n description\n project\n __typename\n }\n ownership {\n ...ownershipFields\n __typename\n }\n globalTags {\n ...globalTagsFields\n __typename\n }\n __typename\n }\n ... on MLFeatureTable {\n urn\n type\n name\n description\n featureTableProperties {\n description\n mlFeatures {\n urn\n __typename\n }\n mlPrimaryKeys {\n urn\n __typename\n }\n __typename\n }\n ownership {\n ...ownershipFields\n __typename\n }\n platform {\n name\n info {\n logoUrl\n __typename\n }\n __typename\n }\n __typename\n }\n ... 
on MLModel {\n name\n origin\n description\n tags\n ownership {\n ...ownershipFields\n __typename\n }\n globalTags {\n ...globalTagsFields\n __typename\n }\n platform {\n name\n info {\n logoUrl\n __typename\n }\n __typename\n }\n __typename\n }\n ... on MLModelGroup {\n name\n origin\n description\n ownership {\n ...ownershipFields\n __typename\n }\n platform {\n name\n info {\n logoUrl\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n groups {\n name\n count\n __typename\n }\n start\n count\n total\n metadata {\n path\n totalNumEntities\n __typename\n }\n __typename\n }\n}\n\nfragment ownershipFields on Ownership {\n owners {\n owner {\n ... on CorpUser {\n urn\n type\n username\n info {\n active\n displayName\n title\n email\n firstName\n lastName\n fullName\n __typename\n }\n editableInfo {\n pictureLink\n __typename\n }\n __typename\n }\n ... on CorpGroup {\n urn\n type\n name\n info {\n email\n admins {\n urn\n username\n info {\n active\n displayName\n title\n email\n firstName\n lastName\n fullName\n __typename\n }\n editableInfo {\n pictureLink\n teams\n skills\n __typename\n }\n __typename\n }\n members {\n urn\n username\n info {\n active\n displayName\n title\n email\n firstName\n lastName\n fullName\n __typename\n }\n editableInfo {\n pictureLink\n teams\n skills\n __typename\n }\n __typename\n }\n groups\n __typename\n }\n __typename\n }\n __typename\n }\n type\n __typename\n }\n lastModified {\n time\n __typename\n }\n __typename\n}\n\nfragment globalTagsFields on GlobalTags {\n tags {\n tag {\n urn\n name\n description\n __typename\n }\n __typename\n }\n __typename\n}\n", result: {errors=[{message=Exception while fetching data (/browse/entities) : java.lang.RuntimeException: Failed to retrieve entities of type Dataset, locations=[{line=3, column=5}], path=[browse, entities], extensions={classification=DataFetchingException}}], data={browse=null}}, errors: [ExceptionWhileDataFetching{path=[browse, entities], 
exception=java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type Dataset, locations=[SourceLocation{line=3, column=5}]}]
magnificent-camera-71872
08/19/2021, 5:46 AM
05:42:01.699 [qtp544724190-76] INFO c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
05:42:01.703 [qtp544724190-13] INFO c.l.m.r.entity.EntityResource - GET BROWSE RESULTS for dataset at path /dev/redshift
05:42:01.703 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 4ms
05:42:01.706 [qtp544724190-202] INFO c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
05:42:01.708 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - POST /entities?action=browse - browse - 200 - 5ms
05:42:01.710 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 4ms
05:42:01.712 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
05:42:01.719 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
05:42:04.269 [qtp544724190-212] INFO c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
05:42:04.271 [qtp544724190-204] INFO c.l.m.r.entity.EntityResource - GET BROWSE RESULTS for dataset at path /dev/redshift/datalake
05:42:04.273 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 4ms
05:42:04.274 [qtp544724190-77] INFO c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
05:42:04.275 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - POST /entities?action=browse - browse - 200 - 4ms
05:42:04.278 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 4ms
05:42:04.281 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
05:42:04.288 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
05:42:06.750 [qtp544724190-13] INFO c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
05:42:06.752 [qtp544724190-12] INFO c.l.m.r.entity.EntityResource - GET BROWSE RESULTS for dataset at path /dev/redshift/datalake/dms_fxcu_arbor_fxcu_arbor
05:42:06.754 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 4ms
05:42:06.756 [qtp544724190-67] INFO c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
05:42:06.758 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - POST /entities?action=browse - browse - 200 - 6ms
05:42:06.759 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 3ms
05:42:06.759 [qtp544724190-10] INFO c.l.m.r.entity.EntityResource - BATCH GET [urn:li:dataset:(urn:li:dataPlatform:redshift,datalake.dms_fxcu_arbor_fxcu_arbor.cmf_dms,DEV)]
05:42:06.762 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
05:42:06.765 [pool-11-thread-1] INFO c.l.metadata.filter.LoggingFilter - GET /entities?ids=List(urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aredshift%2Cdatalake.dms_fxcu_arbor_fxcu_arbor.cmf_dms%2CDEV%29) - batchGet - 200 - 6ms
05:42:06.769 [I/O dispatcher 1] INFO c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
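The `batchGet` path in the log above is the dataset URN percent-encoded. A minimal sketch of building that path with Python's standard library follows. `entity_path` is a hypothetical helper name, not part of the DataHub client, and the `/entities/<urn>` shape is taken only from the log lines above.

```python
from urllib.parse import quote, unquote


def entity_path(urn: str) -> str:
    """Return an /entities/<urn> path with every reserved character escaped."""
    # safe="" makes quote() escape ':', '(', ')' and ',' as well,
    # which matches the encoded form that appears in the GMS log.
    return "/entities/" + quote(urn, safe="")


urn = "urn:li:dataset:(urn:li:dataPlatform:redshift,datalake.dms_fxcu_arbor_fxcu_arbor.cmf_dms,DEV)"
path = entity_path(urn)
print(path)

# Decoding the path segment gives back the original URN.
assert unquote(path[len("/entities/"):]) == urn
```

Swapping `DEV` for `PROD` in the URN gives the path for the other copy of the table, which makes it easy to fetch both and compare them.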
magnificent-camera-71872
08/19/2021, 5:50 AM
curl --location --request GET 'http://<MY_SERVER>:8080/entities/urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Aredshift%2Cdatalake.dms_fxcu_arbor_fxcu_arbor.cmf_dms%2CDEV)'
'DEV' is the table I added using Airflow (and it doesn't work on the UI), and 'PROD' is the table I added using the CLI (and it works on the UI).
magnificent-camera-71872
08/19/2021, 5:50 AM
big-carpet-38439
08/19/2021, 5:53 AM
big-carpet-38439
08/19/2021, 5:56 AM
big-carpet-38439
08/19/2021, 6:00 AM
mammoth-bear-12532
magnificent-camera-71872
08/19/2021, 6:11 AM
On the Airflow server: acryl-datahub==0.8.10.0
On the CLI: acryl-datahub==0.8.8.0
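The client-side versions quoted above can be read from Python itself. This is a minimal sketch, assuming Python 3.8+ and that it runs in the same environment as the CLI or the Airflow worker. `installed_version` is a hypothetical helper name. It reports only the locally installed pip package and says nothing about the version of the server or the UI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed here, otherwise None.
print(installed_version("acryl-datahub"))
```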
The UI was installed at around the same time as the CLI machine, so it is most likely at a similar version.
Is there any way I can check?
magnificent-camera-71872
08/19/2021, 6:17 AM
diff cmf_dms_cli_formatted.json cmf_dms_airflow_formatted.json
4c4
< "urn": "urn:li:dataset:(urn:li:dataPlatform:redshift,datalake.dms_fxcu_arbor_fxcu_arbor.cmf_dms,PROD)",
---
> "urn": "urn:li:dataset:(urn:li:dataPlatform:redshift,datalake.dms_fxcu_arbor_fxcu_arbor.cmf_dms,DEV)",
8c8
< "origin": "PROD",
---
> "origin": "DEV",
13a14,20
> "com.linkedin.common.BrowsePaths": {
> "paths": [
> "/dev/redshift/datalake/dms_fxcu_arbor_fxcu_arbor/cmf_dms"
> ]
> }
> },
> {
31a39
> "isPartOfKey": false,
42a51
> "isPartOfKey": false,
53a63
> "isPartOfKey": false,
64a75
> "isPartOfKey": false,
75a87
> "isPartOfKey": false,
.....
.....
> "isPartOfKey": false,
1252a1371
> "isPartOfKey": false,
1266,1272d1384
< }
< },
< {
< "com.linkedin.common.BrowsePaths": {
< "paths": [
< "/prod/redshift/datalake/dms_fxcu_arbor_fxcu_arbor/cmf_dms"
< ]
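A line diff like the one above gets noisy when the same field is added in many places. A minimal sketch that lists field names occurring in one JSON document but not the other follows. The helper names are hypothetical, and the two sample documents are made-up stand-ins, not the real contents of `cmf_dms_cli_formatted.json` and `cmf_dms_airflow_formatted.json`.

```python
import json
from typing import Any, Set


def field_names(node: Any) -> Set[str]:
    """Collect every dict key that appears anywhere in a parsed JSON value."""
    names: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            names.add(key)
            names |= field_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= field_names(item)
    return names


def only_in_second(first: Any, second: Any) -> Set[str]:
    """Return field names present somewhere in `second` but nowhere in `first`."""
    return field_names(second) - field_names(first)


# Made-up miniature documents standing in for the two exported files.
cli_doc = json.loads('{"fields": [{"fieldPath": "id"}]}')
airflow_doc = json.loads('{"fields": [{"fieldPath": "id", "isPartOfKey": false}]}')

print(sorted(only_in_second(cli_doc, airflow_doc)))
```

For the real files, the two `json.loads` calls would be replaced by `json.load` on each opened file. Differences in values, such as the `PROD` versus `DEV` origin, are deliberately ignored, because the sketch compares field names only.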
`isPartOfKey` seems like it could be the culprit. It's in the JSON for the dataset added from Airflow, but not in the one added from the CLI.
Was there a change in this area between acryl-datahub==0.8.8.0 and acryl-datahub==0.8.10.0?
big-carpet-38439
08/19/2021, 6:22 AM
magnificent-camera-71872
08/19/2021, 6:23 AM
big-carpet-38439
08/19/2021, 6:23 AM
big-carpet-38439
08/19/2021, 6:23 AM
big-carpet-38439
08/19/2021, 6:23 AM
magnificent-camera-71872
08/19/2021, 6:23 AM
I installed acryl-datahub==0.8.8.0 on the Airflow server and re-ran the DAG...
and hey-ho, I can now see the metadata on the UI 🙂
magnificent-camera-71872
08/19/2021, 6:24 AM
magnificent-camera-71872
08/19/2021, 6:24 AM
big-carpet-38439
08/19/2021, 6:24 AM
big-carpet-38439
08/19/2021, 6:25 AM
magnificent-camera-71872
08/19/2021, 6:28 AM