white-beach-27328
06/10/2021, 7:03 PM
I ran a hive ingestion recipe leveraging acryl-datahub[hive, datahub-kafka]==0.8.1.1, which created a DatasetSnapshot through the MCE Consumer. I can retrieve it from the GMS with a request to the /datasets?action=getSnapshot endpoint using the urn I see in the Kafka message. However, when I look in the DataHub frontend, I can't find the dataset anywhere, and it doesn't come back when I search for the dataset's name. Kind of confused as to what the problem would be; any ideas?

steep-pizza-15641
06/10/2021, 7:14 PM

curved-magazine-23582
06/14/2021, 2:20 AM
<https://github.com/linkedin/datahub/blob/0b75b4a96a801a91fe434c87f6c737d24d63eb14/metadata-ingestion/examples/mce_files/bootstrap_mce.json#L996>
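On the first question above (snapshot retrievable from GMS but invisible in the frontend): the UI's search and browse are served from the search index, which is populated by the MAE consumer, not by GMS directly, so a dataset that getSnapshot returns can still be missing from search if that indexing path failed. For anyone reproducing the getSnapshot check, here is a minimal sketch; the host, urn, and quickstart setup are assumptions, and the request is only built, not sent. Rest.li "action" endpoints take a POST with a JSON body:

```python
import json
import urllib.request

# Assumed local quickstart GMS host and an example urn; substitute your own.
GMS_HOST = "http://localhost:8080"
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,my_db.my_table,PROD)"

def build_get_snapshot_request(gms_host: str, urn: str) -> urllib.request.Request:
    """Build (but do not send) the Rest.li getSnapshot action request."""
    body = json.dumps({"urn": urn}).encode("utf-8")
    return urllib.request.Request(
        url=f"{gms_host}/datasets?action=getSnapshot",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-RestLi-Protocol-Version": "2.0.0",
        },
        method="POST",
    )

req = build_get_snapshot_request(GMS_HOST, urn)
print(req.full_url)
```

If this returns the snapshot but the frontend still shows nothing, the MAE consumer logs are the next place to look.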
Validation errors from GMS API:
[HTTP Status:400]: Parameters of method 'ingest' failed validation with error 'ERROR :: /snapshot/aspects/1/com.linkedin.datajob.DataJobInfo/type :: union type is not backed by a DataMap or null'
	at com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)
	at com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)
	at com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:198)

curved-magazine-23582
06/14/2021, 2:23 AM
{
  "snapshot": {
    "urn": "urn:li:dataJob:(urn:li:dataFlow:(glue,logistics-load,PROD),logistics-load)",
    "aspects": [
      {
        "com.linkedin.common.Ownership": {
          "owners": [
            {
              "owner": "urn:li:corpuser:dataservices",
              "type": "DATAOWNER"
            }
          ],
          "lastModified": {
            "time": 1581407189000,
            "actor": "urn:li:corpuser:dataservices"
          }
        }
      },
      {
        "com.linkedin.datajob.DataJobInfo": {
          "name": "logistics-load",
          "description": "Tranform and load logistics data into Redshift",
          "type": "SQL"
        }
      },
      {
        "com.linkedin.datajob.DataJobInputOutput": {
          "inputDatasets": [
            "urn:li:dataset:(urn:li:dataPlatform:s3,logistics_raw.shipment,PROD)"
          ],
          "outputDatasets": [
            "urn:li:dataset:(urn:li:dataPlatform:redshift,redshift_edw_production.edw_logistics_box,PROD)"
          ]
        }
      }
    ]
  }
}
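The "union type is not backed by a DataMap or null" error above points at the bare `"type": "SQL"` in the DataJobInfo aspect. In pegasus/Rest.li JSON, a union-typed field must be wrapped in a single-key object naming the chosen member type, not passed as a plain string. A sketch of the shape, assuming the Azkaban job-type member name; check the bootstrap_mce.json example linked earlier for the exact member key in your schema version:

```python
# Sketch: union-typed fields are encoded as {"<member type name>": value}
# rather than as a bare value.
def wrap_union(member_type: str, value):
    """Pegasus JSON encodes a union as a single-key map naming the member."""
    return {member_type: value}

data_job_info = {
    "com.linkedin.datajob.DataJobInfo": {
        "name": "logistics-load",
        "description": "Transform and load logistics data into Redshift",
        # A bare "SQL" fails validation; the union member must be named.
        # The member key below is an assumption; verify against your schema.
        "type": wrap_union("com.linkedin.datajob.azkaban.AzkabanJobType", "SQL"),
    }
}
```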
brief-lizard-77958
06/14/2021, 12:23 PM

gifted-student-48095
06/15/2021, 9:28 AM

handsome-airplane-62628
06/15/2021, 2:41 PM

wonderful-quill-11255
06/16/2021, 6:34 AM

average-autumn-35845
06/16/2021, 12:41 PM

faint-hair-91313
06/16/2021, 1:51 PM

straight-noon-75819
06/16/2021, 4:37 PM

straight-noon-75819
06/16/2021, 6:09 PM
I had to run nuke.sh to clear everything and run quickstart.sh again. Does anyone see any weird behavior like this?

millions-jelly-76272
06/17/2021, 8:32 AM

gifted-bird-57147
06/17/2021, 11:32 AM

cuddly-lunch-28022
06/17/2021, 12:39 PM

miniature-airport-96424
06/17/2021, 1:31 PM
2021-06-17 13:31:21.912:WARN:oejs.HttpChannel:qtp544724190-9: /health
datahub-datahub-gms-748884b4db-69cg2 datahub-gms 2021-06-17T13:31:21.918950727Z javax.servlet.ServletException: javax.servlet.UnavailableException: Servlet Not Initialized
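A "Servlet Not Initialized" response on /health usually means GMS is still starting up, or failed to start (often because it cannot reach its backing stores). Rather than hitting the endpoint once and giving up, it can help to poll until the service is ready. A small sketch of the wait loop; the probe is injectable, and a real probe (an HTTP GET against /health, host assumed) is left to the reader:

```python
import time

def wait_until_healthy(probe, attempts=30, delay_s=2.0):
    """Poll `probe` (a zero-arg callable returning True when healthy)."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay_s)
    return False

# Demonstration with a fake probe that succeeds on the third call:
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

assert wait_until_healthy(fake_probe, attempts=5, delay_s=0.0)
```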
glamorous-kite-95510
06/18/2021, 2:24 AM

cuddly-lunch-28022
06/18/2021, 7:31 AM

icy-holiday-55016
06/18/2021, 2:09 PM

better-orange-49102
06/19/2021, 5:13 PM
Exception while fetching data (/browse) : java.lang.RuntimeException: Failed to execute browse: entity type DATASET, path [prod, goonrtpe], filters: null, start: 0, count: 10
The mae-consumer logs say:
datahub-mae-consumer | org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public void com.linkedin.metadata.kafka.DataHubUsageEventsProcessor.consume(org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, java.lang.String>)' threw exception; nested exception is java.lang.ClassCastException: com.linkedin.metadata.key.CorpUserKey cannot be cast to com.linkedin.identity.CorpUserInfo; nested exception is java.lang.ClassCastException: com.linkedin.metadata.key.CorpUserKey cannot be cast to com.linkedin.identity.CorpUserInfo
I've compared the data stored in MySQL for the programmatically generated datasets and the pipeline-created ones, and I don't see a difference.
I'm using a slightly older version of DataHub, v0.8.1.

glamorous-kite-95510
06/20/2021, 6:46 AM

glamorous-kite-95510
06/21/2021, 2:05 AM

cuddly-lunch-28022
06/21/2021, 11:30 AM

adorable-hairdresser-61775
06/21/2021, 8:34 PM

brief-lizard-77958
06/22/2021, 7:03 AM

brief-lizard-77958
06/22/2021, 8:49 AM

steep-pizza-15641
06/22/2021, 12:42 PM

chilly-holiday-80781
06/23/2021, 11:33 PM

fancy-helmet-32669
06/24/2021, 7:36 PM
Traceback (most recent call last):
  ........
  File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 272, in <listcomp>
    return [self._generic_from_json(x, writers_schema.items, readers_schema.items)
  File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 248, in _generic_from_json
    result = self._union_from_json(json_obj, writers_schema, readers_schema)
  File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 304, in _union_from_json
    raise schema.AvroException('Datum union type not in schema: %s', value_type)
avro.schema.AvroException: ('Datum union type not in schema: %s', 'com.linkedin.pegasus2avro.common.BrowsePaths')
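The AvroException above is raised when the datum's type is not among the members of the union in the schema the client has loaded; in practice this often indicates a version mismatch between the ingestion CLI and the schemas it was generated against (here, a schema predating BrowsePaths). A simplified sketch of the check being performed; the names are illustrative, not avrogen's actual internals:

```python
# Sketch of union resolution: the datum's type name must appear among
# the union's member type names, or the write is rejected.
def union_accepts(union_member_names, value_type):
    """Return True when `value_type` is a member of the union."""
    return value_type in set(union_member_names)

# With an older schema whose union lacks BrowsePaths, the check fails:
members = ["null", "com.linkedin.pegasus2avro.common.Ownership"]
assert not union_accepts(members, "com.linkedin.pegasus2avro.common.BrowsePaths")
```

Upgrading the CLI and server to matching versions is the usual fix for this class of error.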
clever-smartphone-69649
06/25/2021, 6:51 PM