Has the format changed with the latest datahub rel...
# ingestion
c
Has the format changed in the latest DataHub release for nested / structured data fields? I had to upgrade to the latest version to get lineage ingestion from Airflow running. Now my JSON file produces a weird schema / table entry in DataHub. I still had to prefix the upper/outer names of arrays and structs to get the order right. Is this implemented differently now? Maybe it even supports ingesting deeply nested tables directly from Hive. Thanks for the feedback.
g
Yes, we added support for visualizing nested Avro schemas in the latest release!
b
c
I could visualize it before, but I had to write a custom script that transformed my data. See the edited original post.
I'll look into the PR.
b
If you need support with altering any hand-written schema production logic, @green-football-43791 can help.
c
Ok, it seems the new schema is less error-prone and more obvious, because before, all fields sat at the same level in the Avro, no matter at which nesting level they were located in the table.
Thanks @big-carpet-38439. I guess I can fix it myself, but I'll get back to you. Thanks for the quick reply and the hint.
b
Yes - we intended to capture schema topology with higher fidelity (less lossiness) in the new format
Please do let us know if you need support here. @ripe-dress-87297 Was there a document outlining the new specification?
g
Yep- the new field path specification is outlined here:
c
Ok, thanks. What is a recursive record? And what is the difference between arrays and records when displaying nested data? Is there an example that illustrates the different cases?
The bootstrap_mce.json does not yet illustrate the new nested feature: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/mce_files/bootstrap_mce.json
Does it?
Ok, I've misunderstood the changes coming with v2. Forget my previous posts, I have to investigate the problems first.
There is no backwards compatibility?
b
I believe there is backwards compatibility. @ripe-dress-87297, who led the design of this format, can speak to it better!
h
Hi @colossal-furniture-76714, the format is backward compatible in the sense that it is possible to obtain the old field path by simply stripping away all the new v2 tokens (anything enclosed in square brackets). We use this technique to retrieve other data (such as tags). We do the actual migration of the fieldPath key from v1 to v2 during an update operation on data that uses the fieldPath as a key.
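To make the stripping technique concrete, here is a minimal sketch. The sample v2 path is an assumption for illustration only; the real token layout is defined in the field path specification, not in this thread.

```python
import re

def v2_to_v1(field_path: str) -> str:
    """Recover the v1 fieldPath by stripping every v2 token
    (anything enclosed in square brackets)."""
    stripped = re.sub(r"\[[^\]]+\]", "", field_path)
    # Collapse the dots left behind by the removed tokens.
    return re.sub(r"\.{2,}", ".", stripped).strip(".")

# Hypothetical v2 path for illustration; check the spec for the
# exact token names and ordering.
print(v2_to_v1("[version=2.0].[type=Record].shipment_info.[type=string].target"))
# -> shipment_info.target
```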
c
Ok, I get it. For some reason, however, my data-wrangling script does not work anymore and everything is pretty messed up. Do you have an example mce file that only uses v2? I think that would be very helpful, as this is how I came up with my old transformation pipeline.
```json
"fields": [
  {
    "fieldPath": "shipment_info",
    "jsonPath": null,
    "nullable": false,
    "description": { "string": "Shipment info description" },
    "type": { "type": { "com.linkedin.pegasus2avro.schema.RecordType": {} } },
    "nativeDataType": "varchar(100)",
    "recursive": false
  },
  {
    "fieldPath": "shipment_info.date",
    "jsonPath": null,
    "nullable": false,
    "description": { "string": "Shipment info date description" },
    "type": { "type": { "com.linkedin.pegasus2avro.schema.DateType": {} } },
    "nativeDataType": "Date",
    "recursive": false
  },
  {
    "fieldPath": "shipment_info.target",
    "jsonPath": null,
    "nullable": false,
    "description": { "string": "Shipment info target description" },
    "type": { "type": { "com.linkedin.pegasus2avro.schema.StringType": {} } },
    "nativeDataType": "text",
    "recursive": false
  },
```
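Purely as an illustration (the exact v2 token names and ordering should be verified against the field path specification, not taken from this sketch), the `fieldPath` values above might look roughly like this in the v2 format:

```json
"fieldPath": "[version=2.0].[type=Record].shipment_info"
"fieldPath": "[version=2.0].[type=Record].shipment_info.[type=Date].date"
"fieldPath": "[version=2.0].[type=Record].shipment_info.[type=string].target"
```

The rest of each field entry (nullability, description, native type) stays as before; only the `fieldPath` key gains the bracketed tokens.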
How would this look in the new format?