Has the format changed with the latest datahub rel...
# ingestion
c
Has the format changed in the latest DataHub release for nested / structured data fields? I had to upgrade to the latest version to get lineage ingestion from Airflow running. Now my JSON file produces a weird schema / table entry in DataHub. I still had to prefix the upper/outer names of arrays and structs to get the order right. Is this implemented differently now? Maybe it even supports ingesting deeply nested tables directly from Hive. Thanks for the feedback.
g
Yes, we added support for visualizing nested Avro schemas in the latest release!
b
c
I could visualize it before, but I had to write a custom script that transformed my data. See the edited original post.
I'll look into the PR.
b
If you need support with altering any hand-written schema production logic, @green-football-43791 can help.
c
Ok, it seems the new schema is less error-prone and more obvious, because before, all fields sat at the same level in the Avro, no matter at which nesting level they were located in the table.
Thanks @big-carpet-38439. I guess I can fix it myself, but I'll get back to you. Thanks for the quick reply and the hint.
b
Yes - we intended to capture schema topology with higher fidelity (less lossiness) in the new format
Please do let us know if you need support here. @ripe-dress-87297 Was there a document outlining the new specification?
g
Yep- the new field path specification is outlined here:
c
Ok, thanks. What is a recursive record? And what is the difference between arrays and records when displaying nested data? Is there an example that illustrates the different cases?
The bootstrap_mce.json does not yet illustrate the new nested feature: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/mce_files/bootstrap_mce.json
Does it?
Ok, I've misunderstood the changes coming with v2. Forget my previous posts, I have to investigate the problems first.
There is no backwards compatibility?
b
I believe there is backwards compatibility. @ripe-dress-87297, who led the design of this format, can speak to it better!
h
Hi @colossal-furniture-76714, the format is backward compatible in the sense that it is possible to obtain the old field path by simply stripping away all the new v2 tokens (anything enclosed in square brackets). We use this technique to retrieve other data (such as tags). We do the actual migration of the fieldPath key from v1 to v2 during an update operation on data that uses the fieldPath as a key.
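To make the stripping technique concrete, here is a minimal sketch. The sample v2 path is an assumption for illustration only; the real token layout is defined in the field path specification, not in this thread.

```python
import re

def v2_to_v1(field_path: str) -> str:
    """Recover the v1 fieldPath by stripping every v2 token
    (anything enclosed in square brackets)."""
    stripped = re.sub(r"\[[^\]]+\]", "", field_path)
    # Collapse the dots left behind by the removed tokens.
    return re.sub(r"\.{2,}", ".", stripped).strip(".")

# Hypothetical v2 path for illustration; check the spec for the
# exact token names and ordering.
print(v2_to_v1("[version=2.0].[type=Record].shipment_info.[type=string].target"))
# -> shipment_info.target
```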
c
Ok, I get it. For some reason, however, my data-wrangling script does not work anymore and everything is pretty messed up. Do you have an example mce file that only uses v2? I think that would be very helpful, as this is how I came up with my old transformation pipeline.
```json
"fields": [
  {
    "fieldPath": "shipment_info",
    "jsonPath": null,
    "nullable": false,
    "description": { "string": "Shipment info description" },
    "type": { "type": { "com.linkedin.pegasus2avro.schema.RecordType": {} } },
    "nativeDataType": "varchar(100)",
    "recursive": false
  },
  {
    "fieldPath": "shipment_info.date",
    "jsonPath": null,
    "nullable": false,
    "description": { "string": "Shipment info date description" },
    "type": { "type": { "com.linkedin.pegasus2avro.schema.DateType": {} } },
    "nativeDataType": "Date",
    "recursive": false
  },
  {
    "fieldPath": "shipment_info.target",
    "jsonPath": null,
    "nullable": false,
    "description": { "string": "Shipment info target description" },
    "type": { "type": { "com.linkedin.pegasus2avro.schema.StringType": {} } },
    "nativeDataType": "text",
    "recursive": false
  },
```
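Purely as an illustration (the exact v2 token names and ordering should be verified against the field path specification, not taken from this sketch), the `fieldPath` values above might look roughly like this in the v2 format:

```json
"fieldPath": "[version=2.0].[type=Record].shipment_info"
"fieldPath": "[version=2.0].[type=Record].shipment_info.[type=Date].date"
"fieldPath": "[version=2.0].[type=Record].shipment_info.[type=string].target"
```

The rest of each field entry (nullability, description, native type) stays as before; only the `fieldPath` key gains the bracketed tokens.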
How would this look in the new format?