# ingestion
s
Hi, I would like to ask for advice. I'm trying to ingest data from Snowflake (that works fine), but for reasons beyond my control we don't have descriptions of tables/columns directly in Snowflake; they live in another app. I can easily export those into JSON or CSV, but how can I merge them into the ingestion process? Is it possible to use transformers for that? Any kind of help/example would be really great.
b
you don't really need to ingest the descriptions during ingestion, you can do it afterwards as well
you can refer to the examples in this folder, though nothing exactly matches your use case: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_schema.py -> creates a schema from scratch; https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_add_owner.py -> queries for existing owners, then adds an owner. your use case would be something along the lines of querying for the existing schema, then adding descriptions to the fields and emitting it back
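For reference, a minimal sketch of that post-ingestion approach, using the DataHub Python SDK's `DataHubGraph` client to read the existing editable schema metadata, attach field descriptions, and emit the aspect back. The server URL, the `descriptions.json` file, and its field-path-to-description format are assumptions, and the exact client method names can vary between SDK versions:

```python
import json

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import (
    EditableSchemaFieldInfoClass,
    EditableSchemaMetadataClass,
)

# Hypothetical export from the external app: {"<field_path>": "<description>", ...}
with open("descriptions.json") as f:
    field_descriptions = json.load(f)

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
dataset_urn = make_dataset_urn(platform="snowflake", name="db.schema.table", env="PROD")

# Start from the existing editable schema metadata (if any) so prior edits are kept.
editable_schema = graph.get_aspect(
    entity_urn=dataset_urn, aspect_type=EditableSchemaMetadataClass
) or EditableSchemaMetadataClass(editableSchemaFieldInfo=[])

existing_fields = {info.fieldPath: info for info in editable_schema.editableSchemaFieldInfo}
for field_path, description in field_descriptions.items():
    if field_path in existing_fields:
        existing_fields[field_path].description = description
    else:
        editable_schema.editableSchemaFieldInfo.append(
            EditableSchemaFieldInfoClass(fieldPath=field_path, description=description)
        )

# Emit the enriched aspect back to DataHub.
graph.emit_mcp(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=editable_schema))
```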
s
thanks! I'll have a look
b
you also could do it during ingestion (a custom transformer), but I'm just wondering if it would be easier to do the matching of the datasets to the descriptions after ingestion as opposed to during ingestion
s
yeah, I was aiming at doing it during the ingestion, as the connectivity between our Snowflake and DataHub is not direct (for security reasons). Instead I use a file sink when extracting the data, then transfer the file and use it as a source to load into DataHub. So my idea was to enrich the file during/after its creation, before it's moved.
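A rough sketch of that two-leg setup as programmatic pipelines: leg 1 extracts from Snowflake into a file (with an optional transformer attached before the file sink, so the file is enriched before it's transferred), and leg 2 loads the transferred file into DataHub. The Snowflake connection values, file paths, server URL, and the `my_transformers.EnrichDescriptions` transformer (sketched further below) are all placeholders, and config field names can differ between connector versions:

```python
from datahub.ingestion.run.pipeline import Pipeline

# Leg 1 (runs next to Snowflake): extract metadata and write it to a file.
extract = Pipeline.create(
    {
        "source": {
            "type": "snowflake",
            # Placeholder connection details; field names vary by connector version.
            "config": {
                "account_id": "my_account",
                "username": "my_user",
                "password": "my_password",
                "warehouse": "MY_WH",
                "role": "MY_ROLE",
            },
        },
        "transformers": [
            # Hypothetical custom transformer (see the sketch further below).
            {"type": "my_transformers.EnrichDescriptions", "config": {"csv_path": "./descriptions.csv"}},
        ],
        "sink": {"type": "file", "config": {"filename": "./snowflake_mces.json"}},
    }
)
extract.run()
extract.raise_from_status()

# Leg 2 (runs next to DataHub): load the transferred file.
load = Pipeline.create(
    {
        # Older releases use "filename" instead of "path" for the file source.
        "source": {"type": "file", "config": {"path": "./snowflake_mces.json"}},
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
load.run()
load.raise_from_status()
```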
b
You can certainly use a transformer for this purpose
Basically, for each dataset, simply try to look up the corresponding description
And enrich the MCP (datasetProperties aspect) to have your mapped description
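A minimal sketch of such a transformer, following the custom-transformer interface from the DataHub docs (`BaseTransformer` / `SingleAspectTransformer`). The `EnrichDescriptions` and `EnrichDescriptionsConfig` names, the `csv_path` option, the CSV layout (dataset name, description), and the name-in-urn matching are all assumptions, not anything prescribed by the library:

```python
import csv
from typing import List, Optional

from datahub.configuration.common import ConfigModel
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.transformer.base_transformer import (
    BaseTransformer,
    SingleAspectTransformer,
)
from datahub.metadata.schema_classes import DatasetPropertiesClass


class EnrichDescriptionsConfig(ConfigModel):
    # Hypothetical CSV with two columns: dataset name, description.
    csv_path: str


class EnrichDescriptions(BaseTransformer, SingleAspectTransformer):
    """Adds table descriptions exported from the external app to datasetProperties."""

    ctx: PipelineContext
    config: EnrichDescriptionsConfig

    def __init__(self, config: EnrichDescriptionsConfig, ctx: PipelineContext):
        super().__init__()
        self.ctx = ctx
        self.config = config
        # Load the exported descriptions once, keyed by dataset name.
        with open(config.csv_path) as f:
            self.descriptions = {row[0]: row[1] for row in csv.reader(f)}

    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "EnrichDescriptions":
        config = EnrichDescriptionsConfig.parse_obj(config_dict)
        return cls(config, ctx)

    def entity_types(self) -> List[str]:
        return ["dataset"]

    def aspect_name(self) -> str:
        return "datasetProperties"

    def transform_aspect(
        self,
        entity_urn: str,
        aspect_name: str,
        aspect: Optional[DatasetPropertiesClass],
    ) -> Optional[DatasetPropertiesClass]:
        # Match the dataset to its exported description. Matching on the dataset
        # name appearing in the urn is an assumption; adjust to however the
        # external export identifies tables.
        description = next(
            (d for name, d in self.descriptions.items() if name.lower() in entity_urn.lower()),
            None,
        )
        if description is None:
            return aspect  # nothing to enrich for this dataset
        props = aspect or DatasetPropertiesClass()
        props.description = description
        return props
```

In the recipe it would then be referenced by its module-qualified name (as in the leg-1 sketch above), provided the module is importable in the environment that runs the ingestion; column-level descriptions could be handled the same way by targeting the schema aspect instead of `datasetProperties`.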
s
@big-carpet-38439 would you have an example, please? That would help a lot in understanding how to actually implement this.
b