# advice-metadata-modeling
a
Hi, I posted this in the governance channel but I think this channel is probably more appropriate. I was wondering what the recommended approach is for maintaining descriptions for fields on the schema of a dataset. For example, if we have Dataset A and Dataset B which share a number of common fields, we would have to manually enter the description for each field twice. Similarly, if we have Dataset X which gets consumed by some workflow/job and then stored in a table as Dataset Y, how could we avoid having to specify descriptions on both the upstream and the downstream, instead of just on the upstream? Is there a recommended way to do this? I suppose we could use a transform to auto-populate the description from a central source or something?
👍 1
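The "central source" idea in the question above could be sketched roughly like this (all names here are hypothetical illustrations, not a DataHub API): keep one dictionary of field descriptions and apply it to every dataset's schema, so shared columns are documented once.

```python
# Hypothetical central source of field descriptions (not DataHub code).
FIELD_DESCRIPTIONS = {
    "customer_id": "Unique identifier for the customer.",
    "order_ts": "UTC timestamp when the order was placed.",
}

def apply_descriptions(schema_fields, dictionary):
    """Fill in missing field descriptions from the central dictionary."""
    for field in schema_fields:
        if not field.get("description") and field["name"] in dictionary:
            field["description"] = dictionary[field["name"]]
    return schema_fields

# Dataset A and Dataset B share "customer_id"; both pick up the same text.
dataset_a = apply_descriptions(
    [{"name": "customer_id"}, {"name": "order_ts"}], FIELD_DESCRIPTIONS)
dataset_b = apply_descriptions(
    [{"name": "customer_id"}, {"name": "country"}], FIELD_DESCRIPTIONS)
```

A real implementation would run a step like this during ingestion, but the shape of the idea is just a lookup keyed on field name.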
m
In theory, what you said can only be achieved if column-level lineage worked properly. AFAIK, only Snowflake will have proper column-level lineage, so I doubt description propagation will work at the current stage.
a
Ah, I see. Thanks for the insight.
c
So, it sounds like DataHub is able to ingest column level lineage and propagate down?
m
@cuddly-butcher-39945 there will be support for column-level lineage for Snowflake. The other sources depend on SQL parsing, I think. I don't think there's code for description propagation, but I don't think it would be hard to add (if column-level lineage works).
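The description-propagation step mentioned above is conceptually simple once column-level lineage exists. A minimal sketch, assuming hypothetical data structures (these are not DataHub internals): given edges mapping a downstream column to its upstream column, copy the upstream description onto any downstream column that lacks one.

```python
# Existing descriptions, keyed by (dataset, column). Hypothetical names.
descriptions = {("dataset_x", "customer_id"): "Unique customer identifier."}

# Column-level lineage edges:
# (downstream dataset, downstream column) -> (upstream dataset, upstream column)
lineage = {("dataset_y", "cust_id"): ("dataset_x", "customer_id")}

def propagate(descriptions, lineage):
    """Copy an upstream column's description to each downstream column
    that doesn't have a description of its own yet."""
    out = dict(descriptions)
    for downstream, upstream in lineage.items():
        if downstream not in out and upstream in out:
            out[downstream] = out[upstream]
    return out

result = propagate(descriptions, lineage)
```

A multi-hop lineage graph would need to iterate this until a fixed point, but the single-hop version shows the idea.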
e
in the absence of inheritance through column level lineage, do you find it easier or too much overhead to capture the description in a glossary term and link the glossary term to all columns you need?
m
@eager-australia-69729 it depends on the level of metadata you have. If you enforce metadata to be embedded in every single model created through the automation process, you wouldn't have much of a problem. There are some ways you can programmatically attach metadata, e.g. using transformers, CSV enrichment, etc., but you'd probably need some conventions in your models.
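As a rough illustration of the CSV-enrichment idea mentioned above (the CSV layout here is a made-up example, not a specific DataHub format): descriptions are maintained in one CSV and parsed into a field-to-description mapping that an ingestion step could then apply to each model.

```python
import csv
import io

# Hypothetical CSV maintained by data owners as the single source of truth.
CSV_TEXT = """field,description
customer_id,Unique identifier for the customer.
order_ts,UTC timestamp when the order was placed.
"""

def load_field_dictionary(csv_text):
    """Parse the CSV into a {field_name: description} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["field"]: row["description"] for row in reader}

dictionary = load_field_dictionary(CSV_TEXT)
```

The "conventions in your models" point matters here: this only works if field names are consistent across models, so that a name-based lookup lands on the right description.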