# advice-metadata-modeling
a
Hi, I posted this in the governance channel but I think this channel is probably more appropriate. I was wondering what the recommended approach is for maintaining descriptions for fields on the schema of a dataset. For example, if we have Dataset A and Dataset B which share a number of common fields, we would have to manually enter the description for each field twice. Similarly, if we have Dataset X which gets consumed by some workflow/job and then stored in a table as Dataset Y, how could we avoid having to specify descriptions on both the upstream and the downstream, instead of just on the upstream? Is there a recommended way to do this? I suppose we could use a transform to auto-populate the description from a central source or something?
👍 1
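The "central source" idea in the question above could be sketched roughly like this (all names here are hypothetical illustrations, not a DataHub API): keep one dictionary of field descriptions and apply it to every dataset's schema, so shared columns are documented once.

```python
# Hypothetical central source of field descriptions (not DataHub code).
FIELD_DESCRIPTIONS = {
    "customer_id": "Unique identifier for the customer.",
    "order_ts": "UTC timestamp when the order was placed.",
}

def apply_descriptions(schema_fields, dictionary):
    """Fill in missing field descriptions from the central dictionary."""
    for field in schema_fields:
        if not field.get("description") and field["name"] in dictionary:
            field["description"] = dictionary[field["name"]]
    return schema_fields

# Dataset A and Dataset B share "customer_id"; both pick up the same text.
dataset_a = apply_descriptions(
    [{"name": "customer_id"}, {"name": "order_ts"}], FIELD_DESCRIPTIONS)
dataset_b = apply_descriptions(
    [{"name": "customer_id"}, {"name": "country"}], FIELD_DESCRIPTIONS)
```

A real implementation would run a step like this during ingestion, but the shape of the idea is just a lookup keyed on field name.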
m
In theory, what you said can only be achieved if column-level lineage worked properly. AFAIK, only Snowflake will have proper column-level lineage, so I doubt description propagation will work at the current stage.
a
Ah, I see. Thanks for the insight.
c
So, it sounds like DataHub is able to ingest column level lineage and propagate down?
m
@cuddly-butcher-39945 there will be support for column-level lineage for Snowflake. The other sources depend on SQL parsing, I think. I don't think there's code for description propagation, but I don't think it would be hard to add (if column-level lineage works).
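The description-propagation step mentioned above is conceptually simple once column-level lineage exists. A minimal sketch, assuming hypothetical data structures (these are not DataHub internals): given edges mapping a downstream column to its upstream column, copy the upstream description onto any downstream column that lacks one.

```python
# Existing descriptions, keyed by (dataset, column). Hypothetical names.
descriptions = {("dataset_x", "customer_id"): "Unique customer identifier."}

# Column-level lineage edges:
# (downstream dataset, downstream column) -> (upstream dataset, upstream column)
lineage = {("dataset_y", "cust_id"): ("dataset_x", "customer_id")}

def propagate(descriptions, lineage):
    """Copy an upstream column's description to each downstream column
    that doesn't have a description of its own yet."""
    out = dict(descriptions)
    for downstream, upstream in lineage.items():
        if downstream not in out and upstream in out:
            out[downstream] = out[upstream]
    return out

result = propagate(descriptions, lineage)
```

A multi-hop lineage graph would need to iterate this until a fixed point, but the single-hop version shows the idea.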
e
in the absence of inheritance through column level lineage, do you find it easier or too much overhead to capture the description in a glossary term and link the glossary term to all columns you need?
m
@eager-australia-69729 it depends on the level of metadata you have. If you enforce metadata to be embedded in every single model created through the automation process, you wouldn't have much of a problem. There are some ways you can programmatically attach metadata, e.g. using transformers, CSV enrichment, etc., but you'd probably need some conventions in your models.
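As a rough illustration of the CSV-enrichment idea mentioned above (the CSV layout here is a made-up example, not a specific DataHub format): descriptions are maintained in one CSV and parsed into a field-to-description mapping that an ingestion step could then apply to each model.

```python
import csv
import io

# Hypothetical CSV maintained by data owners as the single source of truth.
CSV_TEXT = """field,description
customer_id,Unique identifier for the customer.
order_ts,UTC timestamp when the order was placed.
"""

def load_field_dictionary(csv_text):
    """Parse the CSV into a {field_name: description} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["field"]: row["description"] for row in reader}

dictionary = load_field_dictionary(CSV_TEXT)
```

The "conventions in your models" point matters here: this only works if field names are consistent across models, so that a name-based lookup lands on the right description.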