DataHub #advice-data-governance

average-vr-23088

08/10/2022, 7:04 PM

Hi, not sure if this is the right channel for this question, but I was wondering what the recommended approach is for maintaining descriptions for the fields in a dataset's schema. For example, if Dataset A and Dataset B share a number of common fields, we have to manually enter the description for each shared field twice. Similarly, if Dataset X gets consumed by some workflow/job and then stored in a table as Dataset Y, how can we avoid specifying descriptions on both the upstream and the downstream, instead of just on the upstream? Is there a recommended way to do this? I suppose we could use a transform to auto-populate the descriptions from a central source or something?
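
To make the "central source" idea concrete, here is a minimal sketch in plain Python. It is not the DataHub transformer API: the `CENTRAL_DESCRIPTIONS` registry, the `fill_descriptions` helper, and the dict-based field layout are all assumptions made for illustration. It only shows the lookup logic such a transform might apply, filling in descriptions that are missing while leaving existing ones alone.

```python
# Hypothetical central registry: field name -> description.
# In practice this might live in a YAML file or a shared table (assumption).
CENTRAL_DESCRIPTIONS = {
    "customer_id": "Unique identifier of the customer.",
    "created_at": "UTC timestamp at which the record was created.",
}


def fill_descriptions(fields, central=CENTRAL_DESCRIPTIONS):
    """Return copies of `fields` with blank descriptions filled from `central`."""
    filled = []
    for field in fields:
        updated = dict(field)  # copy so the input is not mutated
        if not updated.get("description"):
            # Unknown fields fall back to an empty description.
            updated["description"] = central.get(updated["name"], "")
        filled.append(updated)
    return filled


# Two datasets sharing `customer_id`; the description is written only once.
dataset_a = [
    {"name": "customer_id", "description": ""},
    {"name": "order_total", "description": "Order total in USD."},
]
dataset_b = [{"name": "customer_id"}, {"name": "created_at"}]

filled_a = fill_descriptions(dataset_a)
filled_b = fill_descriptions(dataset_b)
```

Keeping manually written descriptions untouched is a deliberate choice here, so a dataset-specific description can still override the central one.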

better-orange-49102

08/11/2022, 4:24 AM

Glossary terms for those common fields?

average-vr-23088

08/11/2022, 2:05 PM

That is kind of what we are currently resorting to: creating a glossary term for each field and adding the description to that. It still has the issue of having to manually re-tag newly ingested datasets that have those fields. Is that something we could use transforms for?
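
As a rough illustration of what automatic re-tagging could look like, the sketch below matches field names against regex rules and returns the glossary term URNs to attach. It is plain Python rather than a DataHub transformer; the `TERM_RULES` mapping, the helper names, and the example URNs are hypothetical.

```python
import re

# Hypothetical rules: regex on the field name -> glossary term URN.
# The URN values are illustrative only.
TERM_RULES = {
    r".*email.*": "urn:li:glossaryTerm:Email",
    r"customer_id": "urn:li:glossaryTerm:CustomerId",
}


def terms_for_field(field_name, rules=TERM_RULES):
    """Return the URNs of every rule whose pattern matches the whole field name."""
    return [
        urn
        for pattern, urn in rules.items()
        if re.fullmatch(pattern, field_name)
    ]


def tag_schema(field_names, rules=TERM_RULES):
    """Map each field name in a schema to its matching glossary term URNs."""
    return {name: terms_for_field(name, rules) for name in field_names}


# A newly ingested dataset gets its terms without manual tagging.
tagged = tag_schema(["customer_id", "contact_email", "order_total"])
```

With rules like these, a newly ingested dataset would pick up its terms at ingestion time, though the rule set itself still has to be maintained by hand.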

average-vr-23088

08/11/2022, 2:10 PM

This approach also kinda forces you to keep your schema (minus type info) as glossary terms. Even for simple use cases, where some source dataset gets processed by a job and pushed into a data warehouse table, it is very common for some fields to travel downstream as is.
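
For fields that travel downstream unchanged, one alternative to glossary terms is to inherit descriptions along lineage. The sketch below is a plain-Python illustration under stated assumptions: `inherit_descriptions`, the `inherited` flag, and the dict-based schemas are hypothetical, and matching on identical field names is a simplification that would miss renamed columns.

```python
def inherit_descriptions(upstream, downstream):
    """Copy upstream descriptions onto same-named downstream fields lacking one."""
    upstream_desc = {f["name"]: f.get("description", "") for f in upstream}
    result = []
    for field in downstream:
        updated = dict(field)  # copy so the input is not mutated
        source = upstream_desc.get(updated["name"])
        if not updated.get("description") and source:
            updated["description"] = source
            updated["inherited"] = True  # mark where the text came from
        result.append(updated)
    return result


# Dataset X feeds a job that writes Dataset Y; `user_id` passes through as is.
dataset_x = [{"name": "user_id", "description": "Identifier of the user."}]
dataset_y = [{"name": "user_id"}, {"name": "daily_total"}]

described_y = inherit_descriptions(dataset_x, dataset_y)
```

Flagging inherited descriptions keeps it visible which text was written for the downstream table and which was copied from upstream.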
