# advice-metadata-modeling
q
Hello, question for #advice-data-governance: how do people using DataHub manage enum lists of values? The list of supported values (and sometimes their explanation) is very valuable, and I feel I'm missing something. Are people documenting values in the description? Not documenting them at all? Handling it some other way?
c
We have two approaches to manage them: either provide the values in the description of the corresponding schema field, or use a glossary term if the content is too extensive for the description. When we used glossary terms, we connected them directly to the schema field by using a transformer.
q
I thought about the glossary but wasn't sure. I'm interested in the transformer approach (I'm still new to DataHub and not fully sure how things relate). Do you have a screenshot or something similar? Could extra properties take care of this use case?
c
The glossary term simply gives you the opportunity to link the explanation directly to a schema field, like you can see below. The user can then click on the link and is forwarded to a wiki page explaining the enum and all available values. I think this is especially suited to cases where you have a lot of dimensions.
The transformer allows you to automatically connect the glossary term with the schema field during ingestion:
https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer#simple-add-dataset-glossaryterms
https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer#pattern-add-dataset-glossaryterms
Otherwise, this would require manual effort.
Hm, properties are associated with the whole data asset, so there is no possibility to link them to a specific schema field (column etc.). I guess this is not what you want to achieve, right?
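To make the transformer idea concrete, here is a minimal sketch of the `transformers` section of an ingestion recipe, built as a Python dict and printed as JSON. It assumes your DataHub version ships the field-level `pattern_add_dataset_schema_terms` transformer from the same docs page (the two anchors linked above cover the dataset-level variants). The field pattern `.*mypicklist.*` and the term URN are hypothetical placeholders, not values from this thread.

```python
import json


def schema_term_transformer(rules):
    """Build one transformer entry that maps field-path regexes to glossary term URNs.

    Assumption: the transformer type and config layout follow the
    "pattern_add_dataset_schema_terms" entry in the DataHub transformer docs.
    """
    return {
        "type": "pattern_add_dataset_schema_terms",
        "config": {"term_pattern": {"rules": rules}},
    }


# Hypothetical rule: any column whose path matches "mypicklist" gets the enum's term.
transformers = [
    schema_term_transformer(
        {".*mypicklist.*": ["urn:li:glossaryTerm:salesforce.mypicklist"]}
    )
]

# Print the fragment so it can be copied into a recipe (YAML accepts JSON syntax).
print(json.dumps({"transformers": transformers}, indent=2))
```

The glossary term itself has to exist already; the transformer only attaches it to matching columns at ingestion time.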
q
Thanks! I thought properties could be added at column level; indeed, that doesn't fit my need. The glossary is probably a good idea. Would you recommend:
• one single glossary term, salesforce/mypicklist, redirecting to a wiki page, or
• one term per value: salesforce/mypicklist/enum1 (and maybe a wiki page), salesforce/mypicklist/enum2, etc.?
I made a few tests: it could basically work with each individual value (see picture), with some IT magic for the ingestion (most probably I could poke my Python colleagues). But I'm a bit scared: my first picklist contains... 315 values 😄 It might kill the UI and UX.
c
Ah, I was talking more about having one glossary term per enum and then documenting the individual values in the markdown of the glossary term.
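For a picklist with hundreds of values, the markdown for that single glossary term could be generated rather than typed by hand. A minimal sketch, assuming the values and their explanations are available as a plain dict; the function name, the table layout, and the sample values are illustrative only, not a DataHub API:

```python
def enum_markdown(enum_name, values):
    """Render a markdown table documenting each enum value.

    `values` maps each enum value to its explanation (empty string if unknown).
    """
    lines = [f"## {enum_name}", "", "| Value | Meaning |", "| --- | --- |"]
    for value in sorted(values):
        meaning = values[value] or "(undocumented)"
        # Escape pipes so an explanation cannot break the table layout.
        meaning = meaning.replace("|", "\\|")
        lines.append(f"| `{value}` | {meaning} |")
    return "\n".join(lines)


# Hypothetical picklist values, for illustration only.
print(enum_markdown("salesforce/mypicklist", {"enum2": "", "enum1": "First option"}))
```

The resulting text can be pasted into the term's documentation, so the column view stays uncluttered even with 315 values.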
q
Yes, that's what I understood from your suggestion. My concern is that our primary use case today is "wild data structure changes made by our four different tech contractor companies" (we have little control and often close to zero communication), and I would like to find a way to detect changes up front. Thanks to https://datahubproject.io/docs/dev-guides/timeline/ and some Excel work (https://datahubspace.slack.com/archives/C02QMLWJG12/p1698767908289349?thread_ts=1698240039.618599&cid=C02QMLWJG12), we are able to detect structure changes, but not enum changes. I'm actually thinking that we could just use the column description field to store the "dump" of values, if different, it should be seen in timeline API.
c
Oh okay
> I'm actually thinking that we could just use the column description field to store the "dump" of values, if different, it should be seen in timeline API.
Yes, this might fit your need.
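A sketch of the "dump of values" idea in Python: the values are deduplicated and sorted before being written into the column description, so that a mere reordering in the source system does not look like a change. The function names and the "Allowed values" text format are assumptions for illustration; writing the description to DataHub and reading changes back from the Timeline API are left out.

```python
def dump_enum_values(values):
    """Build a deterministic description string from a list of enum values."""
    unique = sorted({v.strip() for v in values})
    return f"Allowed values ({len(unique)}): " + ", ".join(unique)


def diff_enum_values(old, new):
    """Report which values were added or removed between two snapshots."""
    old_set, new_set = set(old), set(new)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


# Hypothetical snapshots of the same picklist taken at two ingestion runs.
before = ["enum1", "enum2", "enum3"]
after = ["enum3", "enum1", "enum4"]

print(dump_enum_values(after))
print(diff_enum_values(before, after))
```

With a stable format like this, any difference between two stored descriptions corresponds to a real change in the value list.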