# getting-started
p
Hi team, a very naive question: how can everything done via the DataHub UI be persisted in Git? Is it possible to back up everything to Git via YAML?
d
Hi Dhruv, everything done via the UI should be stored in our DB. If you're looking for a backup plan, this might be helpful: https://datahubproject.io/docs/how/backup-datahub/
p
Yes, I am aware of this. I want to know which specs can be backed up via a YAML structure. For example, I know Data Product is one.
d
Hmm, I'm not entirely sure... @gray-shoe-75895 could you help me with this?
a
For now, the business glossary supports YAML. There are ingestion docs here: https://datahubproject.io/docs/generated/ingestion/sources/business-glossary
What other entities are you interested in and how do you want to manage their lifecycle?
p
Thanks, I would like to expand this to Tags at least. It would be nice to have the dataset-tags and dataset-glossaryterms relationships persisted in Git.
g
Would love to learn more about what you’re imagining here - would you want git to be the source of truth for this information? What should happen if someone adds a tag/term to a dataset via the UI?
p
One should be able to get the diff between the YAML and the state in the UI. They should then be able to apply that diff to the YAML so that they can take it to Git. I think that is how DataHub entities are designed.
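To make the diff-and-apply idea concrete, here is a minimal sketch in plain Python. It is not a DataHub feature or API: `TagDiff`, `diff_tags`, and `apply_diff` are hypothetical names, and both inputs are assumed to be simple mappings from dataset URN to tag URNs (one loaded from the YAML in Git, one already fetched from DataHub by some other means).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set

# Hypothetical shape: dataset URN -> tag URNs attached to it.
TagState = Dict[str, Iterable[str]]


@dataclass
class TagDiff:
    # Tags present in the UI state but missing from the YAML.
    only_in_ui: Dict[str, Set[str]] = field(default_factory=dict)
    # Tags present in the YAML but no longer in the UI state.
    only_in_yaml: Dict[str, Set[str]] = field(default_factory=dict)


def diff_tags(yaml_state: TagState, ui_state: TagState) -> TagDiff:
    """Compare the Git-tracked state with the state seen in the UI."""
    diff = TagDiff()
    for urn in set(yaml_state) | set(ui_state):
        in_yaml = set(yaml_state.get(urn, ()))
        in_ui = set(ui_state.get(urn, ()))
        if in_ui - in_yaml:
            diff.only_in_ui[urn] = in_ui - in_yaml
        if in_yaml - in_ui:
            diff.only_in_yaml[urn] = in_yaml - in_ui
    return diff


def apply_diff(yaml_state: TagState, diff: TagDiff) -> Dict[str, List[str]]:
    """Return a new YAML-side state that mirrors the UI-side changes."""
    merged = {urn: set(tags) for urn, tags in yaml_state.items()}
    for urn, tags in diff.only_in_ui.items():
        merged.setdefault(urn, set()).update(tags)
    for urn, tags in diff.only_in_yaml.items():
        merged[urn] -= tags
    # Drop datasets left without tags; sort for stable Git diffs.
    return {urn: sorted(tags) for urn, tags in sorted(merged.items()) if tags}
```

Sorting the output keeps the serialized YAML stable, so a Git diff shows only real changes. Here the UI wins on conflicts; making Git the source of truth would mean applying the diff in the other direction.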
g
Makes sense. We’re definitely thinking in this direction (and support this for business glossary and data products already), but haven’t implemented it for dataset tags yet
m
@gray-shoe-75895 do you support this for file-based data lineage, which is also defined in a YAML file but can be edited through the UI? Also, I fully support the idea stated above. For us, it would be nice to have dataset documentation stored in and editable via YAML, while synced with changes from the UI, to allow both developers and business users to contribute to documentation.
g
File-based lineage is a one-way sync right now, from the file -> DataHub. It’s possible to use the SDKs to fetch lineage updates from DataHub and write them back to the file too, but we don’t have anything pre-packaged for that yet.
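As a rough illustration of the write-back half, the sketch below turns an already-fetched mapping of dataset URN to upstream URNs into the structure of a lineage file. Everything here is an assumption rather than a pre-packaged tool: `parse_dataset_urn` and `build_lineage_file` are hypothetical helpers, the fetch from DataHub via the SDK is left out, and the field names (`version`, `lineage`, `entity`, `upstream`) follow my reading of the file-based lineage format, so check them against the current docs before relying on them.

```python
from typing import Dict, Iterable, List


def parse_dataset_urn(urn: str) -> Dict[str, str]:
    """Split urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError(f"not a dataset URN: {urn}")
    inner = urn[len(prefix):-1]
    # Platform comes first and env last; the name may itself contain commas.
    platform_urn, rest = inner.split(",", 1)
    name, env = rest.rsplit(",", 1)
    return {
        "name": name,
        "type": "dataset",
        "env": env,
        "platform": platform_urn.rsplit(":", 1)[-1],
    }


def build_lineage_file(upstreams: Dict[str, Iterable[str]]) -> Dict[str, object]:
    """Build the lineage-file structure from downstream URN -> upstream URNs."""
    entries: List[Dict[str, object]] = []
    for downstream in sorted(upstreams):
        entries.append(
            {
                "entity": parse_dataset_urn(downstream),
                "upstream": [
                    {"entity": parse_dataset_urn(up)}
                    for up in sorted(upstreams[downstream])
                ],
            }
        )
    return {"version": 1, "lineage": entries}
```

The returned dictionary can be serialized with any YAML library and committed to Git. Sorting both levels keeps the output deterministic between runs.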