# advice-data-governance
a
Assuming that most of us already had some level of data documentation before adopting DataHub, I’m curious how the rest of you are managing that documentation. For instance, all of our Materialized Views are documented in Confluence with a description of why they were created, how to make use of them, and descriptions for what each column contains. Would you:
• Keep the existing documentation and simply LINK to it from the entity’s page in DataHub?
• Migrate all that documentation into DataHub, considering some will be at the table level, some at the column level, some maybe just tangential?
g
If you have a way to programmatically access that content, you could sync it with DataHub. That's what we do with sources that have wider adoption,
then eventually try to point users more and more towards documenting in DataHub (easier said than done).
b
Programmatically accessing Confluence is an HTML parsing nightmare 🥲. Maybe point them to the Confluence page?
g
Yikes, that's true, didn't realize you had to parse the HTML of a Confluence page 😬
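For anyone who wants to gauge how bad the parsing really is, here is a minimal sketch using only the Python standard library. It assumes the page body has already been fetched as an HTML string and that column docs live in a simple two-column table (column name, description) with a header row. The `TableTextParser` and `column_docs` names are made up for illustration, and real Confluence pages with macros or nested tables will likely need more handling.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collects the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (e.g. around <b> tags) and normalize whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def column_docs(html):
    """Return {column_name: description}, assuming a two-column table with a header row."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    # Skip the header row; ignore rows that are not exactly (name, description).
    return {r[0]: r[1] for r in parser.rows[1:] if len(r) == 2 and r[0]}
```

The resulting dict could then feed whatever sync job writes column descriptions into DataHub; that part is left out here because it depends on how your ingestion is set up.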
a
Interesting ideas. Thanks for that. I didn’t realize it was a ton of parsing to grab Confluence docs programmatically. That’s unfortunate.
b
I am wondering if it's easier to just append some keywords from the Confluence page to help with the searchability of the dataset inside DataHub, then point users to the Confluence page for more information. Ultimately the tech team cannot populate all the information by themselves; the users must also be convinced to do it in DataHub.
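A rough sketch of the keyword idea, assuming the Confluence page text is already available as a plain string. It just counts word frequency after dropping a small stop-word list. `top_keywords` and `STOP_WORDS` are hypothetical names, and the stop-word list is only a placeholder you would tune for your own docs.

```python
import re
from collections import Counter

# Placeholder stop-word list; extend it for your own documentation.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
    "of", "on", "per", "the", "this", "to", "was", "with",
}


def top_keywords(text, limit=5):
    """Return up to `limit` of the most frequent non-stop-words in `text`."""
    # Lowercase and keep word-like tokens of two or more characters.
    words = re.findall(r"[a-z][a-z0-9_]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]
```

The returned words could be appended to the dataset description or added as tags next to the Confluence link, whichever fits how your users search.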
a
Yeah, completely agree. But you know… those users will typically be convinced when/if they see some critical mass in there. That’s why I wanted to do the bulk migration and then kill the Confluence docs 🙂