# advice-data-governance
b
Do you think it is a good idea to open DataHub to your partners, stakeholders, or even your customers? Do you think it can bring benefits, or is it something better avoided?
l
It depends on what you are hoping to share with your customers.
In other words, are you sharing a data catalog or a product catalog 😉
I find that data catalogs are more useful for internal purposes, while product catalogs are useful for external audiences (but they hide a lot of the plumbing, since it can be overwhelming)
well, that's my opinion anyway
a
The risks probably outweigh the benefits in many cases. For example, how sure are you that SQL profiler statistics will never accidentally reveal private data to a customer? Or that someone won't add a new database table that gets automatically scanned, added to DataHub, and reveals secret info? There are also subtler drawbacks. When the catalog is used internally, people feel more relaxed about writing documentation: it's for you and your company, and colleagues will probably understand that it's better to have some OK-ish docs than nothing. But if people know customers and partners are also looking, will they just avoid writing docs because the docs need to be done very well or not at all? That "perfect or nothing" model doesn't work for data documentation, IMO, because the best knowledge often depends on people who don't have, and never will have, the time to write "perfect, customer-facing" docs.
c
Glomming onto this: unbridled access is probably a bad thing, but is a process/API transformation, e.g., delivering a data dictionary to an API client, the prescribed method?
b
So, would you share only a data dictionary with them, or the whole catalog?
a
I’ve done it in the past, but we had a completely separate catalog implementation for External Customers where this “external” catalog could only read approved and curated datasets that were shared with clients. We did that precisely because of the risks that @ambitious-piano-33685 highlighted above. It worked out pretty well and clients LOVED that every dataset they bought came with a highly curated catalog. (Also, clients could only see their own specific datasets of course)
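The separate "external" catalog described above could be sketched roughly as follows. This is only an illustration and does not use the DataHub API: the `Dataset` class, its `approved_for_external` and `shared_with` fields, and the `external_view` function are all hypothetical names. The idea is that nothing reaches a client unless it was explicitly approved and explicitly shared with that client.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog entry; not a DataHub type."""
    name: str
    # Must be set on purpose by a curator; defaults to hidden.
    approved_for_external: bool = False
    # IDs of the clients this dataset has been shared with.
    shared_with: frozenset = field(default_factory=frozenset)


def external_view(internal_catalog, client_id):
    """Return only approved datasets explicitly shared with client_id."""
    return [
        ds for ds in internal_catalog
        if ds.approved_for_external and client_id in ds.shared_with
    ]


# Made-up example entries.
catalog = [
    Dataset("sales_curated", True, frozenset({"client_a"})),
    Dataset("hr_raw"),  # never approved, so never exported
    Dataset("pricing_curated", True, frozenset({"client_b"})),
]

visible = [ds.name for ds in external_view(catalog, "client_a")]
print(visible)
```

Because both checks default to "hidden", a newly scanned table stays invisible to clients until someone approves and shares it, which addresses the accidental-exposure risk raised earlier in the thread.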
b
Ideally I would like to have a separate field in the catalog that is called "External definition" which can override the full internal definition. Reason being, the external definition often needs to be differently crafted (fewer acronyms, less complexity). Perhaps this is also a matter of the acumen with which our internal definitions are written. For now, we typically export field-level definitions from the BI tool since they are very simple and generic. However, another major pain point is maintaining definitions in the warehouse, BI tool, and Data Catalog. That has to change. (note: I'm here because we are considering a POC for DataHub so the above is not DataHub specific)
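A minimal sketch of the "External definition" override idea, assuming a hypothetical `FieldDefinition` record and `definition_for` helper (neither is a DataHub feature, and the sample definitions are made up): external audiences get the simplified text when one exists, and the internal definition is used otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldDefinition:
    """Hypothetical record holding both wordings of one field."""
    name: str
    internal_definition: str
    external_definition: Optional[str] = None


def definition_for(field_def, audience):
    """Pick the definition to show for 'internal' or 'external' readers."""
    if audience == "external" and field_def.external_definition:
        # The external wording overrides the internal one when present.
        return field_def.external_definition
    return field_def.internal_definition


# Made-up example values.
arr = FieldDefinition(
    name="arr",
    internal_definition="ARR per finance policy, excl. one-off SKUs",
    external_definition="Yearly recurring revenue",
)
churn = FieldDefinition(name="churn", internal_definition="Monthly logo churn")

print(definition_for(arr, "external"))
print(definition_for(churn, "external"))
```

Falling back to the internal text keeps a single source of truth, but it also means internal wording can reach external readers; a stricter variant could return nothing when no external definition has been written.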
b
So by external definition you mean something like an explanation for stakeholders? Thanks @blue-magazine-73492
b
Yes @breezy-noon-83306. Similar to @acceptable-potato-35922, when we share data with vendors, 3rd party organizations, and even customers, we need to have that simple definition which skips the nuances that are interesting to internal/analyst audiences.
Dreaming along, I could also see the external definitions being pushed to a CMS to power a public knowledge base.
b
To an external web page? That would be a good option too.
b
Yes, exactly. My idea is to maintain definitions in as few different places as possible -- we are far from it today.
s
It would be nice to have a public UI for specific products. I work for a state entity. In our case we could use the same product (DataHub) both for internal use and for our OpenData.