# getting-started
f
Hello all! Glad to be with you all. I am a data scientist working at the University of Kansas Center for Public Partnerships & Research (KU-CPPR). We are looking for a centralized metadata management tool for our organization and are interested in DataHub. We are reviewing your documentation, but we also have a list of questions on which we would love some clarification. Our key questions:

1. Can DataHub be used as a metadata tool WITHOUT storing the data? Also, if we later decide to add data to an environment with DataHub, how easy would that be?
2. Can DataHub auto-generate metadata or keep a permanent link to data sources (for automated updating) WITHOUT jeopardizing data security/privacy?
3. How easy is it to edit the metadata after creation, so that we have a living tool?
4. Is there a version control mechanism so that old comments/edits are not overwritten when we upload new versions of data dictionaries?
5. Can DataHub search, filter, tag, add notes, and track data provenance?
6. What other key features would you say make DataHub stand out from other metadata products?
7. What else should we be thinking about when it comes to metadata management / data governance?

If you have a moment to respond to any of the above questions, our team would love to hear your thoughts! Thank you all very much.
m
Hi @fast-winter-10784, I will try to answer some of your questions and will ask @mammoth-bear-12532 to chime in as well.

1 & 2. DataHub does ingest data from the underlying sources and stores it in its backend stores. Ingestion typically runs at a regular cadence to keep it fresh. A "recipe file" is what links DataHub to the corresponding data source. Check out https://datahubproject.io/docs/metadata-ingestion

3. While we do ingest data from sources, we also support adding and editing information from the UI. Classic examples are updating documentation, adding new tags, adding owners, and adding links to other documents. Does this answer your question w.r.t. "editing"?

4. I'll get back to you on this, but we do maintain versions of information in our backend. As of now, though, we only display the latest copy.

5. We support tags at both the field level and the dataset level, and we support advanced search over entities. Tracking data lineage is also supported. Here's an example. For notes, we would love to hear your use case: where and how you'd like to use them. Right now we do support "Adding documentation" at the dataset level.
👍 1
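To make the "recipe file" idea more concrete, here is a minimal Python sketch. It assumes a Postgres source and a DataHub instance at `http://localhost:8080`. Both are illustrative assumptions, not details from this thread. The helper `build_recipe` is hypothetical. It only assembles a dict with the `source`/`sink` layout that DataHub recipes use. Because a recipe holds connection details, the sketch reads the password from an environment variable instead of hard-coding it. That is one way to address the security concern in question 2.

```python
import copy
import json
import os


def build_recipe(host_port: str, database: str, username: str,
                 server: str = "http://localhost:8080") -> dict:
    """Assemble a dict shaped like a DataHub ingestion recipe.

    Hypothetical helper: the Postgres source and the server URL are
    illustrative assumptions; use whichever source type you ingest from.
    """
    return {
        # Where metadata is read from.
        "source": {
            "type": "postgres",
            "config": {
                "host_port": host_port,
                "database": database,
                "username": username,
                # Keep the secret out of the recipe itself.
                "password": os.environ.get("POSTGRES_PASSWORD", ""),
            },
        },
        # Where metadata is written to: the DataHub backend's REST endpoint.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": server},
        },
    }


if __name__ == "__main__":
    recipe = build_recipe("localhost:5432", "analytics", "datahub_reader")
    # Mask the password on a copy so the recipe can be logged safely.
    safe = copy.deepcopy(recipe)
    safe["source"]["config"]["password"] = "***"
    print(json.dumps(safe, indent=2))
```

If the `acryl-datahub` package is installed, you can pass a dict like this to `Pipeline.create(recipe)` from `datahub.ingestion.run.pipeline` and then call `.run()` on the result. You can also write the same structure as a YAML file and run it with `datahub ingest -c recipe.yml`. Running that on a schedule (for example, with cron) is what keeps the metadata fresh.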
m
@fast-winter-10784: have you played around with the hosted demo version of DataHub at demo.datahubproject.io? That might answer quite a few of your questions about how to use the product.
👀 1
k
@fast-winter-10784 If by "storing data" you mean the actual data in a table, rather than metadata about the table (table name, column names, etc.), then the answer is yes. DataHub ingests and stores metadata. Whether it also stores some data depends on whether you have enabled data profiling. If you do not configure data profiling, it will ingest only metadata and no data. You can enable data profiling at any point in the future. Hope this helps!
👍 1
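The profiling switch is part of a recipe's source config. Here is a minimal Python sketch of it. It assumes a SQL-based source such as Postgres, whose recipe accepts a `profiling.enabled` flag. The helpers `with_profiling` and `profiling_enabled` are hypothetical. `with_profiling` returns a copy of a recipe dict with profiling turned on or off and leaves the original untouched. So enabling profiling later is only a config change followed by another ingestion run.

```python
import copy

# Illustrative recipe (assumption): metadata-only ingestion, no profiling block.
BASE_RECIPE = {
    "source": {
        "type": "postgres",
        "config": {"host_port": "localhost:5432", "database": "analytics"},
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}


def with_profiling(recipe: dict, enabled: bool) -> dict:
    """Return a copy of `recipe` with data profiling switched on or off.

    Hypothetical helper. With profiling disabled or absent, only metadata
    (table names, column names, etc.) is ingested; enabling it also collects
    statistics computed from the table contents.
    """
    updated = copy.deepcopy(recipe)
    source_config = updated["source"].setdefault("config", {})
    source_config.setdefault("profiling", {})["enabled"] = enabled
    return updated


def profiling_enabled(recipe: dict) -> bool:
    """True only if the recipe explicitly turns profiling on."""
    config = recipe["source"].get("config", {})
    return bool(config.get("profiling", {}).get("enabled", False))


if __name__ == "__main__":
    later = with_profiling(BASE_RECIPE, enabled=True)
    print(profiling_enabled(BASE_RECIPE), profiling_enabled(later))  # False True
```

The copy-on-write design keeps the metadata-only recipe available unchanged. That makes it easy to compare the two configurations before deciding to let any data values into DataHub.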
f
Thanks @miniature-tiger-96062 & @kind-dawn-17532! I'll digest that for a second and may have some follow-up questions. @mammoth-bear-12532, I did see it briefly, and yes, that's a great idea.
👍 1