Follow-up question, what functionality does DataHu...
# getting-started
i
Follow-up question: what functionality does DataHub guarantee if I don't use its pre-packaged data model? I assume things like data lineage (in the UI) would break, right?
The pre-packaged data model is pretty good, but for my company's particular data infrastructure I may need something different.
m
If you are essentially making backwards-incompatible changes to the model, then the likelihood of breaking something is high 🙂
g
The data models are designed to be extensible, however; it might be easier for you to add aspects to the dataset entity rather than create a new entity from scratch.
i
I don't mean changing between completely different models while DataHub is already up and running. I mean creating my own version of dataset, user, and so on.
An example would be to have the notion of table-based datasets (i.e., SQL tables), audio datasets, video datasets, free-text datasets, among others.
g
Gotcha. It would be interesting to learn what about the existing dataset and user entities doesn't work for you.
b
We are toying with the idea of custom, dynamic model extensions. We should discuss further to see if this would be a good use case
i
@big-carpet-38439 that sounds interesting. Is there a discussion issue, RFP, or Google Doc to better understand what features you are thinking of?
b
Not yet -- This is a pretty recent development. Maybe we can schedule a time to discuss
i
@green-football-43791 the more likely scenario for my needs (as I see it at the moment, likely to evolve) will be specialization of existing entities (the dataset example I gave) and the need to add more entities which are specific to my domain.
Happy to discuss 🙂
b
So 2 things we've actually been considering: dynamic registration of new entities and dynamic extension of existing models
g
yeah, exactly ^
which sounds like your two use cases
have you figured out what new entities you need? or what new dataset extensions you might want?
i
Not all of them, and not thoroughly. I first wanted to understand whether it was possible.
I'm attempting to use DataHub as a PoC for a small part of my company's internal data assets. Ironically it does not map to datasets. It is an event-based pipeline that populates SQL databases.
g
So am I correct in understanding that you want to track the pipeline's components in DataHub?
i
Yes
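
To make the suggestion from earlier in the thread concrete (adding aspects to the existing dataset entity rather than creating a new entity per dataset type), here is a minimal conceptual sketch in plain Python. It does not use DataHub's actual model definitions or APIs; every name in it (`Aspect`, `Dataset`, `SchemaInfo`, `AudioProperties`, the example URNs) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple, Type


@dataclass(frozen=True)
class Aspect:
    """Hypothetical base class: a named bundle of metadata attached to an entity."""


@dataclass(frozen=True)
class SchemaInfo(Aspect):
    """Hypothetical aspect describing a table-based (SQL) dataset."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class AudioProperties(Aspect):
    """Hypothetical aspect describing an audio dataset."""
    sample_rate_hz: int
    duration_seconds: float


@dataclass
class Dataset:
    """A single entity type; specialization comes from the aspects it carries."""
    urn: str
    aspects: Dict[str, Aspect] = field(default_factory=dict)

    def add_aspect(self, aspect: Aspect) -> None:
        # Key each aspect by its class name so there is at most one per type.
        self.aspects[type(aspect).__name__] = aspect

    def has_aspect(self, aspect_type: Type[Aspect]) -> bool:
        return aspect_type.__name__ in self.aspects

    def get_aspect(self, aspect_type: Type[Aspect]) -> Optional[Aspect]:
        return self.aspects.get(aspect_type.__name__)


# Both objects are plain Dataset entities; only their aspects differ.
sql_table = Dataset(urn="urn:example:dataset:orders")  # made-up URN
sql_table.add_aspect(SchemaInfo(columns=("order_id", "customer_id", "total")))

audio_clip = Dataset(urn="urn:example:dataset:call-recordings")  # made-up URN
audio_clip.add_aspect(AudioProperties(sample_rate_hz=16000, duration_seconds=42.5))
```

In this sketch, anything that only knows about `Dataset` keeps working for both objects, which is the intuition behind extending an existing entity being less likely to break things than replacing it. Whether that holds for a specific DataHub feature such as lineage in the UI would need to be confirmed against the real models.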