Hi folks! We are currently evaluating `Datahub` a...
# getting-started
b
Hi folks! We are currently evaluating `Datahub` as a more modular and lightweight alternative to `Apache Atlas`. My question is about the best practices to follow when modelling our custom business entities.

When using Atlas, we modelled each business entity as a new type inheriting from a common supertype, `Dataset`. We then added the entity-specific fields (aspects) to the subtypes. Search and lineage also worked out of the box, because each subtype is (extends) the `Dataset` supertype and relationships are defined with a `Dataset` type as source and destination. In this scenario each new business entity is mapped to a different subtype, all of them ultimately being Datasets.

In `Datahub`, afaik, this concept of inheritance does not exist as such. What would then be the best way to model custom business entities (all conceptually datasets, but with different properties) according to its data model? I found two options:
• Onboard a new Datahub entity for each of our custom business entities. This looks fine in principle, but it is immediately noticeable that lineage and search are closed and defined within the Datahub Dataset entity itself.
• Add new aspects to the existing Datahub Dataset entity to cover the peculiar characteristics of our custom business entities. This looks fine too at first glance, but I'm worried that it could drift into a messy flat structure with tons of different aspects, which I believe would be quite difficult to maintain.

We also thought about using custom properties, but the lack of a schema, plus the impossibility of searching against them, does not make this solution very attractive.

That said, I would be very grateful if you could make me aware of any best practices to follow in this case. I apologise in advance if I missed some points or if the explanation above contains inaccuracies! Many thanks!
👍 1
b
Hi Claudio! First of all, thanks for all of the great details!
We've also come across this decision recently. We want to model Streams, Tables, and Views all as "subtypes" of the Dataset model.
😬 1
Our plan for modelling inheritance in a generalizable way is to add a standard "Type" aspect to our entities where it applies. This will contain a field that denotes which subtype the entity belongs to. In addition, we'll introduce type-specific aspects using composition, as outlined in option 2 above. So for example:
Stream example:

{
  "type": "STREAM",
  "streamProperties": {
    "topic": <topic-name>,
    "broker": <broker ref>,
    ....
  },
  .... other standard Dataset aspects
}
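To make the "type field plus type-specific aspect" idea a bit more concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions, not DataHub's actual model or API: the `Dataset` class, the `urn` field, the `TYPE_ASPECT` mapping, and the `tableProperties` / `viewProperties` aspect names are all hypothetical; only `type`, `STREAM`, `streamProperties`, `topic`, and `broker` come from the example above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class DatasetType(Enum):
    """Sub-types that all remain Datasets (Streams, Tables, Views)."""
    STREAM = "STREAM"
    TABLE = "TABLE"
    VIEW = "VIEW"


# Hypothetical mapping from each sub-type to the name of its type-specific aspect.
TYPE_ASPECT = {
    DatasetType.STREAM: "streamProperties",
    DatasetType.TABLE: "tableProperties",
    DatasetType.VIEW: "viewProperties",
}


@dataclass
class Dataset:
    """A single entity type; sub-typing is expressed by composition."""
    urn: str  # hypothetical identifier field
    type: DatasetType
    aspects: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def type_properties(self) -> Dict[str, Any]:
        """Return the aspect that matches this entity's sub-type, if present."""
        return self.aspects.get(TYPE_ASPECT[self.type], {})

    def validate(self) -> None:
        """Reject type-specific aspects that belong to a different sub-type."""
        allowed = TYPE_ASPECT[self.type]
        for name in self.aspects:
            if name in TYPE_ASPECT.values() and name != allowed:
                raise ValueError(
                    f"{name} is not valid for type {self.type.value}"
                )


stream = Dataset(
    urn="example-stream",
    type=DatasetType.STREAM,
    aspects={"streamProperties": {"topic": "orders", "broker": "broker-1"}},
)
stream.validate()
```

Because every sub-type is still a `Dataset`, anything defined against `Dataset` (search, lineage) keeps applying, while the `validate` check is one possible way to stop the aspect list from drifting into an unstructured flat bag.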
😍 3
b
Thank you for your prompt reply John. It sounds great! I’m definitely looking forward to it! Do you have a ballpark estimate for when this feature may be introduced?
b
cc @mammoth-bear-12532 We are trying to prioritize this now. We recognize it's very important to many use cases. We are hoping to have something concrete committed in the next ~3 weeks, I'd estimate.
m
Yup, three weeks sounds reasonable. @breezy-guitar-97226 let's collaborate directly on this so that the feature works correctly for your needs.
b
sounds great! happy to collaborate on this!