# getting-started
i
Hello, I'm looking into DataHub as a solution for my company's needs (data lineage & discovery). One of our main requirements is that the underlying data model of the metadata system is evolvable.
• Is DataHub's dynamic model able to change gradually over time for a given entity?
• If DataHub supports this, how does it affect search capabilities?
• Is there documentation on this? I've read through the markdown files on GitHub but couldn't find anything.
Any help would be greatly appreciated, thank you for taking the time to read 🙂
m
Welcome @incalculable-ocean-74010 ! As long as the changes you make to the model are backwards compatible, you should be fine.
i
Thank you @mammoth-bear-12532 😄 What if they aren't? Is DataHub resilient enough to deal with it? By resilient I mean simply not having search results for the non-backwards compatible section of the model
Making things a little more concrete, let's say I model a User entity which has an `office` string field, and a later update to the User data model removes this field. Will DataHub still be able to search users by office for old records?
m
You shouldn’t remove the field from the model: that would break persistence, among other things (you won’t be able to read old data with the new schema). Just mark the field as deprecated and stop populating it. Search will continue to work against old data that has the field populated.
Essentially, treat the model as an API
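To make the "deprecate, don't remove" advice concrete, here is a minimal Python sketch. It is not DataHub's API or its Pegasus models: the `User` dataclass and the `search_by_office` helper are hypothetical, and the in-memory list only stands in for a search index. It shows that keeping `office` as an optional, no-longer-populated field lets old records stay readable and searchable next to new ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    """Hypothetical stand-in for a User entity model (not a DataHub class)."""
    name: str
    # Deprecated: kept in the model so old records stay readable,
    # but new records no longer populate it.
    office: Optional[str] = None


def search_by_office(users: List[User], office: str) -> List[str]:
    """Toy search: return the names of users whose office matches."""
    return [u.name for u in users if u.office == office]


# An old record written while `office` was still populated.
old_record = User(name="alice", office="Lisbon")
# A new record written after the field was deprecated.
new_record = User(name="bob")

matches = search_by_office([old_record, new_record], "Lisbon")
print(matches)
```

Only the old record matches; the new record simply has no value for the deprecated field.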
i
I see, this then means that data models should be append-only, so as to not break persistence?
m
Yes.
i
Understood, thank you for your time and availability @mammoth-bear-12532
👍 1
m
@incalculable-ocean-74010 the current open source models are here : https://github.com/linkedin/datahub/tree/master/metadata-models/src/main/pegasus/com/linkedin
👍 1
e
Schema compatibility is a complicated story. Dropping a field itself is backward compatible but creates the possibility of future incompatibility (adding the field back with a different type). Unfortunately the current OSS DataHub doesn't enforce backward compatibility checks so it's possible to shoot yourself in the foot accidentally.
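Since OSS DataHub doesn't enforce these checks, one option is to run a check of your own before changing a model. The following is only a sketch under simplifying assumptions: schemas are plain dicts mapping field name to a type name, and `incompatible_changes` is a hypothetical helper, not part of DataHub or Pegasus. It compares a candidate schema against every earlier version, so it also catches the "drop a field, then add it back with a different type" case.

```python
from typing import Dict, List

Schema = Dict[str, str]  # field name -> type name (simplified assumption)


def incompatible_changes(previous_versions: List[Schema], candidate: Schema) -> List[str]:
    """List problems in `candidate` relative to all earlier schema versions."""
    problems = []

    # Remember the first type each field was ever declared with.
    seen: Dict[str, str] = {}
    for version in previous_versions:
        for field, ftype in version.items():
            seen.setdefault(field, ftype)

    # A field that reappears or changes with a different type is flagged.
    for field, ftype in candidate.items():
        if field in seen and seen[field] != ftype:
            problems.append(f"{field}: type changed from {seen[field]} to {ftype}")

    # Fields present in the latest version but missing now are flagged as removed.
    latest = previous_versions[-1] if previous_versions else {}
    for field in latest:
        if field not in candidate:
            problems.append(f"{field}: removed (consider deprecating instead)")

    return problems


v1 = {"name": "string", "office": "string"}
v2 = {"name": "string"}                    # office dropped
v3 = {"name": "string", "office": "int"}   # office added back with a new type

dropped = incompatible_changes([v1], v2)
readded = incompatible_changes([v1, v2], v3)
print(dropped)
print(readded)
```

Keeping the full version history, rather than only the latest schema, is what makes the re-added field detectable after it has already been dropped.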