# getting-started
i
Hello, when defining a data model for DataHub, is it up to the developer to model every relationship between entities in PDL, as seen here for the relationships defined in the out-of-the-box data model? Is there any tool to help with this generation?
g
Hey Pedro 👋 You're right that it's up to developers to model any additional relationships between entities. For guidance on modeling, you can always refer to the existing examples. Out of curiosity, what kind of tool do you think would make things easier for you?
i
Hey Gabe, thank you for your reply. The first thing that came to mind was a visual editing tool akin to UML diagram tools.
Perhaps a parser for a markdown-based diagram tool like mermaid.js could help.
g
I like that- I think a visual way to configure entities, aspects & their relationships would be very useful to folks
ah, so which do you feel is the bigger challenge: understanding the existing relationship graph or extending it?
i
Just something that automatically generates the pegasus files would help tremendously, and would also give a fast way to understand the data model that someone may be extending/adapting/creating.
g
those two tools would solve different problems, correct?
I like being able to automatically generate a chart- perhaps we could even put that under version control and have it be re-checked in after each change
i
Version control and auto-generation of the data model in a chart sounds like a great idea
I think there are a couple of use-cases that can be addressed here.
1.) Understand the current data model a given DataHub deployment has.
2.) Update/extend an existing model without having to directly edit pegasus files. There can be a significant number of them, which makes it a bit hard to grok the underlying model.
3.) Generate Data Models from scratch (this is a current need for me) and make writing the pegasus records easier & less error-prone.
This Data Model viz/edit tool could even be part of the DataHub UI (admin probably) with import/export capabilities to enable version control (think Grafana dashboards)
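To make use-case 1 a bit more concrete, here is a rough sketch of what rendering a relationship graph as a mermaid.js chart could look like. It is only an illustration: the `Relationship` tuple, the `to_mermaid` helper, and the example entries are all hypothetical, and it does not read real pegasus files or use any DataHub API. The relationships are assumed to already be available as (source, name, destination) triples.

```python
from typing import Iterable, NamedTuple


class Relationship(NamedTuple):
    """Hypothetical in-memory form of one relationship in a data model."""
    source: str
    name: str
    destination: str


def to_mermaid(relationships: Iterable[Relationship]) -> str:
    """Render relationships as a Mermaid flowchart definition.

    Edges are de-duplicated and sorted so the generated text is stable,
    which keeps diffs small if the chart is kept under version control.
    """
    lines = ["graph LR"]
    for rel in sorted(set(relationships)):
        lines.append(f"    {rel.source} -->|{rel.name}| {rel.destination}")
    return "\n".join(lines)


# Illustrative entries only, not extracted from an actual deployment.
EXAMPLE = [
    Relationship("Dataset", "OwnedBy", "CorpUser"),
    Relationship("Dataset", "DownstreamOf", "Dataset"),
]

if __name__ == "__main__":
    print(to_mermaid(EXAMPLE))
```

Going the other way (diagram to pegasus records) would need a parser plus a decision on how fields and aspects are expressed in the diagram, which this sketch leaves out.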
g
These are all great ideas- would you be interested in filing an issue on DH open source to capture some of this? I wouldn't want it to get lost in slack. There's a feature-request tag for this sort of thing
i
Sure
g
for #3 - are you unable to make progress without a tool to generate data models?
i
I'm still wrapping my head around DataHub generally speaking, so it's a little soon to say.
So far I haven't tried to create a data model.
g
ok 👍 sounds good
i
Here is the issue: https://github.com/linkedin/datahub/issues/2080 If there is anything that needs elaboration, feel free to ping me
g
thanks!