I am building an application for data scientists to track and document their work. I am thinking of using DataHub as the backend for storing all the metadata and dependencies (datasets, notebooks, dashboards...). It seems a lot of those entities are on the roadmap but not available yet. I am concerned that this would require too many customizations in the short term, and that it might be easier to start with a generic database for now and maybe get back to DataHub later on. Any opinions out there? Sorry, I am not very familiar with the code yet, and am just trying to get a sense of how difficult it is to customize. Any pointers would be appreciated, if anyone has attempted something similar.
2 years ago
Hey George, thanks for your interest in using DataHub. You're correct that not all entities have been open sourced yet. It is possible to fork the repo and add your own entities (see https://github.com/linkedin/datahub/blob/master/docs/how/entity-onboarding.md for more details), but it's difficult to say how that effort compares with setting up your own generic store now and migrating later.
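For a rough sense of what the "generic store now" option could look like, here is a minimal sketch using SQLite from the Python standard library. Everything in it is an assumption made for illustration: the table names, the `kind:name` ID format, and the helper functions are hypothetical and are not DataHub's schema or API. The idea is only that keeping entities and dependencies in two generic tables, keyed by stable string IDs, tends to make a later migration more mechanical.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a generic metadata store with two tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS entity (
            id       TEXT PRIMARY KEY,  -- hypothetical stable ID, e.g. 'dataset:sales'
            kind     TEXT NOT NULL,     -- 'dataset', 'notebook', 'dashboard', ...
            metadata TEXT NOT NULL      -- free-form JSON document
        );
        CREATE TABLE IF NOT EXISTS dependency (
            source   TEXT NOT NULL REFERENCES entity(id),
            target   TEXT NOT NULL REFERENCES entity(id),
            relation TEXT NOT NULL,     -- e.g. 'reads', 'produces'
            PRIMARY KEY (source, target, relation)
        );
        """
    )
    return conn


def put_entity(conn, entity_id, kind, metadata):
    """Insert an entity, or overwrite it if the ID already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO entity (id, kind, metadata) VALUES (?, ?, ?)",
        (entity_id, kind, json.dumps(metadata)),
    )


def add_dependency(conn, source, target, relation):
    """Record that `source` depends on `target`; duplicates are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO dependency (source, target, relation) VALUES (?, ?, ?)",
        (source, target, relation),
    )


def dependencies_of(conn, entity_id):
    """Return (target, relation) pairs for everything `entity_id` depends on."""
    rows = conn.execute(
        "SELECT target, relation FROM dependency WHERE source = ? ORDER BY target",
        (entity_id,),
    )
    return rows.fetchall()
```

Usage would be along these lines: call `put_entity(conn, "notebook:churn-analysis", "notebook", {"owner": "george"})`, then `add_dependency(conn, "notebook:churn-analysis", "dataset:sales", "reads")`. A later migration would then amount to mapping each row onto whatever entity and relationship model the target system exposes.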