# advice-metadata-modeling
f
Dear DataHub Team, we are developing custom DataHub actions based on our custom metadata models. We noticed it is quite tricky to put the two together: the datahub-actions package requires the standard acryl-datahub package as a dependency, and we don’t want to fork it but still want to use our own build of the acryl-datahub package, which contains our own models. It seems reasonable to decouple datahub-actions from the acryl-datahub package, since it is a framework and should not rely much on the model details in acryl-datahub. Does that make sense?
r
@famous-waitress-64616 could you take a look here? Thanks!
h
Hey @fierce-guitar-16421, this PR introduced the possibility of specifying custom Python packages alongside the standard acryl-datahub package in datahub-actions (starting with v0.12.0), which may allow using custom DataHub packages built with custom metadata ingestion sources. Can you check whether that works for you? By the way, do you really mean "custom metadata models", or rather "custom metadata sources"? Models are more central to datahub-gms than to datahub-actions.
f
Yes, I do mean “custom metadata models” rather than sources. We have our own fork and added new aspects for datasets to the metadata models. We also want to use the actions framework to handle events on these new aspects, but the actions framework depends on the standard acryl-datahub package (which of course does not contain our new aspects). So we need to install our own build of the acryl-datahub package over the standard one in order to use our own models. Ideally the actions framework would not depend on acryl-datahub (or would treat acryl-datahub as an optional plugin), so that in a virtualenv we could simply install actions and then our own build of acryl-datahub on top, without worrying about the conflicts that overwriting an installation can cause.
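For anyone wanting to confirm the dependency described above in their own environment, here is a minimal sketch using only the standard library. It assumes the actions framework is installed under the distribution name `acryl-datahub-actions` (an assumption, not confirmed in this thread), and the helper names `requirement_names` and `hard_depends_on` are made up for illustration.

```python
import re
from importlib import metadata


def _normalize(name):
    """Normalize a distribution name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements):
    """Extract normalized distribution names from requirement strings."""
    names = set()
    for req in requirements or []:
        # The name is the leading run of name characters, before any
        # extras, version specifier, or environment marker.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if match:
            names.add(_normalize(match.group(1)))
    return names


def hard_depends_on(dist_name, dep_name):
    """Return True/False depending on whether dist_name declares dep_name
    as a non-extra requirement, or None if dist_name is not installed."""
    try:
        reqs = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # Requirements gated behind an extra are optional, so skip them.
    unconditional = [r for r in (reqs or []) if not re.search(r"extra\s*==", r)]
    return _normalize(dep_name) in requirement_names(unconditional)


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust to your install.
    print(hard_depends_on("acryl-datahub-actions", "acryl-datahub"))
```

A result of `True` would mean the dependency is unconditional, so installing a custom build under the same distribution name replaces the standard one in place rather than sitting beside it.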
g
We do have a (somewhat experimental) mechanism for handling this. You can generate a custom models package that, when installed, will take precedence over the models bundled with acryl-datahub. See these docs for more details: https://datahubproject.io/docs/next/metadata-modeling/extending-the-metadata-model/#optional-step-7-use-custom-models-with-the-python-sdk. We currently use this for the `acryl-datahub-cloud` package. This is still a pretty new feature, so let me know if you run into issues with it!
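One way to sanity-check that custom models are actually the ones being picked up is to look for a custom aspect class in the generated models module. The sketch below is only illustrative: it assumes the Python model classes live in `datahub.metadata.schema_classes`, and `MyCustomAspectClass` is a hypothetical name standing in for one of your own aspects.

```python
import importlib


def model_class_available(module_name, class_name):
    """Return True if module_name can be imported and exposes class_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its parent packages) is not installed.
        return False
    return hasattr(module, class_name)


if __name__ == "__main__":
    # "MyCustomAspectClass" is a hypothetical custom aspect class name.
    print(model_class_available("datahub.metadata.schema_classes", "MyCustomAspectClass"))
```

Running this inside the same virtualenv as the actions framework shows whether the installed models include the custom aspect.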
f
Thanks for the info! It is a cool feature indeed. 💡 However, we maintain our own fork (since we need to add UI components to interact with custom entities) and have put our model schemas together with the core models. It might be hard to separate them again for this custom setup. But it is great to have this option in mind, and it might be helpful if we need more custom models in the future. 👍
g
The above approach should work regardless of how you've modified the models. That said, it is definitely still pretty experimental. Once this PR is merged (https://github.com/datahub-project/datahub/pull/9391), we're likely going to start publicizing it a bit more.