# advice-metadata-modeling
g
Hi everyone, I wonder if anyone has figured out a way (or needed) to model DataJobs and DataFlows so that one DataJob can represent a generic function that is part of multiple DataFlows. As far as I can tell, there’s only one “slot” for a DataFlow in the `IsPartOf` relationship outgoing from a DataJob.
b
hey Antonio! yeah right now a DataJob can only belong to one DataFlow (as you saw in the model there's a singular `flow` in DataJobKey). Can you explain your use case here a little bit?
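A minimal sketch of why the singular `flow` matters, assuming the commonly documented DataHub URN layout in which a DataJob URN embeds its parent DataFlow URN. The two helper functions are local stand-ins written for illustration, not the DataHub client API, and the orchestrator, pipeline, and job names are hypothetical.

```python
# Local stand-in helpers that build URN strings by hand (assumed layout,
# not calls into the DataHub client library).

def make_data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    """Build a DataFlow URN from its orchestrator, flow id, and cluster."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def make_data_job_urn(flow_urn: str, job_id: str) -> str:
    """Build a DataJob URN; the key holds exactly one flow plus the job id."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


# Hypothetical pipelines that both reuse the same generic "cleanup" function.
flow_a = make_data_flow_urn("kubeflow", "pipeline_a")
flow_b = make_data_flow_urn("kubeflow", "pipeline_b")

cleanup_in_a = make_data_job_urn(flow_a, "cleanup")
cleanup_in_b = make_data_job_urn(flow_b, "cleanup")

print(cleanup_in_a)
print(cleanup_in_b)
```

Because the flow is part of the job's key, the same function used in two pipelines ends up as two distinct DataJob entities rather than one job shared by both flows.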
g
Hi Chris! 👋 The use case we’re looking at right now is perhaps quite modular. We’re using Kubeflow instead of Airflow and are constructing DAGs dynamically. For example, let’s say I have 10 Jobs with different purposes, some more generic, some more specific. Imagine that a few of them can be re-used across multiple DAGs. We use the Kubeflow APIs to build the DAGs, and some of these Jobs (which are basically Python functions) are re-used; they are not DAG-specific. So, let’s say I have a Job that performs a cleanup, and I want to use it in multiple DAGs. Then, when I’m pushing metadata to DataHub, it would be nice if I could express that this Job has been part of multiple pipelines: when it was executed at time `t`, it was linked to an execution of pipeline `A`, whereas the execution of this Job at time `t + delta` is actually linked to an execution of pipeline `B`. I hope my example is not too confusing 😅
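To make the use case concrete, here is a rough sketch of the relationship being described: one generic job whose individual runs belong to different pipelines. This is plain Python for illustration only, not DataHub's model or API; `JobRun`, `pipelines_per_job`, and all ids and timestamps are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    """One execution of a job, tied to the pipeline that triggered it."""
    job_id: str
    pipeline_id: str
    started_at: int  # hypothetical timestamp, e.g. seconds since some epoch


def pipelines_per_job(runs):
    """Group runs by job and collect every pipeline each job has run in."""
    result = defaultdict(set)
    for run in runs:
        result[run.job_id].add(run.pipeline_id)
    return dict(result)


runs = [
    JobRun("cleanup", "pipeline_a", started_at=100),  # time t
    JobRun("cleanup", "pipeline_b", started_at=160),  # time t + delta
    JobRun("train", "pipeline_a", started_at=110),
]

membership = pipelines_per_job(runs)
print(membership["cleanup"])
```

Here `cleanup` maps to both pipelines, which is the many-to-many link that a single flow slot per job cannot express.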
b
no, that's extremely helpful! definitely something that makes sense. unfortunately it doesn't look like something we support in our model at this time. however, it's something we could build! Would you mind filing a feature request, if there isn't one already, for supporting multiple data flows for a single data job? feature requests really help us prioritize what to work on and see what our community wants
here's the feature requests portal: https://feature-requests.datahubproject.io/