# getting-started
h
@clean-bear-94984 (or someone else): there has been some work on adding support for DataJobs and DataTasks: https://github.com/linkedin/datahub/pull/2008 but it seems like the feature is not fully implemented yet. Any plans on doing so? If not, mind if we pick up the work?
b
I think that'd be great! There's been a lot of interest in data pipeline observability
l
What orchestrator do you plan to integrate? Airflow?
h
Yup, and maybe some ML orchestrator, like Flyte, later on (we might keep that internal only, unless the community finds it valuable)
l
Airflow would be awesome. Would be great to have ways to capture comprehensive information per run. Also, would be great to emit lineage information. @mammoth-bear-12532 can comment more
An ML orchestrator would be great too - we've been thinking about that as well
h
I think the plan is to keep it at the job level, i.e. not capture the run info, per the RFC: https://github.com/linkedin/datahub/tree/master/docs/rfc/active/1820-azkaban-flow-job
m
Yeah the first goal is to just create a home for the flow (job) itself ...
run-info can probably be added on as an additional aspect with "last N entries / last M months" retention to avoid blowing up the storage requirements
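A rough sketch of what that kind of retention could look like, purely for illustration. `RunInfo` and `apply_retention` are hypothetical names, not DataHub types or APIs, and the sketch assumes both limits apply together (a run is kept only if it is among the newest N and also recent enough); the thread does not say whether the two limits are meant to combine this way.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class RunInfo:
    """Hypothetical record of a single job run (not a DataHub type)."""
    run_id: str
    finished_at: datetime


def apply_retention(
    runs: List[RunInfo],
    max_entries: int,
    max_age: timedelta,
    now: datetime,
) -> List[RunInfo]:
    """Return the runs to keep, newest first.

    Assumption: a run survives only if it is among the newest
    `max_entries` runs AND finished within `max_age` of `now`.
    """
    cutoff = now - max_age
    newest_first = sorted(runs, key=lambda r: r.finished_at, reverse=True)
    return [r for r in newest_first[:max_entries] if r.finished_at >= cutoff]
```

With this shape, storage per job is bounded by `max_entries` regardless of how often the job runs, and stale runs age out even for jobs that run rarely.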
b
Have we also been thinking about adding "pipelines"?
l
flows == pipelines I think
b
I see
h
PR: https://github.com/linkedin/datahub/pull/2197 My first dabble this deep in the datahub backend, so be gentle 😅
🙌 2