DataHub #getting-started

high-hospital-85984

03/05/2021, 11:22 AM

@clean-bear-94984 (or someone else): there has been some work on adding support for DataJobs and DataTasks: https://github.com/linkedin/datahub/pull/2008 but it seems like the feature is not fully implemented yet. Any plans on doing so? If not, mind if we pick up the work?

big-carpet-38439

03/05/2021, 4:14 PM

i think that'd be great! there's been a lot of interest in data pipeline observability

loud-island-88694

03/05/2021, 5:53 PM

What orchestrator do you plan to integrate? Airflow?

high-hospital-85984

03/05/2021, 6:11 PM

Yup, and maybe some ML orchestrator, like Flyte, later on (we might keep that internal only, unless the community finds it valuable)

loud-island-88694

03/05/2021, 6:22 PM

Airflow would be awesome. Would be great to have ways to capture comprehensive information per run. Also, would be great to emit lineage information. @mammoth-bear-12532 can comment more

loud-island-88694

03/05/2021, 6:23 PM

ML orchestrator would be great too - we've been thinking about that as well

high-hospital-85984

03/05/2021, 6:28 PM

I think the plan is to keep it at the job level, i.e. not capture the run info, per the RFC: https://github.com/linkedin/datahub/tree/master/docs/rfc/active/1820-azkaban-flow-job

mammoth-bear-12532

03/05/2021, 7:10 PM

Yeah the first goal is to just create a home for the flow (job) itself ...

mammoth-bear-12532

03/05/2021, 7:11 PM

run-info can probably be added on as an additional aspect with "last N entries / last M months" retention to avoid blowing up the storage requirements
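
A minimal sketch of what a "last N entries / last M months" retention rule could look like, purely for illustration. `RunInfo`, `prune_runs`, and their parameters are hypothetical names, not part of DataHub, and the sketch assumes both limits apply together (a run is kept only if it is recent enough and among the N newest); the message above does not say whether the limits are meant to be combined or used as alternatives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RunInfo:
    # Hypothetical record of a single job run; not a DataHub model.
    run_id: str
    started_at: datetime
    status: str


def prune_runs(
    runs: List[RunInfo],
    max_entries: int,
    max_age_days: int,
    now: datetime,
) -> List[RunInfo]:
    """Keep at most `max_entries` runs, none older than `max_age_days`.

    Assumption: both limits are enforced together. Months are
    approximated as a number of days to keep the sketch simple.
    """
    cutoff = now - timedelta(days=max_age_days)
    # Drop runs that started before the age cutoff.
    recent = [r for r in runs if r.started_at >= cutoff]
    # Newest first, then cap the number of retained entries.
    recent.sort(key=lambda r: r.started_at, reverse=True)
    return recent[:max_entries]
```

For example, calling `prune_runs(runs, max_entries=100, max_age_days=90, now=datetime.now())` would bound the stored history per job to 100 runs from roughly the last three months.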

big-carpet-38439

03/05/2021, 7:41 PM

Have we also been thinking about adding "pipelines"?

loud-island-88694

03/05/2021, 7:42 PM

flows == pipelines I think

big-carpet-38439

03/05/2021, 8:19 PM

i see

high-hospital-85984

03/09/2021, 3:58 AM

PR: https://github.com/linkedin/datahub/pull/2197 My first dabble this deep in the datahub backend, so be gentle 😅

🙌 2
