# getting-started
a
Hello Team! I’m trying to set up DataHub and send our inlets/outlets from Airflow to DataHub. We have an edge case and I’d be curious whether somebody has had this use case before. We have a DAG, and we run this DAG with different data sources and different inlets depending on a type. Example:
- DAG run 1: type A
- DAG run 2: type B
- DAG run 3: type A

Is there a way to see these two logical groups individually? Right now, because we have the same DAG id, the lineage is always overwritten by the latest DAG run. It would be interesting to have them grouped by type, or at least to see both of the pipelines somehow.
d
Currently, we only show the merged state on the lineage graph, even though we store this information per run. On the Runs page, however, you should be able to see the inlets/outlets per run, like here -> https://demo.datahubproject.io/tasks/urn:li:dataJob:(urn:li:dataFlow:(airflow,datahub_li[…]kend_demo,prod),run_data_task)/Runs?is_lineage_mode=false For this you also have to enable run capture via the `capture_executions` property in the config:
```
capture_executions (defaults to false): If true, it captures task runs as DataHub DataProcessInstances. This feature only works with Datahub GMS version v0.8.33 or greater.
```
https://datahubproject.io/docs/lineage/airflow/
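For reference, a minimal sketch of what this could look like in `airflow.cfg` (section and option names taken from the DataHub Airflow plugin docs; the `conn_id` value is an assumption and must match an Airflow connection pointing at your DataHub instance):

```ini
[datahub]
# Enable the DataHub plugin/lineage backend
enabled = true
# Airflow connection id for your DataHub instance (assumed name)
conn_id = datahub_rest_default
# Capture task runs as DataHub DataProcessInstances
# (requires DataHub GMS v0.8.33 or greater)
capture_executions = true
```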
a
That’s clear. Is it also possible to have the data pipelines in the UI under a dedicated name instead of the DAG name?