Hi, I'm trying to create lineage for my Airflow in data...
# advice-metadata-modeling
r
Hi, I'm trying to create lineage for my Airflow in DataHub. I have set up the connection in Airflow. The Airflow metadata is in Postgres. Where can I find some examples?
d
Hi, you could do this with our plugin. This doc might be helpful: https://datahubproject.io/docs/lineage/airflow/#using-datahubs-airflow-lineage-plugin
r
Yes, I have gone through it and set up the integration. Now when I run the DAG, I can see in the log that it's enabled. I saw the sample DAG, which has Snowflake inlets and outlets. In our product we have S3 as the source, and there are 10+ tasks in Airflow. In this case, how can I leverage lineage in DataHub?
n
You can configure inlets and outlets directly in the DAG by creating the respective URNs for the inputs and outputs of each task. These will get picked up automatically and show up as lineage in DataHub. I'll copy an example.
So for the S3 file, you can first ingest it into DataHub so the URN exists, then add the URN as an inlet to the task operator, and DataHub will tie them together. If you don't ingest the S3 file into DataHub first, DataHub will create a kind of empty placeholder entity to show in the lineage, but it should still work.
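For anyone following along, here is a minimal sketch of what that could look like for a single task reading from and writing to S3. It assumes Airflow 2.x and the `Dataset` entity from `datahub_provider.entities` (the one used in the docs' demo DAG; newer plugin versions may expose it from a different module). The bucket, keys, DAG id, task id, and the helper `s3_dataset_urn` are all hypothetical, and the dataset name has to match whatever your S3 ingestion actually produced.

```python
from datetime import datetime


def s3_dataset_urn(bucket: str, key: str, env: str = "PROD") -> str:
    """Hypothetical helper: the dataset URN DataHub uses for an S3 object."""
    return f"urn:li:dataset:(urn:li:dataPlatform:s3,{bucket}/{key.lstrip('/')},{env})"


def build_lineage_dag():
    """Build a one-task DAG whose task declares S3 inlets and outlets.

    Imports are kept local so the URN helper above can be used without
    Airflow or the DataHub plugin installed.
    """
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from datahub_provider.entities import Dataset  # assumed import path

    with DAG(
        dag_id="s3_lineage_example",  # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="transform_events",  # hypothetical task
            bash_command="echo 'transform'",
            # Dataset(platform, name) corresponds to the URN built by s3_dataset_urn.
            inlets=[Dataset("s3", "my-bucket/raw/events.csv")],
            outlets=[Dataset("s3", "my-bucket/clean/events.csv")],
        )
    return dag


# The URN that the inlet above should resolve to:
inlet_urn = s3_dataset_urn("my-bucket", "raw/events.csv")
```

In a real DAG file you would call `dag = build_lineage_dag()` at module level (or simply write the `with DAG(...)` block at top level) so the scheduler can discover it.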
t
@numerous-address-22061 Could you please show the full DAG code example? I've just started exploring DataHub and Airflow, and I'm having trouble figuring out how to integrate this code into the DAG.
r
@numerous-address-22061 Can you share the steps to ingest S3 files into DataHub? This is what I'm looking for.
n
@red-florist-94889 It's just a normal ingestion; the doc is here: https://datahubproject.io/docs/generated/ingestion/sources/s3/
@thousands-church-97463 I can't share my company's code, but there is an example in the docs showing how to set inlets and outlets once you have the plugin installed: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub_provider/example_dags/lineage_backend_demo.py
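As a rough illustration of that "normal ingestion", the sketch below builds an S3 recipe as a Python dict and runs it with DataHub's `Pipeline` class. The bucket path, region, and server URL are placeholders, the helper names `build_s3_recipe` and `run_recipe` are hypothetical, and the config keys should be checked against the S3 source doc for your DataHub version.

```python
def build_s3_recipe(include_path, aws_region,
                    gms_server="http://localhost:8080", env="PROD"):
    """Hypothetical helper: assemble a minimal S3 ingestion recipe."""
    return {
        "source": {
            "type": "s3",
            "config": {
                # Which objects to ingest, e.g. s3://my-bucket/raw/*.csv
                "path_specs": [{"include": include_path}],
                "aws_config": {"aws_region": aws_region},
                "env": env,
            },
        },
        # Send the metadata to the DataHub REST endpoint (placeholder URL).
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe):
    """Run the recipe; requires the DataHub package with the S3 plugin installed."""
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


recipe = build_s3_recipe("s3://my-bucket/raw/*.csv", "us-east-1")
```

Calling `run_recipe(recipe)` should do roughly the same thing as putting the equivalent YAML in a file and running it with the `datahub ingest` CLI.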
Every task allows the parameters `inlets` and `outlets`.
t
@numerous-address-22061 Thank you!
n
If the inlets/outlets exist in DataHub (i.e. have an equivalent URN), then DataHub will automatically hook up the lineage between that task and the source dataset -> sink dataset.
Granted, the plugin needs to be installed and working.
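For a DAG with many tasks, one possible way to keep the source -> sink chain consistent is to derive each task's inlets from the previous task's outlets. This is only a sketch for a strictly linear pipeline: `chain_lineage`, the task ids, and the S3 names are hypothetical, and it returns plain dataset names rather than DataHub entities.

```python
def chain_lineage(source, steps):
    """Derive inlets/outlets for a linear pipeline.

    source: name of the first input dataset.
    steps:  ordered list of (task_id, output_dataset_name).
    Returns {task_id: {"inlets": [...], "outlets": [...]}}.
    """
    lineage = {}
    upstream = source
    for task_id, output in steps:
        lineage[task_id] = {"inlets": [upstream], "outlets": [output]}
        upstream = output  # this task's output feeds the next task
    return lineage


plan = chain_lineage(
    "my-bucket/raw/events.csv",
    [
        ("clean", "my-bucket/clean/events.csv"),
        ("aggregate", "my-bucket/agg/daily.csv"),
    ],
)
```

When defining each operator, the names in `plan[task_id]` would then be wrapped in the plugin's dataset entity (for example `Dataset("s3", name)`) and passed as `inlets` and `outlets`.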