# ingestion
Hi there, I am trying to add this kind of lineage (dataset -> datajob -> dataset), but it keeps failing (I referred to this link). Adding lineage with the Python SDK via mce_builder.make_lineage_mce worked, but that function seems to support only dataset entities, not datajobs. Does the Python SDK currently offer a simple API for adding (dataset -> datajob -> dataset) lineage? Or is there another way around this? Thanks
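A minimal sketch, assuming DataHub's usual URN layout: dataset -> datajob -> dataset lineage is recorded on the job itself, as a single dataJobInputOutput aspect listing the job's input and output datasets. The helpers below (dataset_urn, data_flow_urn, data_job_urn, job_lineage_proposal) are hypothetical stdlib-only stand-ins that build the same URN strings as mce_builder.make_dataset_urn and make_data_job_urn; the table names are made up, and the flow/task ids are the ones used in this thread.

```python
from typing import Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Same string layout as datahub's mce_builder.make_dataset_urn.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    # A DataFlow is the pipeline (for Airflow, the DAG).
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(orchestrator: str, flow_id: str, job_id: str, cluster: str = "prod") -> str:
    # A DataJob is one step of the flow (for Airflow, a task), nested in the flow URN.
    return f"urn:li:dataJob:({data_flow_urn(orchestrator, flow_id, cluster)},{job_id})"


def job_lineage_proposal(job_urn: str, inputs: List[str], outputs: List[str]) -> Dict:
    # Hypothetical plain-dict view of the change proposal: the lineage lives on
    # the job, in one dataJobInputOutput aspect with its inputs and outputs.
    return {
        "entityType": "dataJob",
        "entityUrn": job_urn,
        "changeType": "UPSERT",
        "aspectName": "dataJobInputOutput",
        "aspect": {"inputDatasets": list(inputs), "outputDatasets": list(outputs)},
    }


# Hypothetical table names.
job = data_job_urn("airflow", "BigQueryLineageOperator_table_test", "create_test")
src = dataset_urn("bigquery", "my_project.my_dataset.source_table")
dst = dataset_urn("bigquery", "my_project.my_dataset.target_table")
proposal = job_lineage_proposal(job, [src], [dst])
print(proposal["entityUrn"])
# urn:li:dataJob:(urn:li:dataFlow:(airflow,BigQueryLineageOperator_table_test,prod),create_test)
```

With the SDK itself, the same aspect would typically be built as DataJobInputOutputClass(inputDatasets=[src], outputDatasets=[dst]), wrapped in MetadataChangeProposalWrapper(entityUrn=job, aspect=...), and sent with DatahubRestEmitter(...).emit(...); check these names against the SDK version you have installed.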
Hi again, I got the above issue fixed. Adding lineage (dataset -> datajob (Airflow task) -> dataset) works fine. The problem is that when the Airflow task finishes, it seems to automatically send its status to DataHub and update the task there, and I assume this update overwrites what I set on the task… How can I keep its lineage even after the task is done?
```
[2023-05-16, 16:44:12 ] {_plugin.py:147} INFO - Emitting Datahub Dataflow: DataFlow(urn=<datahub.utilities.urns.data_flow_urn.DataFlowUrn object at 0x7f892da85c40>, id='BigQueryLineageOperator_table_test', orchestrator='airflow', cluster='prod', name=None, 'is_paused_upon_creation': 'None'
[2023-05-16, 16:44:12 ] {_plugin.py:165} INFO - Emitting Datahub Datajob: DataJob(id='create_test', urn=<datahub.utilities.urns.data_job_urn.DataJobUrn object at 0x7f892daac6a0>, flow_urn=<datahub.utilities.urns.data_flow_urn.DataFlowUrn object at 0x7f89190f99d0>, name=None, description=None, properties={'_inlets': '[]', '_outlets': '[]', 'depends_on_past': 'False', 'email': 'None', 'label': "'create_test'", 'execution_timeout': 'None', 'sla': 'None', 'trigger_rule': "<TriggerRule.ALL_SUCCESS: 'all_success'>", 'wait_for_downstream': 'False', 'downstream_task_ids': 'set()', 'inlets': '[]', 'outlets': '[]'}, url='/?flt1_dag_id_equals=&_flt_3_task_id=create_test', tags={, 'BigQueryLineageOperator', 'data_discovery', 'datahub'}, owners={'XXXX'}, group_owners=set(), inlets=[], outlets=[], upstream_urns=[])
```
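The Datajob log line shows the plugin emitting create_test with inlets=[] and outlets=[]. One plausible explanation (an assumption, not confirmed in this thread) is that each emission upserts the job's whole dataJobInputOutput aspect, so the plugin's empty lists replace the lineage added earlier through the SDK. If that is the cause, declaring the datasets as the task's own inlets/outlets would let the plugin's emission carry the lineage instead of erasing it. The toy model below only illustrates that replace-on-upsert behaviour; AspectStore, emit_job_io and run_task are hypothetical names, not DataHub APIs, and the table names are made up.

```python
from typing import Dict, List, Optional, Tuple


class AspectStore:
    """Hypothetical toy store: one value per (urn, aspect name); UPSERT replaces it."""

    def __init__(self) -> None:
        self._aspects: Dict[Tuple[str, str], Dict[str, List[str]]] = {}

    def upsert(self, urn: str, aspect_name: str, value: Dict[str, List[str]]) -> None:
        # Whole-aspect replacement: the previous value is discarded, not merged.
        self._aspects[(urn, aspect_name)] = {k: list(v) for k, v in value.items()}

    def get(self, urn: str, aspect_name: str) -> Optional[Dict[str, List[str]]]:
        return self._aspects.get((urn, aspect_name))


def emit_job_io(store: AspectStore, job_urn: str, inputs: List[str], outputs: List[str]) -> None:
    # Writes the job's full input/output lineage as one aspect.
    store.upsert(job_urn, "dataJobInputOutput",
                 {"inputDatasets": inputs, "outputDatasets": outputs})


def run_task(store: AspectStore, job_urn: str,
             declared_inlets: List[str], declared_outlets: List[str]) -> None:
    # Stands in for the plugin's emission when the task finishes: it only
    # knows about the inlets/outlets declared on the task.
    emit_job_io(store, job_urn, declared_inlets, declared_outlets)


JOB = "urn:li:dataJob:(urn:li:dataFlow:(airflow,BigQueryLineageOperator_table_test,prod),create_test)"
SRC = "urn:li:dataset:(urn:li:dataPlatform:bigquery,my_project.my_dataset.source_table,PROD)"
DST = "urn:li:dataset:(urn:li:dataPlatform:bigquery,my_project.my_dataset.target_table,PROD)"

# Scenario A: lineage emitted separately, the task itself declares nothing.
store_a = AspectStore()
emit_job_io(store_a, JOB, [SRC], [DST])
run_task(store_a, JOB, [], [])
print(store_a.get(JOB, "dataJobInputOutput"))  # {'inputDatasets': [], 'outputDatasets': []}

# Scenario B: the same datasets declared as the task's inlets/outlets.
store_b = AspectStore()
run_task(store_b, JOB, [SRC], [DST])
print(store_b.get(JOB, "dataJobInputOutput")["inputDatasets"] == [SRC])  # True
```

In Airflow terms, Scenario B would roughly mean passing inlets=[Dataset("bigquery", "my_project.my_dataset.source_table")] and a matching outlets=[...] to the operator, with Dataset imported from the DataHub Airflow plugin; the module path of that class has changed between plugin versions, so check the one you run.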