handsome-football-66174
11/17/2021, 1:37 PMhandsome-football-66174
11/17/2021, 1:38 PMearly-lamp-41924
11/17/2021, 5:18 PMearly-lamp-41924
11/17/2021, 5:18 PMhandsome-football-66174
11/17/2021, 5:31 PMearly-lamp-41924
11/17/2021, 7:51 PMearly-lamp-41924
11/17/2021, 7:51 PMhandsome-football-66174
11/18/2021, 2:23 PMearly-lamp-41924
11/18/2021, 4:08 PMhandsome-football-66174
11/18/2021, 5:36 PMearly-lamp-41924
11/18/2021, 5:38 PMhandsome-football-66174
11/18/2021, 5:38 PMearly-lamp-41924
11/18/2021, 5:39 PMearly-lamp-41924
11/18/2021, 5:39 PMearly-lamp-41924
11/18/2021, 5:39 PMearly-lamp-41924
11/18/2021, 5:39 PMearly-lamp-41924
11/18/2021, 5:39 PMhandsome-football-66174
11/18/2021, 5:40 PMearly-lamp-41924
11/18/2021, 5:42 PMearly-lamp-41924
11/18/2021, 5:43 PMhandsome-football-66174
11/18/2021, 5:47 PMearly-lamp-41924
11/18/2021, 5:48 PMearly-lamp-41924
11/18/2021, 5:48 PMhandsome-football-66174
11/18/2021, 5:54 PM
pip install 'acryl-datahub[airflow]'
• Create Airflow hooks for Datahub
# For REST-based:
airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'
# For Kafka-based (standard Kafka sink config can be passed via extras):
airflow connections add --conn-type 'datahub_kafka' 'datahub_kafka_default' --conn-host 'broker:9092' --conn-extra '{}'
• Add this configuration to airflow.cfg
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {
    "datahub_conn_id": "datahub_rest_default",
    "cluster": "prod",
    "capture_ownership_info": true,
    "capture_tags_info": true,
    "graceful_exceptions": true
    }
# The above indentation is important!
• Plugins need to be installed separately, for example:
pip install 'acryl-datahub[kafka]'
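One way to check the indentation note in the airflow.cfg step before restarting Airflow is to parse the `[lineage]` section with Python's standard `configparser` and decode `datahub_kwargs` as JSON. This is only a sketch. It assumes the lineage backend treats `datahub_kwargs` as a JSON string, and it uses the plain `configparser.ConfigParser` rather than Airflow's own config loader. `load_datahub_kwargs`, `GOOD_CFG` and `BAD_CFG` are hypothetical names for illustration.

```python
import configparser
import json

# The [lineage] section from the message above, continuation lines indented.
GOOD_CFG = """
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {
    "datahub_conn_id": "datahub_rest_default",
    "cluster": "prod",
    "capture_ownership_info": true,
    "capture_tags_info": true,
    "graceful_exceptions": true
    }
"""

# Same text with all leading whitespace stripped, as in a bad paste.
BAD_CFG = "\n".join(line.lstrip() for line in GOOD_CFG.splitlines())


def load_datahub_kwargs(cfg_text: str) -> dict:
    """Parse INI text and decode lineage.datahub_kwargs as JSON."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return json.loads(parser.get("lineage", "datahub_kwargs"))


def is_valid(cfg_text: str) -> bool:
    """Return True if the section parses and the kwargs decode cleanly."""
    try:
        load_datahub_kwargs(cfg_text)
        return True
    except (configparser.Error, ValueError):
        return False


print("indented:", is_valid(GOOD_CFG))
print("unindented:", is_valid(BAD_CFG))
```

With the indentation removed, the continuation lines are no longer read as part of `datahub_kwargs`, so the check fails.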
handsome-football-66174
11/18/2021, 6:47 PM
With the above configuration, I see the following logs in the previous environment, but not in the new environment (the highlighted portion is missing):
[2021-11-18 18:03:09,766] {pipeline.py:44} INFO - sink wrote workunit d.a
[2021-11-18 18:03:09,835] {pipeline.py:44} INFO - sink wrote workunit d.b
[2021-11-18 18:03:09,911] {pipeline.py:44} INFO - sink wrote workunit d.c
[2021-11-18 18:03:09,913] {python.py:151} INFO - Done. Returned value was: None
[2021-11-18 18:03:09,920] {_lineage_core.py:220} INFO - DataHub lineage backend - emitting metadata:
{"auditHeader": null, "proposedSnapshot": {"com.linkedin.pegasus2avro.metadata.snapshot.DataFlowSnapshot": {"urn": "urn:li:dataFlow:(airflow,datahub_glue_ingestion,prod)", "aspects": [{"com.linkedin.pegasus2avro.datajob.DataFlowInfo": {"customProperties": {"start_date": "1637020800.0", "tags": "['airflow_tagging']", "catchup": "False", "timezone": "'UTC'", "fileloc": "'/home/ec2-user/airflow/dags/datahub_glue_ingestion.py'", "_concurrency": "64", "is_paused_upon_creation": "None", "_access_control": "None", "_default_view": "'tree'"}, "externalUrl": "https://<airflowhostname>:443/tree?dag_id=datahub_glue_ingestion", "name": "datahub_glue_ingestion", "description": "An example DAG which ingests metadata from Glue to DataHub\n\n", "project": null}}, {"com.linkedin.pegasus2avro.common.Ownership": {"owners": [{"owner": "urn:li:corpuser:airflow", "type": "DEVELOPER", "source": {"type": "SERVICE", "url": "datahub_glue_ingestion.py"}}], "lastModified": {"time": 1637258587166, "actor": "urn:li:corpuser:airflow", "impersonator": null}}}, {"com.linkedin.pegasus2avro.common.GlobalTags": {"tags": [{"tag": "urn:li:tag:airflow_tagging"}]}}]}}, "proposedDelta": null, "systemMetadata": null}
{"auditHeader": null, "proposedSnapshot": {"com.linkedin.pegasus2avro.metadata.snapshot.DataJobSnapshot": {"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,datahub_glue_ingestion,prod),ingest_from_glue)", "aspects": [{"com.linkedin.pegasus2avro.datajob.DataJobInfo": {"customProperties": {"task_id": "'ingest_from_glue'", "execution_timeout": "7200.0", "email": "['x']", "_downstream_task_ids": "[]", "_inlets": "[]", "label": "'ingest_from_glue'", "_outlets": "[]", "_task_type": "'PythonOperator'", "_task_module": "'airflow.operators.python'", "start_date": "datetime.datetime(2021, 11, 16, 0, 0, tzinfo=Timezone('UTC'))", "trigger_rule": "'all_success'", "depends_on_past": "False", "end_date": "None", "sla": "None", "wait_for_downstream": "False"}, "externalUrl": "https://<airflowhostname>:443/taskinstance/list/?flt1_dag_id_equals=datahub_glue_ingestion&_flt_3_task_id=ingest_from_glue", "name": "ingest_from_glue", "description": null, "type": {"string": "COMMAND"}, "flowUrn": null, "status": null}}, {"com.linkedin.pegasus2avro.datajob.DataJobInputOutput": {"inputDatasets": [], "outputDatasets": [], "inputDatajobs": []}}, {"com.linkedin.pegasus2avro.common.Ownership": {"owners": [{"owner": "urn:li:corpuser:airflow", "type": "DEVELOPER", "source": {"type": "SERVICE", "url": "datahub_glue_ingestion.py"}}], "lastModified": {"time": 1637258587166, "actor": "urn:li:corpuser:airflow", "impersonator": null}}}, {"com.linkedin.pegasus2avro.common.GlobalTags": {"tags": [{"tag": "urn:li:tag:airflow_tagging"}]}}]}}, "proposedDelta": null, "systemMetadata": null}
[2021-11-18 18:03:09,932] {base.py:78} INFO - Using connection to: id: datahub_rest_default. Host: https://<host>, Port: None, Schema: , Login: , Password: None, extra: {}
[2021-11-18 18:03:09,936] {base.py:78} INFO - Using connection to: id: datahub_rest_default. Host: https://<host>, Port: None, Schema: , Login: , Password: None, extra: {}
[2021-11-18 18:03:10,044] {taskinstance.py:1211} INFO - Marking task as SUCCESS. dag_id=datahub_glue_ingestion, task_id=ingest_from_glue, execution_date=20211118T180307, start_date=20211118T180309, end_date=20211118T180310
[2021-11-18 18:03:10,066] {taskinstance.py:1265} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2021-11-18 18:03:10,106] {local_task_job.py:149} INFO - Task exited with return code 0
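To compare the two environments, one option is to scan a task log for the line the lineage backend writes before it emits metadata. A minimal sketch, assuming the marker text matches the `_lineage_core.py` line in the log above. `lineage_backend_ran` and the sample strings are hypothetical, and the real log would be read from the task's log file.

```python
# Marker copied from the "_lineage_core.py" log line above.
MARKER = "DataHub lineage backend - emitting metadata"


def lineage_backend_ran(log_text: str) -> bool:
    """Return True if any log line contains the lineage backend marker."""
    return any(MARKER in line for line in log_text.splitlines())


# Shortened samples based on the log lines quoted in this thread.
old_env_log = (
    "[2021-11-18 18:03:09,913] {python.py:151} INFO - Done. Returned value was: None\n"
    "[2021-11-18 18:03:09,920] {_lineage_core.py:220} INFO - "
    "DataHub lineage backend - emitting metadata:\n"
)
new_env_log = (
    "[2021-11-18 18:03:09,913] {python.py:151} INFO - Done. Returned value was: None\n"
)

print("previous environment:", lineage_backend_ran(old_env_log))
print("new environment:", lineage_backend_ran(new_env_log))
```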
early-lamp-41924
11/18/2021, 7:16 PMearly-lamp-41924
11/18/2021, 7:17 PMearly-lamp-41924
11/18/2021, 7:18 PM
curl --location --request POST 'http://localhost:8080/entities?action=search' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
    "input": "*",
    "entity": "dataFlow",
    "start": 0,
    "count": 10
}'
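The same search can be issued from Python with only the standard library, which may be handy on a host without curl. This is a sketch that mirrors the curl call above. The URL, headers and body come from that command, while `build_search_request` and `run_search` are hypothetical helper names and the response format is not assumed.

```python
import json
import urllib.request


def build_search_request(base_url="http://localhost:8080", entity="dataFlow",
                         query="*", start=0, count=10):
    """Build the POST request that the curl command above sends."""
    body = {"input": query, "entity": entity, "start": start, "count": count}
    return urllib.request.Request(
        url=f"{base_url}/entities?action=search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_search(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage against a reachable DataHub GMS instance:
#   print(json.dumps(run_search(build_search_request()), indent=2))
```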
handsome-football-66174
11/18/2021, 9:26 PMhandsome-football-66174
11/19/2021, 2:07 PMearly-lamp-41924
11/19/2021, 4:17 PM