# troubleshoot
a
Hello everyone, I am new to DataHub and was working on integrating it with an Airflow DAG. The example DAG completed successfully, but I cannot see the metadata on the DataHub UI. Could you please advise why that is? P.S. I did restart the Docker container for DataHub as well, but no luck :(
I cannot see any Pipelines tab to check for Airflow DAG related metadata; any leads would be appreciated. Thanks!!
d
Can you check your task logs to see if there's anything there? Also, can you give us more context on how you set up the Airflow integration?
a
These are the Task Logs >>
*** Reading local file: /usr/local/airflow/logs/datahub_lineage_backend_demo/run_data_task/2022-10-21T04:13:27.202141+00:00/1.log
[2022-10-21 04:13:27,981] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: datahub_lineage_backend_demo.run_data_task 2022-10-21T04:13:27.202141+00:00 [queued]>
[2022-10-21 04:13:28,016] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: datahub_lineage_backend_demo.run_data_task 2022-10-21T04:13:27.202141+00:00 [queued]>
[2022-10-21 04:13:28,016] {{taskinstance.py:1068}} INFO - 
--------------------------------------------------------------------------------
[2022-10-21 04:13:28,016] {{taskinstance.py:1069}} INFO - Starting attempt 1 of 1
[2022-10-21 04:13:28,016] {{taskinstance.py:1070}} INFO - 
--------------------------------------------------------------------------------
[2022-10-21 04:13:28,029] {{taskinstance.py:1089}} INFO - Executing <Task(BashOperator): run_data_task> on 2022-10-21T04:13:27.202141+00:00
[2022-10-21 04:13:28,041] {{standard_task_runner.py:52}} INFO - Started process 18077 to run task
[2022-10-21 04:13:28,056] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'datahub_lineage_backend_demo', 'run_data_task', '2022-10-21T04:13:27.202141+00:00', '--job-id', '3', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/example/example_datahub.py', '--cfg-path', '/tmp/tmppx4bhhi0', '--error-file', '/tmp/tmphiqs91ln']
[2022-10-21 04:13:28,058] {{standard_task_runner.py:77}} INFO - Job 3: Subtask run_data_task
[2022-10-21 04:13:28,187] {{logging_mixin.py:104}} INFO - Running <TaskInstance: datahub_lineage_backend_demo.run_data_task 2022-10-21T04:13:27.202141+00:00 [running]> on host 41547ec2f672
[2022-10-21 04:13:28,262] {{taskinstance.py:1283}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=mayankjain@economist.com
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=datahub_lineage_backend_demo
AIRFLOW_CTX_TASK_ID=run_data_task
AIRFLOW_CTX_EXECUTION_DATE=2022-10-21T04:13:27.202141+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-10-21T04:13:27.202141+00:00
[2022-10-21 04:13:28,280] {{bash.py:135}} INFO - Tmp dir root location: 
 /tmp
[2022-10-21 04:13:28,281] {{bash.py:158}} INFO - Running command: echo 'This is where you might run your data tooling.'
[2022-10-21 04:13:28,311] {{bash.py:169}} INFO - Output:
[2022-10-21 04:13:28,312] {{bash.py:173}} INFO - This is where you might run your data tooling.
[2022-10-21 04:13:28,312] {{bash.py:177}} INFO - Command exited with return code 0
[2022-10-21 04:13:28,415] {{taskinstance.py:1192}} INFO - Marking task as SUCCESS. dag_id=datahub_lineage_backend_demo, task_id=run_data_task, execution_date=20221021T041327, start_date=20221021T041327, end_date=20221021T041328
[2022-10-21 04:13:28,455] {{taskinstance.py:1246}} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2022-10-21 04:13:28,509] {{local_task_job.py:146}} INFO - Task exited with return code 0
To answer the latter part of your question
I have an MWAA instance running as a Docker container on my local system
I have installed DataHub as a Docker container as well
and then followed the steps below:
1. DATAHUB_MAPPED_GMS_PORT=58080 python3 -m datahub docker quickstart
2. docker exec -it $(docker ps | grep mwaa | cut -d " " -f 1) pip install acryl-datahub-airflow-plugin
3. docker exec -it $(docker ps | grep mwaa | cut -d " " -f 1) airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://datahub-gms:58080'
4. added the below to the airflow.cfg file
[core]
lazy_load_plugins = False

[datahub]
enabled = true
conn_id = datahub_rest_default
cluster = prod
capture_ownership_info = true
capture_tags_info = true
graceful_exceptions = true
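For reference, the DAG I ran is essentially the lineage backend demo. A minimal sketch of what it looks like is below (the Dataset platform/table names are just placeholders from the demo, not real tables of mine):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Dataset comes from the acryl-datahub package installed in step 2
from datahub_provider.entities import Dataset

with DAG(
    dag_id="datahub_lineage_backend_demo",
    start_date=datetime(2022, 10, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # The lineage backend only attaches dataset lineage for tasks that
    # declare inlets/outlets; without them the task still succeeds but
    # no dataset-level lineage shows up in DataHub.
    run_data_task = BashOperator(
        task_id="run_data_task",
        bash_command="echo 'This is where you might run your data tooling.'",
        inlets=[Dataset("snowflake", "mydb.schema.tableA")],
        outlets=[Dataset("snowflake", "mydb.schema.tableB")],
    )
```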
The DAG completed successfully as well, but I cannot see anything on the DataHub UI
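One extra sanity check I can run from inside the MWAA container (my own assumption, not something from the docs) is to test whether GMS is reachable at the URL configured in step 3, using the REST emitter that ships with acryl-datahub:

```python
from datahub.emitter.rest_emitter import DatahubRestEmitter

# URL taken from the datahub_rest_default connection configured in step 3;
# adjust it if GMS is exposed on a different host/port from inside the container.
emitter = DatahubRestEmitter("http://datahub-gms:58080")
emitter.test_connection()  # raises an exception if GMS cannot be reached
print("GMS is reachable")
```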
@dazzling-judge-80093 Thanks a lot for looking into this; I'd appreciate it if you could take a look at the steps above and get back to me. Thanks again!!
@witty-plumber-82249 Can anyone please advise here?
a
Hi Mayank, are you still experiencing this issue?
a
Yes @astonishing-answer-96712
a
Let’s route this to #ui; could you make a new post summarizing the issue there?