# ingestion
c
Hi everyone, I encountered a bug (or is it expected behavior?) when using the Airflow lineage integration with DataHub. I have one DAG that runs an ETL whose input and output are the same dataset (i.e. it backfills some values into the same table), so I configured both its inlets and outlets to that dataset. Once I configured and ran it, my dataset could no longer be found in the DataHub UI; it keeps showing a "URN not found" error 😮 Then I changed my DAG’s inlet and outlet to some temp/unrelated Datasets, and my original Dataset became visible again (with all its schema kept as before). Is this behavior expected? Can’t one DAG (or task) have the same Dataset in both its inlets and outlets? *I’m not a native English speaker, sorry if my presentation is unclear/confusing.
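To make "same dataset in both inlets and outlets" concrete without depending on the Airflow or DataHub packages, here is a minimal Python sketch at the URN level. The `make_dataset_urn` helper is hand-written to mirror the string format DataHub uses for dataset URNs (`urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`); it is not imported from DataHub. The table name `mydb.public.orders` and the `self_loop_urns` check are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Set


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hand-written helper mirroring DataHub's dataset URN string format."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def self_loop_urns(inlets: Iterable[str], outlets: Iterable[str]) -> Set[str]:
    """Return the URNs that appear as both an inlet and an outlet of one task."""
    return set(inlets) & set(outlets)


# Hypothetical Redshift table that a single task backfills in place.
table = make_dataset_urn("redshift", "mydb.public.orders")
inlets = [table]
outlets = [table]

overlap = self_loop_urns(inlets, outlets)
print(overlap)
```

A non-empty `overlap` means the task declares a dataset as its own direct upstream, which is exactly the configuration described in the question.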
h
Did you happen to check the logs to see the exact error? AFAIK, there's no limitation on having the same dataset as both input and output, but it's hard to say without error logs.
c
There’s no specific error logged when I search for that Dataset in the DataHub UI. From the docker compose logs, it looks like a normal query in Elasticsearch.
datahub-mae-consumer      | 10:17:20.908 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
h
Does the frontend's container show any errors?
Or the MCE's?
c
Umm, nope, not when I search in the UI.
It just shows a dialog saying the URN was not found.
h
Have you created the dataset itself in Datahub?
What kind of dataset is it? A table in a database?
c
A Redshift table, and I already ingest it with DataHub's Redshift source.
It wasn’t created manually; it was created via
datahub ingest -c redshift_config.yml
h
Okay, so they should exist. Hmm... strange 🤔 The only thing I can think of is a typo somewhere, but otherwise I have no idea.
c
I’ll try to reproduce it later this week if I have time. Thanks for helping 😄
Hi @high-hospital-85984, I’ve tested again on my Airflow (2.0.2) and I’m able to reproduce it. If the same Dataset appears in both the inlets and the outlets of a DAG task's config, that Dataset can't be found in the DataHub UI (the dialog shows "urn not found"). Right after I change to different inlets and outlets, that Dataset becomes visible again.
Another test: if a Dataset in the outlets is upstream of one of the inlet Datasets, then all of them can be found in the DataHub UI.
To summarize, on Airflow:
• [Dataset A] -> Airflow Task -> [Dataset A]
  ◦ In this scenario, Dataset A can't be found in the DataHub UI (urn not found).
• [Dataset A] -> [Dataset B] -> Airflow Task -> [Dataset A]
  ◦ In this scenario, both Dataset A and B can be found in the DataHub UI.
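The two scenarios can be modeled as small lineage graphs. This sketch assumes that a task contributes an edge from every inlet to every outlet (an assumption for illustration, not a claim about DataHub internals), and `A` and `B` are placeholder dataset names. Running it shows that both scenarios contain a cycle, but only scenario 1 contains a self-loop (a dataset that is its own direct upstream), which lines up with scenario 1 being the only one reported as broken.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (upstream dataset, downstream dataset)


def task_edges(inlets: List[str], outlets: List[str]) -> List[Edge]:
    """Assumed model: a task links every inlet to every outlet."""
    return [(src, dst) for src in inlets for dst in outlets]


def has_self_loop(edges: List[Edge]) -> bool:
    """True if some dataset is listed as its own direct upstream."""
    return any(src == dst for src, dst in edges)


def has_cycle(edges: List[Edge]) -> bool:
    """Detect any directed cycle (self-loops included) with an iterative DFS."""
    graph: Dict[str, List[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:  # every child of `node` has been explored
                color[node] = BLACK
                stack.pop()
            elif color[child] == GRAY:  # back edge to the current path: cycle
                return True
            elif color[child] == WHITE:
                color[child] = GRAY
                stack.append((child, iter(graph[child])))
    return False


# Scenario 1: [Dataset A] -> Airflow Task -> [Dataset A]
scenario_1 = task_edges(["A"], ["A"])
# Scenario 2: A is already upstream of B; the task reads B and writes A.
scenario_2 = [("A", "B")] + task_edges(["B"], ["A"])

for name, edges in [("scenario 1", scenario_1), ("scenario 2", scenario_2)]:
    print(name, "self-loop:", has_self_loop(edges), "cycle:", has_cycle(edges))
```

Separating the self-loop check from general cycle detection is what distinguishes the two reports: a plain "is there a cycle?" test would flag both scenarios the same way.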
h
This is super strange 😅 @gray-shoe-75895 or @green-football-43791 any ideas?
c
g
@cuddly-spoon-5445 I tried reproducing the issue locally and was able to create an entity with the same upstream and downstream dependency:
Can you post the full error?
It would also be helpful to see if there are any useful logs from the datahub-frontend-react container.
Another thing worth checking is the date of your latest commit. I know there were issues with circular dependencies a few months ago, but they were fixed a while back.
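On the circular-dependency point: the thread doesn't show how DataHub walks lineage, so this is only a generic sketch of why any lineage traversal needs a visited set. The function `upstream_closure` and the `parents` maps are hypothetical. Without the `seen` set, a self-loop like A -> A or a cycle like A -> B -> A would keep the walk going forever.

```python
from collections import deque
from typing import Dict, List, Set


def upstream_closure(start: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk over `parents`, returning every dataset upstream of `start`.

    The `seen` set guarantees each dataset is expanded at most once, so the
    walk terminates even when lineage contains a cycle or a self-loop.
    """
    seen: Set[str] = set()
    queue = deque(parents.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already expanded; skipping it is what breaks cycles
        seen.add(node)
        queue.extend(parents.get(node, []))
    return seen


# Hypothetical lineage maps: dataset -> list of its direct upstream datasets.
self_loop = {"A": ["A"]}
two_node_cycle = {"A": ["B"], "B": ["A"]}

print(upstream_closure("A", self_loop))
print(upstream_closure("A", two_node_cycle))
```

Because A sits on a cycle in both maps, it appears in its own upstream set; the walk still terminates in both cases.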