# ingestion
i
Hi folks, I think I've found a bug with Owner ingestion when using the Airflow backend. Using the default args in screenshot 1, the user 'airflow' doesn't exist in my system to begin with. After I run the DAG, the user is visible on the pipeline (screenshot 2). If I click on the user image, the UI breaks and I see the contents of screenshot 3 in the Firefox console. If I ingest a user via another means (for example, user 'Steve' in an MCE over HTTP), the user gets created OK and the UI behaves as expected. If I then pass in user 'Steve' in the DAG default args, I get the same behaviour as above (broken UI). Thanks
b
Hi Steve. Thanks for reporting. If you change the "owner" field to "urn:li:corpuser:airflow2" do things work?
i
Hi John, yeah, there's a slight change in the UI. I'll send some sample screenshots and an explanation over in a few moments.
I modified my MCE file to ingest the airflow user via HTTP. When I navigate to the dataset the user is an owner of, then click on the airflow user, I see the contents of screenshot 1 (looks good). If I navigate to the pipeline (ingested from Airflow), I see the airflow user is the owner (after applying your suggestion above), but the user screen isn't populated (see screenshot 2). Seems like a context issue on the front end, perhaps. It's not an important issue for me yet though, just an observation 🙂
b
This is interesting: in both cases there's no user name? So at least the page doesn't break when you use urn:li:corpuser:airflow2. @gray-shoe-75895 Do you think we should by default box the owner into a DataHub user urn?
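To illustrate the "box by default" idea, here is a minimal sketch of a hypothetical helper (`to_corpuser_urn` is made up for this thread, not part of DataHub's API) that wraps a bare owner name such as `airflow` into a corp user URN of the form `urn:li:corpuser:<name>`, while leaving values that are already URNs untouched.

```python
CORPUSER_PREFIX = "urn:li:corpuser:"


def to_corpuser_urn(owner: str) -> str:
    """Hypothetical helper: box a bare owner name into a corp user URN.

    Values that already look like DataHub URNs (start with "urn:li:") are
    returned unchanged so they are not double-wrapped.
    """
    owner = owner.strip()
    if not owner:
        raise ValueError("owner must be a non-empty string")
    if owner.startswith("urn:li:"):
        return owner
    return CORPUSER_PREFIX + owner


if __name__ == "__main__":
    for raw in ["airflow", "urn:li:corpuser:airflow2", "  Steve "]:
        print(raw.strip(), "->", to_corpuser_urn(raw))
```

Passing existing URNs through means an owner already written as `urn:li:corpuser:airflow2` behaves the same whether or not boxing is applied.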
i
I'll fire up a clean stack to see what happens when I ingest a user via MCE before introducing Airflow ingestion. It's possible that all the changes going on corrupted things a little.
b
Okay, great. So basically DataHub is supposed to be quite forgiving of missing metadata, i.e. instead of breaking, you'll just see an empty shell of a page. It seems that no information has been ingested into DataHub for the users associated with the Airflow job. However, you can ingest user metadata over MCE using the CorpUser model.
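For reference, a CorpUser MCE could be assembled roughly like this. This is a sketch only: the snapshot and aspect keys below are modeled on the pegasus2avro-style layout of DataHub's example MCE JSON files and may differ in your DataHub version, and `make_corpuser_mce` plus the sample name and email are hypothetical.

```python
import json


def make_corpuser_mce(username: str, display_name: str, email: str) -> dict:
    """Hypothetical helper: build a CorpUser MetadataChangeEvent as a dict.

    Key names are an assumption based on DataHub's example MCE files;
    verify them against the schema of the DataHub version you run.
    """
    return {
        "auditHeader": None,
        "proposedSnapshot": {
            "com.linkedin.pegasus2avro.metadata.snapshot.CorpUserSnapshot": {
                "urn": f"urn:li:corpuser:{username}",
                "aspects": [
                    {
                        "com.linkedin.pegasus2avro.identity.CorpUserInfo": {
                            "active": True,
                            # Optional fields use the Avro JSON union wrapper.
                            "displayName": {"string": display_name},
                            "email": email,
                            "fullName": {"string": display_name},
                        }
                    }
                ],
            }
        },
        "proposedDelta": None,
    }


if __name__ == "__main__":
    mce = make_corpuser_mce("airflow", "Airflow Service User", "airflow@example.com")
    # Assumption: a file-based MCE source reads a JSON list of events.
    print(json.dumps([mce], indent=2))
```

Once a record like this exists for the owner URN, the user page has something to render instead of an empty shell.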
i
I just re-ran metadata ingest via MCE (no Airflow involvement yet). It does not show a user name, presumably since I've only given a URN on line 12 of the attached JSON. Good to know it's fairly forgiving; I'll try a corp user import later on. Appreciate the pointers.
Just to wrap up: doing a corp user import worked, and the user is now displaying properly. Thanks for the help!
g
@big-carpet-38439 I’m thinking we need to allow the Airflow ownership capture to be enabled/disabled via config
b
Yeah, I think I agree; it's too easy for this type of misingest to occur.
g
Wanted to circle back on this @icy-holiday-55016: I believe the bug you encountered was a UI issue, which should’ve been fixed by https://github.com/linkedin/datahub/pull/2553. Additionally, the Airflow lineage backend now supports more configuration options, so you can disable ownership capture if you’d like: https://datahubproject.io/docs/metadata-ingestion/#lineage-with-airflow
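The effect of such a toggle can be sketched as follows. This is not the backend's actual code: `owner_urns_from_default_args` and its `capture_ownership` flag are hypothetical stand-ins, and the real option name and where it is set should be taken from the linked docs. The point is that with capture disabled, a placeholder owner like Airflow's default `airflow` never reaches DataHub.

```python
from typing import Any, List, Mapping

CORPUSER_PREFIX = "urn:li:corpuser:"


def owner_urns_from_default_args(
    default_args: Mapping[str, Any], capture_ownership: bool = True
) -> List[str]:
    """Hypothetical sketch: derive pipeline owner URNs from DAG default_args."""
    if not capture_ownership:
        # Ownership capture disabled: emit no owners at all.
        return []
    owner = str(default_args.get("owner") or "").strip()
    if not owner:
        # No owner configured on the DAG, so nothing to report.
        return []
    # Pass existing URNs through; box bare names into corp user URNs.
    if owner.startswith("urn:li:"):
        return [owner]
    return [CORPUSER_PREFIX + owner]


if __name__ == "__main__":
    args = {"owner": "airflow"}
    print(owner_urns_from_default_args(args))
    print(owner_urns_from_default_args(args, capture_ownership=False))
```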
i
Thanks for that @gray-shoe-75895, I'll try the ingest again when we pull in the latest master and let you know how we get on