# ingestion
q
Heya, I believe I’ve found a bug with the Airflow plugin: https://github.com/datahub-project/datahub/issues/8058. In summary: when datahub.capture_ownership_info = false, the owners of Airflow pipelines are removed on each DAG run.
I believe this can be fixed by adding a property, capture_ownership_info, to the DataFlow task, and skipping https://github.com/datahub-project/datahub/blob/91caa0c2b9d55c7ef55177c087ab4e073c[…]metadata-ingestion/src/datahub/api/entities/datajob/dataflow.py if it is false.
Basically, it shouldn’t be creating a metadata change proposal at all (for owners) if capture_ownership_info is false.
Bear in mind I haven’t tested this; this is just my first guess at how to fix the issue.
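A rough sketch of the idea above, for illustration only. DataFlowSketch and generate_proposals are hypothetical stand-ins, not the real DataFlow class or its methods; the only names taken from the thread are capture_ownership_info and the notion of skipping the ownership proposal when it is false.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class DataFlowSketch:
    """Hypothetical, simplified stand-in for the plugin's DataFlow entity."""

    urn: str
    owners: List[str] = field(default_factory=list)
    # Proposed flag, mirroring the datahub.capture_ownership_info setting.
    capture_ownership_info: bool = True

    def generate_proposals(self) -> Iterator[Tuple[str, str, object]]:
        """Yield (urn, aspect_name, payload) tuples, one per aspect to write."""
        yield (self.urn, "dataFlowInfo", {"name": self.urn})
        # Only produce an ownership proposal when capture is enabled.
        # Writing an empty ownership aspect would overwrite existing owners.
        if self.capture_ownership_info:
            yield (self.urn, "ownership", {"owners": list(self.owners)})


flow = DataFlowSketch(
    urn="urn:li:dataFlow:(airflow,example_dag,prod)",
    capture_ownership_info=False,
)
aspects = [name for _, name, _ in flow.generate_proposals()]
print(aspects)
```

With the flag set to false, no ownership proposal is produced at all, so owners assigned elsewhere would be left untouched.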
a
@famous-waitress-64616 might have some insight here!
f
Hi Matthew, our team is looking into this, but it'll be a day or so before I can get back to you. Thanks for your patience.
q
That’s great, thank you!
l
@quiet-television-68466 I am ingesting Airflow metadata into DataHub using datahub_kafka_default. I am not getting any error; the log shows that emission is happening, but nothing is ingested into DataHub. Can you please help me here?
g
@quiet-television-68466 looks like you’re right - would you mind opening a PR to make that tweak?
l
@gray-shoe-75895 I am doing it via DataHub Kafka, and in that case capture_ownership_info is true, so it shouldn’t impact anything.
q
Yes, I can hopefully get to that early next week!
@limited-forest-73733 is your GMS pod writing any logs when that occurs?
l
I am able to integrate Airflow with DataHub using datahub-rest, but I am facing an issue with datahub-kafka.
With GMS, Airflow metadata is ingested into DataHub.
@quiet-television-68466 can you please clarify this for me? I am integrating Airflow with DataHub using datahub-kafka. My Airflow DAG log shows that emission is happening, but nothing is ingested into DataHub or written to the database. What is the problem here? I installed the plugin, set lazy_load_plugins to false, and added the corresponding configuration. Thanks.
g
@limited-forest-73733 I suspect that’s because we’re not calling flush in the Airflow code. That should be fixed by https://github.com/datahub-project/datahub/pull/8093
l
@gray-shoe-75895 using datahub-rest we can ingest Airflow metadata into DataHub; we are facing the emission issue only via datahub-kafka.
a
Yup - the flush call is a no-op for the REST sink, but fixes the Kafka sink
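To illustrate why a missing flush would affect only one of the two sinks, here is a minimal sketch. ImmediateEmitter and BufferedEmitter are hypothetical toy classes, not the DataHub emitters; they only assume that a REST-style emitter delivers each record synchronously while a Kafka-style emitter queues records until flush is called.

```python
from typing import List


class ImmediateEmitter:
    """Hypothetical REST-like emitter: each emit is delivered synchronously."""

    def __init__(self) -> None:
        self.delivered: List[str] = []

    def emit(self, record: str) -> None:
        self.delivered.append(record)

    def flush(self) -> None:
        # Nothing is buffered, so there is nothing to do.
        pass


class BufferedEmitter:
    """Hypothetical Kafka-like emitter: emit only queues the record."""

    def __init__(self) -> None:
        self._pending: List[str] = []
        self.delivered: List[str] = []

    def emit(self, record: str) -> None:
        self._pending.append(record)

    def flush(self) -> None:
        # Deliver everything queued so far, then clear the buffer.
        self.delivered.extend(self._pending)
        self._pending.clear()


rest_like = ImmediateEmitter()
kafka_like = BufferedEmitter()
for emitter in (rest_like, kafka_like):
    emitter.emit("dag_metadata")

# Before flush, the buffered emitter has "emitted" but delivered nothing.
delivered_before_flush = list(kafka_like.delivered)
kafka_like.flush()
rest_like.flush()
print(delivered_before_flush, kafka_like.delivered, rest_like.delivered)
```

Under these assumptions, the log can report a successful emission while the record is still sitting in the buffer, which would match the symptom described above if the task ends before anything is flushed.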
l
@gray-shoe-75895 I saw you merged your PR with the flush changes. Can you please tell me the ETA for the new release? Thanks
a
We should be releasing 0.10.3 this week