# troubleshoot
f
Hello, team, I am trying to emit airflow data to datahub using the kafka-based hook, but the airflow task reports some errors. It looks like the producer was terminated by the task before it had enough time to flush messages to kafka:
%4|1678351593.679|TERMINATE|rdkafka#producer-2| [thrd:app]: Producer terminating with 23 messages (9368 bytes) still in queue or transit: use flush() to wait for outstanding message delivery
Any ideas about this? I didn't find a flush action in the datahub-airflow-plugin emit function
a
Hi, what version/deployment environment are you using? Detailed ingestion logs would be helpful here
f
OK, the environment versions are acryl-datahub-airflow-plugin 0.9.5 and airflow 2.2.5, and below is the airflow task log. The error
%4|1678351593.679|TERMINATE|rdkafka#producer-2| [thrd:app]: Producer terminating with 23 messages (9368 bytes) still in queue or transit: use flush() to wait for outstanding message delivery
comes from the airflow worker log.
We can see from this log that the emitter emitted the records, but the producer was closed before it could flush them
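The flush-before-close pattern the rdkafka warning is complaining about can be sketched with a minimal stdlib-only stand-in (`ToyProducer` here is hypothetical, not the real confluent-kafka or DataHub emitter API):

```python
import queue


class ToyProducer:
    """Hypothetical stand-in for an async Kafka producer.

    Like librdkafka, produce() only enqueues a message; delivery
    happens asynchronously, so queued messages are lost if the
    producer is torn down without a flush().
    """

    def __init__(self):
        self._pending = queue.Queue()
        self.delivered = []

    def produce(self, msg):
        # Enqueue only -- nothing is delivered yet.
        self._pending.put(msg)

    def flush(self):
        # Block until every queued message is delivered, mirroring
        # confluent_kafka.Producer.flush(); return the count still pending.
        while not self._pending.empty():
            self.delivered.append(self._pending.get())
        return self._pending.qsize()


p = ToyProducer()
p.produce("mce-1")
p.produce("mce-2")
# Without this call, destroying the producer here would drop both
# messages -- the "23 messages still in queue or transit" situation.
remaining = p.flush()
print(remaining, len(p.delivered))  # → 0 2
```

In the real plugin the fix is the same shape: make sure the emitter's underlying producer gets a blocking flush (with a generous timeout) before the task process exits.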
a
OK, how is your datahub deployed? This could be a resource allocation issue on the kafka pod, but it’s hard to tell without deployment specifics
f
All the DataHub components and Airflow are deployed in GKE, and the Kafka cluster is deployed on a GCP VM. When we produce records from an airflow task to the Kafka cluster, the task log shows that the records are generated successfully, but the airflow worker log shows that the producer was closed before sending them to Kafka.
We use acryl-datahub-airflow-plugin 0.9.5 to push data to datahub via the kafka hook, following this document: https://datahubproject.io/docs/lineage/airflow
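For reference, registering a Kafka-backed DataHub connection in Airflow looks roughly like the following; the connection id `datahub_kafka_default` and the broker host are assumptions to adapt from the linked doc:

```shell
# Hypothetical sketch: register a kafka-type DataHub connection for the plugin.
# Replace 'broker:9092' with your actual Kafka bootstrap address.
airflow connections add --conn-type 'datahub_kafka' 'datahub_kafka_default' \
    --conn-host 'broker:9092'
```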
a
And the overall datahub version? Would it be possible for you to upgrade?
f
The datahub version is 0.9.5, which version should we upgrade to?
a
0.10.0, you’ll have to run the datahub upgrade container
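A rough sketch of running the upgrade container with Docker follows; the env file name, image tag, and upgrade id are assumptions to check against the DataHub upgrade docs for your deployment (on Helm/GKE this usually runs as the datahub-upgrade job instead):

```shell
# Hypothetical sketch: run the system update step of the upgrade container
# after bumping the other DataHub components to v0.10.0.
docker pull acryldata/datahub-upgrade:v0.10.0
docker run --env-file docker.env acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
```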