Hello,
We integrated DataHub into our Spark job (scheduled with Airflow), which reads data from our S3 bucket and writes it to a SQL database.
At the end of the Spark job, the process blocks after receiving the MetadataWriteResponse.
The job loads the data into the SQL table correctly and the metadata looks fine in DataHub, but the job never finishes and eventually fails with a timeout.
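For reference, the lineage listener is wired into the job roughly as in the sketch below. This is a minimal illustration, not our exact code: the package version, GMS URL, and the explicit stop() call are assumptions (the config keys follow the DataHub Spark lineage docs).

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ANACOUNTERPARTY")   # becomes the dataFlow name in the emitted urn
    .master("local[*]")
    # package coordinates/version are an assumption; adjust to your deployment
    .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.32")
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
    .config("spark.datahub.rest.server", "http://datahub-gms:8080")  # assumed GMS endpoint
    .getOrCreate()
)

# ... read from S3 and write to the SQL table via JDBC ...

spark.stop()  # the driver should shut down here, but the process keeps running until the timeout kills it
```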
[2022-04-05, 16:33:21] {spark_submit.py:488} INFO - 22/04/05 14:33:21 INFO McpEmitter: MetadataWriteResponse(success=true, responseContent={"value":"urn:li:dataJob:(urn:li:dataFlow:(spark,ANACOUNTERPARTY,local[*]),QueryExecId_6)"}, underlyingResponse=HTTP/1.1 200 OK [Content-Length: 91, Content-Type: application/json, Date: Tue, 05 Apr 2022 14:33:21 GMT, Server: nginx/1.21.6, X-Restli-Protocol-Version: 2.0.0] [Content-Length: 91, Chunked: false])
[2022-04-05, 16:38:05] {timeout.py:36} ERROR - Process timed out, PID: 662
[2022-04-05, 16:38:05] {spark_submit.py:623} INFO - Sending kill signal to spark-submit
Any idea how to solve this issue?
We opened a ticket here:
https://github.com/datahub-project/datahub/issues/4583