# ingestion
c
With the airflow backend, as defined here, what happens if something fails? Because this can be quite a risk for our production jobs?
p
Same doubt. I suppose the better approach is the emitter operator, but I haven’t yet figured out how to emit the pipeline from that operator.
m
@calm-sunset-28996 what kind of failures are you imagining? Local processing failure or failure to write to datahub (Kafka / rest)
c
Anything, ranging from local processing, to writing, to not getting a proper connection, to something being wrongly formatted. We just started looking into this, so I’m not sure yet what the possible risks are, but at first glance it looks rather intrusive, because if it fails, everything goes down? (Compared to an isolated operator, which would just be a failed task.)
g
Yep, this is something I’ve been thinking about as well. The current state is that any failure in the lineage backend will cause the entire task to fail, which is obviously not ideal. I’m thinking about changing it so that failures in the lineage backend simply print error messages but do not cause the task to fail, and then adding a “strict” config option to revert to the current behavior
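The graceful-failure behavior proposed in the message above could look something like this sketch. All names here are hypothetical (the actual DataHub lineage backend code and its config option names may differ); the point is just the pattern of swallowing emitter errors unless a strict flag is set.

```python
import logging

log = logging.getLogger(__name__)


def emit_lineage(emit_fn, graceful=True):
    """Invoke the lineage emitter, optionally swallowing failures.

    emit_fn: callable that sends lineage to DataHub (Kafka / REST).
    graceful: if True, log errors and continue so the Airflow task
              itself still succeeds; if False ("strict" mode), re-raise
              and let the task fail as it does today.
    """
    try:
        emit_fn()
    except Exception:
        if not graceful:
            raise
        # Log the full traceback but do not propagate the error.
        log.exception("Failed to emit lineage to DataHub; continuing")
```

With `graceful=True`, a bogus connection or a Kafka outage becomes a logged warning rather than a failed DAG run; setting `graceful=False` restores the fail-fast behavior.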
👍 1
c
But it could potentially be worse than just one task failing, no? Say I input a bogus connection: that will be propagated to the DatahubGenericHook, where it will raise an AirflowException. This is all fine and as expected. However, as the lineage backend is used in all tasks automatically, this will cause all tasks to fail? Your solution would indeed fix this; without it I think it can be a bit dangerous 🙂
g
Yep you’re totally right - the lineage backend gets invoked for every task, so it failing would cause every single task to fail
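To illustrate the blast radius described in the last two messages, here is a toy sketch (all names hypothetical, not the real Airflow or DataHub API): a lineage hook runs after every task, so one misconfigured connection marks every task in the DAG as failed, even though the actual work succeeded.

```python
def run_task(task_fn, lineage_fn):
    """Simulate Airflow running a task and then its lineage backend."""
    task_fn()      # the task's actual work succeeds
    lineage_fn()   # the lineage backend is invoked for *every* task


def broken_lineage():
    # Stand-in for a bogus DataHub connection raising on every call.
    raise ConnectionError("bogus DataHub connection")


results = []
for name in ["extract", "transform", "load"]:
    try:
        run_task(lambda n=name: results.append(n), broken_lineage)
    except ConnectionError:
        results.append(name + ":failed")
```

Every entry in `results` gets a `:failed` marker: the lineage error, not the task logic, takes the whole DAG down, which is exactly why the graceful/strict toggle discussed above matters.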