# ingestion
c
With the airflow backend, as defined here, what happens if something fails? Because this can be quite a risk for our production jobs?
p
Same doubt. I suppose the better approach is the emitter operator, but I haven’t yet figured out how to emit the pipeline from that operator.
m
@calm-sunset-28996 what kind of failures are you imagining? Local processing failure or failure to write to datahub (Kafka / rest)
c
Anything, ranging from local processing, to writing, to not getting a proper connection, to something being wrongly formatted. We just started looking into this, so I’m not sure yet what the possible risks are, but at first glance it looks rather intrusive, because if it fails, everything goes down? (Compared to an isolated operator, which would just be a failed task.)
g
Yep, this is something I’ve been thinking about as well. The current state is that any failure in the lineage backend will cause the entire task to fail, which is obviously not ideal. I’m thinking about changing it so that failures in the lineage backend simply print error messages but do not cause the task to fail, and then adding a “strict” config option to revert to the current behavior
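The graceful-failure behavior proposed in the message above could look something like this sketch. All names here are hypothetical (the actual DataHub lineage backend code and its config option names may differ); the point is just the pattern of swallowing emitter errors unless a strict flag is set.

```python
import logging

log = logging.getLogger(__name__)


def emit_lineage(emit_fn, graceful=True):
    """Invoke the lineage emitter, optionally swallowing failures.

    emit_fn: callable that sends lineage to DataHub (Kafka / REST).
    graceful: if True, log errors and continue so the Airflow task
              itself still succeeds; if False ("strict" mode), re-raise
              and let the task fail as it does today.
    """
    try:
        emit_fn()
    except Exception:
        if not graceful:
            raise
        # Log the full traceback but do not propagate the error.
        log.exception("Failed to emit lineage to DataHub; continuing")
```

With `graceful=True`, a bogus connection or a Kafka outage becomes a logged warning rather than a failed DAG run; setting `graceful=False` restores the fail-fast behavior.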
👍 1
c
But it could potentially be worse than just one task failing, no? Say I input a bogus connection: that will be propagated to the DatahubGenericHook, where it will raise an AirflowException. This is all fine and as expected. However, as the lineage backend is used in all tasks automatically, this will cause all tasks to fail? Your solution would indeed fix this; without it I think it can be a bit dangerous 🙂
g
Yep you’re totally right - the lineage backend gets invoked for every task, so it failing would cause every single task to fail
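To illustrate the blast radius described in the last two messages, here is a toy sketch (all names hypothetical, not the real Airflow or DataHub API): a lineage hook runs after every task, so one misconfigured connection marks every task in the DAG as failed, even though the actual work succeeded.

```python
def run_task(task_fn, lineage_fn):
    """Simulate Airflow running a task and then its lineage backend."""
    task_fn()      # the task's actual work succeeds
    lineage_fn()   # the lineage backend is invoked for *every* task


def broken_lineage():
    # Stand-in for a bogus DataHub connection raising on every call.
    raise ConnectionError("bogus DataHub connection")


results = []
for name in ["extract", "transform", "load"]:
    try:
        run_task(lambda n=name: results.append(n), broken_lineage)
    except ConnectionError:
        results.append(name + ":failed")
```

Every entry in `results` gets a `:failed` marker: the lineage error, not the task logic, takes the whole DAG down, which is exactly why the graceful/strict toggle discussed above matters.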