Hello! I have a question about data lineage: Besid...
# ingestion
a
Hello! I have a question about data lineage: besides visualizing data lineage in the UI, is there any way to use that lineage to detect job failures and prevent data from flowing out of an impaired upstream source?
g
Hey @average-autumn-35845! This is exactly a use case we have in mind for DataHub. On DataHub's roadmap now is an Airflow operator that can poll DataHub and make decisions based on the results. Separately, we are also working on adding data quality integrations via Great Expectations.
I would love to hear more about your use case. From which system would you want to detect data job failures? How would you define a failure?
a
@green-football-43791 Yup, we really appreciate the team's work. For example: we have a data source A running through ETL task B, but the output fails some assertions (which could be defined in Great Expectations) or some test cases, or the task hits an unexpected exception. In that case, I think there should be some mechanism to alert via the UI or notifications, or even to prevent the downstream task C from using the data. That would give us time to fix the problem before the data propagates.
l
@average-autumn-35845 We have definitely been thinking about this use case. Do you currently use Airflow?
a
Sure, our team uses Airflow for ETL jobs, but data quality and observability seem to be missing a lot. 😄
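
To make the A → B → C scenario from this thread concrete, here is a minimal sketch in plain Python of how lineage could be used to hold back downstream tasks. It does not use any DataHub or Airflow API: the `LINEAGE` dictionary and the functions `impaired_nodes` and `should_run` are hypothetical names made up for illustration, and the set of failed nodes is assumed to come from whatever check you run (for example a Great Expectations validation).

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the nodes in its list (A -> B -> C).
LINEAGE = {"A": ["B"], "B": ["C"], "C": []}


def impaired_nodes(lineage, failed):
    """Return every node that failed or sits downstream of a failed node."""
    impaired = set(failed)
    queue = deque(failed)
    # Breadth-first walk over downstream edges, starting from the failed nodes.
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in impaired:
                impaired.add(child)
                queue.append(child)
    return impaired


def should_run(task, lineage, failed):
    """A task may run only if neither it nor anything upstream of it has failed."""
    return task not in impaired_nodes(lineage, failed)


if __name__ == "__main__":
    failed = {"B"}  # assumption: ETL task B did not pass its assertions
    for task in ("A", "B", "C"):
        print(task, "run" if should_run(task, LINEAGE, failed) else "blocked")
```

In an Airflow setup, a check like `should_run` could plausibly sit in front of task C (for instance inside a short-circuit style task), so that C is skipped while B is impaired; how the failure status is fetched is left open here.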