aloof-arm-38044
03/15/2022, 10:20 PM
Spark App / Airflow DAG run -> DataPipeline
and
Spark Job / Airflow Task run -> DataJob
but since the same URN is emitted each time, what you get is updates of the same DataPipeline or DataJob entity for each run.
So this means there is no way to see which exact run (update) generated which Dataset, because all Datasets created by a task will point to the same DataJob entity. Are we missing something?
It seems like the new Timeline API released as part of v0.8.28 would address this if it gets implemented for DataPipelines and DataJobs as well. Is that assumption correct? Any idea how long until that will get rolled out?
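The run-collapsing behavior described above can be sketched in plain Python. This is a minimal illustration only: the helper names mirror DataHub's documented URN conventions (`urn:li:dataFlow:...` / `urn:li:dataJob:...`) but are hand-rolled here, not the actual SDK builders.

```python
# Minimal sketch of why every run updates the same entity. Assumption: the
# URN string formats below follow DataHub's documented conventions; the real
# SDK exposes similar builders (e.g. in datahub.emitter.mce_builder).

def make_data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    """Build a DataFlow (DataPipeline) URN, e.g. for an Airflow DAG."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"

def make_data_job_urn(orchestrator: str, flow_id: str, job_id: str, cluster: str = "prod") -> str:
    """Build a DataJob URN, e.g. for an Airflow task inside that DAG."""
    flow_urn = make_data_flow_urn(orchestrator, flow_id, cluster)
    return f"urn:li:dataJob:({flow_urn},{job_id})"

# Two separate runs of the same task (hypothetical DAG/task names)...
run_1_urn = make_data_job_urn("airflow", "daily_ingest", "load_orders")
run_2_urn = make_data_job_urn("airflow", "daily_ingest", "load_orders")

# ...emit the identical URN, so both runs update one DataJob entity and the
# per-run history (which run produced which Dataset) is not distinguishable
# from the URN alone.
assert run_1_urn == run_2_urn
print(run_1_urn)
# -> urn:li:dataJob:(urn:li:dataFlow:(airflow,daily_ingest,prod),load_orders)
```

Because the URN carries no run identifier, lineage edges from any run attach to the same DataJob node, which is exactly the limitation raised above.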
loud-island-88694
aloof-arm-38044
03/16/2022, 9:20 AM
aloof-arm-38044
03/16/2022, 9:25 AM
immutable Datasets. But DataHub also supports tracking changes in mutable Datasets, such as schema changes (via the latest Timeline API). Is there a plan to provide lineage from a Task run to each version of a mutable dataset as it evolves over time?
loud-island-88694
mammoth-bear-12532