Hello everyone, How does DataHub create the lineag...
# troubleshoot
s
Hello everyone, how does DataHub create the lineage for Redshift objects? In particular, I want to know where DataHub retrieves the information for the lineage between the S3 files and the Redshift table. Is there a particular view that is being ingested? Or is DataHub parsing the queries on the table?
d
We use STL_LOAD_COMMITS to get that information: https://docs.aws.amazon.com/redshift/latest/dg/r_STL_LOAD_COMMITS.html
s
Thank you, @dazzling-judge-80093! Follow-up question: what determines which files get used? It appears to be the same file in different folders (one for each day/time that it is uploaded).
d
We join this table with the query history.
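Roughly, the idea looks like the sketch below. This is only an illustration and not DataHub's actual query: it uses an in-memory SQLite database in place of Redshift, and the `load_commits` / `query_history` tables, the `target_table` column, and the bucket paths are all made up for the example. The `query` and `filename` columns mirror the ones in STL_LOAD_COMMITS.

```python
import sqlite3

# In-memory stand-in for Redshift; table names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in for STL_LOAD_COMMITS: one row per file loaded by a query.
    CREATE TABLE load_commits (query INTEGER, filename TEXT);
    -- Simplified stand-in for the query history: which table each query loaded into.
    CREATE TABLE query_history (query INTEGER, target_table TEXT);

    INSERT INTO load_commits VALUES
        (1, 's3://bucket/2023-01-01/orders.csv'),
        (2, 's3://bucket/2023-01-02/orders.csv');
    INSERT INTO query_history VALUES
        (1, 'public.orders'),
        (2, 'public.orders');
    """
)

# Joining on the query id links each loaded file to the table it was loaded into.
edges = conn.execute(
    """
    SELECT DISTINCT lc.filename, qh.target_table
    FROM load_commits AS lc
    JOIN query_history AS qh ON qh.query = lc.query
    ORDER BY lc.filename
    """
).fetchall()

for filename, table in edges:
    print(f"{filename} -> {table}")
```

Because the join works on full file paths, the same file name uploaded under a different day/time folder comes out as a separate upstream for the same table, which would explain the entries that look like duplicates.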
s
Is there anything that can be done to filter/hide the duplicates?