# troubleshoot
l
Hi all! I have a dataset (S3) and a table (Snowflake) already ingested. How can I indicate lineage between them (the S3 dataset is upstream of / comes before the Snowflake table)?
b
hey Ben! you can always manually specify lineage using file-based ingestion as outlined in our docs here: https://datahubproject.io/docs/generated/ingestion/sources/file-based-lineage/#lineage-file-format
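For reference, here is a minimal sketch that writes a lineage file in the version 1 layout shown on that docs page (a `lineage` list of entries, each with an `entity` and its `upstream` entities). It uses only the standard library; `render_lineage_file` and the bucket/table names are hypothetical. The `name` values have to match the names the existing ingestion already produced, otherwise DataHub treats them as new datasets.

```python
# Hypothetical helper: renders DataHub's file-based lineage format (version 1)
# for one downstream dataset and its upstream datasets, stdlib only.

def render_lineage_file(downstream, upstreams, env="PROD"):
    """downstream and each upstream are (platform, name) tuples."""
    d_platform, d_name = downstream
    lines = [
        "version: 1",
        "lineage:",
        "  - entity:",
        f"      name: {d_name}",
        "      type: dataset",
        f"      env: {env}",
        f"      platform: {d_platform}",
        "    upstream:",
    ]
    # One nested entity block per upstream dataset.
    for u_platform, u_name in upstreams:
        lines += [
            "      - entity:",
            f"          name: {u_name}",
            "          type: dataset",
            f"          env: {env}",
            f"          platform: {u_platform}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example names are made up; copy the real ones from your existing URNs.
    print(render_lineage_file(
        ("snowflake", "analytics.public.orders"),
        [("s3", "my-bucket/raw/orders")],
    ))
```

The output is then passed to the file-based lineage source as its lineage file.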
m
What system do you use to load data from S3 into Snowflake?
l
@mammoth-bear-12532 We ingest from s3 to Snowflake using Snowpipe
@bulky-soccer-26729 This is the method I used, but unfortunately I've read that only dataset-to-dataset lineage is supported. I ended up with a new Snowflake "dataset" with the same name as the original, real Snowflake table (that's problem #1). The other problem is that this new Snowflake dataset does not seem to show any lineage relation to the S3 dataset.
m
I'm sure there is a more automated way to extract this lineage if you are doing a standard load using Snowpipe.
l
@mammoth-bear-12532 I was sure too, but for S3 specifically I only found what Chris suggested, or a Python version of it. The plan is to integrate the Python approach into the code that creates the Snowpipes, but this still doesn't work as I would expect.
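A rough sketch of that integration, under several assumptions: the pipe body is a plain `COPY INTO <table> FROM @<stage>[/path]`, you already know each external stage's S3 URL (the `stage_urls` dict is hypothetical), and your Snowflake ingestion emits lowercase `db.schema.table` names (common, but check an existing URN). It only derives the upstream/downstream URN pair; with the DataHub Python SDK you could then emit it, for example via `make_lineage_mce` and `DatahubRestEmitter`.

```python
import re

# Matches "COPY INTO <table> FROM @<stage>[/optional/path]" (simplified; no
# support for COPY from a SELECT or other Snowpipe variants).
COPY_RE = re.compile(
    r'COPY\s+INTO\s+([\w.$"]+)\s+FROM\s+@([\w.$"]+)(/\S*)?', re.IGNORECASE
)


def dataset_urn(platform, name, env="PROD"):
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def lineage_from_pipe(pipe_sql, stage_urls, database, schema):
    """Return (upstream_s3_urn, downstream_snowflake_urn) for one pipe."""
    m = COPY_RE.search(pipe_sql)
    if m is None:
        raise ValueError("no COPY INTO ... FROM @stage found")
    table, stage, subpath = m.group(1), m.group(2), m.group(3) or ""

    # Fully qualify the target table with the pipe's default database/schema.
    parts = table.replace('"', "").split(".")
    if len(parts) == 1:
        parts = [database, schema] + parts
    elif len(parts) == 2:
        parts = [database] + parts
    downstream = dataset_urn("snowflake", ".".join(parts).lower())

    # Assumed S3 dataset naming: "bucket/path" without the s3:// scheme.
    url = stage_urls[stage.replace('"', "").lower()].rstrip("/") + subpath
    if url.startswith("s3://"):
        url = url[len("s3://"):]
    upstream = dataset_urn("s3", url.rstrip("/"))
    return upstream, downstream


if __name__ == "__main__":
    sql = ("CREATE PIPE raw.orders_pipe AUTO_INGEST = TRUE AS "
           "COPY INTO raw.orders FROM @orders_stage FILE_FORMAT = (TYPE = 'JSON')")
    print(lineage_from_pipe(sql, {"orders_stage": "s3://my-bucket/raw/orders/"},
                            database="ANALYTICS", schema="PUBLIC"))
```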
m
If you are getting identical-looking entities in Snowflake, look carefully at the URNs of the two "duplicate" entities. There is probably a small difference between them.
You can copy the URN from the UI using the "copy" button at the top right of the entity page.
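Once both URNs are copied, a small stdlib-only sketch like the one below can point out where they diverge. It assumes the usual dataset URN shape `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<ENV>)`; `explain_difference` is a hypothetical helper, and it flags case-only differences separately because those are easy to miss by eye.

```python
import re

# Assumed dataset URN layout: platform, dataset name, environment.
DATASET_URN_RE = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([A-Z]+)\)$"
)


def parse_dataset_urn(urn):
    m = DATASET_URN_RE.match(urn)
    if m is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    platform, name, env = m.groups()
    return {"platform": platform, "name": name, "env": env}


def explain_difference(urn_a, urn_b):
    """List the URN fields that differ, noting case-only mismatches."""
    a, b = parse_dataset_urn(urn_a), parse_dataset_urn(urn_b)
    notes = []
    for field in ("platform", "name", "env"):
        if a[field] == b[field]:
            continue
        if a[field].lower() == b[field].lower():
            notes.append(f"{field} differs only in case: {a[field]!r} vs {b[field]!r}")
        else:
            notes.append(f"{field} differs: {a[field]!r} vs {b[field]!r}")
    return notes
```

For example, comparing `urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)` with `urn:li:dataset:(urn:li:dataPlatform:snowflake,ANALYTICS.PUBLIC.ORDERS,PROD)` returns `["name differs only in case: 'analytics.public.orders' vs 'ANALYTICS.PUBLIC.ORDERS'"]`.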
l
@mammoth-bear-12532 thanks a lot! That's exactly what I was missing