I'm trying to figure out why the redshift lineage ...
# ingestion
w
I'm trying to figure out why the redshift lineage is not picking anything up ... recipe:
source:
    type: redshift
    config:
        platform_instance: etl2_prod
        table_lineage_mode: mixed
        include_table_lineage: true
        database: insightsetl
        password: '${etl2test}'
        include_copy_lineage: false
        profiling:
            enabled: false
        host_port: 'pi-redshift-etl-2.ccvpgkqogsrc.us-east-1.redshift.amazonaws.com:8192'
        stateful_ingestion:
            enabled: false
        username: datahub_ingestion
        capture_lineage_query_parser_failures: true
Where do the lineage query parser failures actually get recorded? How can I review these?
g
The lineage parser failures should get recorded in the logs, and there might also be some summary statistics in the ingestion report summary that’s printed out at the end of the run
The capture_lineage_query_parser_failures option puts the failed SQL statements in the custom properties section (the “properties” tab)
f
Hi @wonderful-notebook-20086, did you create any data lineage in DataHub?
g
Hey @wonderful-notebook-20086 - could we hop on a call to debug this? The lineage piece is unexpected, but ingestion should certainly not be deleting any schemas when stateful ingestion is not enabled.
w
Sure - feel free to huddle
f
May I join you on the call about ingestion and lineage?
g
Hey Jon - sorry I missed this. Could we chat tomorrow afternoon? I’m mostly free, so just let me know what time would work well for you
@fierce-monkey-46092 - are you running into the same issue?
f
@gray-shoe-75895 Hello sir. I’ve deployed DataHub on Docker and ingested an Oracle DB into it. My question is: how do I create lineage between these DB tables?
g
@fierce-monkey-46092 you can use the datahub lineage file - see these docs for details https://datahubproject.io/docs/generated/ingestion/sources/file-based-lineage/#module-datahub-lineage-file
w
@gray-shoe-75895 - looks like we're in the same timezone ... I did run an additional modified run afterwards so hopefully any logs you're looking for are still visible. I'll be free b/w 10:30-noon. Any preferred time?
g
@wonderful-notebook-20086 - I DM’d you
@fierce-monkey-46092 - some of our sources (e.g. snowflake, redshift, etc) support automatic lineage, but unfortunately oracle is not one of them
f
@gray-shoe-75895 I see. So since my database is Oracle, there is no auto-lineage in DataHub, right?
g
That’s correct
f
I see. So if I have to create it by hand (file-based lineage), how do I create a big lineage? Could you help me with this?
g
What do you mean by “create a big lineage”? I’d recommend starting with the docs here https://datahubproject.io/docs/generated/ingestion/sources/file-based-lineage/#module-datahub-lineage-file
f
@gray-shoe-75895 Hi. What I mean by "create a big lineage" is that I want to build many upstreams and downstreams, not only one downstream that has a few upstreams. In other words, I want to create a lineage that shows Grandpa, Dad, Child, and so on with file-based lineage. Could you help?
g
That is possible with the file-based lineage. For example, you can set the grandparent node as the upstream of the parent, and the parent as the upstream of the child - that way you can configure longer lineage relationships
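A minimal sketch of what that could look like in the lineage file, assuming three Oracle tables. The schema and table names and the PROD env are hypothetical placeholders, so swap in the names your Oracle ingestion actually produced:

```yaml
# Lineage file sketch. All dataset names below are hypothetical placeholders.
version: 1
lineage:
  # Edge 1: grandparent -> parent
  - entity:
      name: myschema.parent_table
      type: dataset
      env: PROD
      platform: oracle
    upstream:
      - entity:
          name: myschema.grandparent_table
          type: dataset
          env: PROD
          platform: oracle
  # Edge 2: parent -> child
  - entity:
      name: myschema.child_table
      type: dataset
      env: PROD
      platform: oracle
    upstream:
      - entity:
          name: myschema.parent_table
          type: dataset
          env: PROD
          platform: oracle
```

Each entry only declares its direct upstreams, and the chain comes from parent_table appearing once as a downstream and once as an upstream. A recipe with source type datahub-lineage-file that points its file option at this file should then load both edges. Extra generations are just more entries in the same list.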