# integrate-tableau-datahub
a
Hi, new to DataHub and excited to play around with it! We use Snowflake + Tableau, and I was curious whether DataHub can extract dashboard lineage in a way that connects back to the Snowflake ingestion (if that makes sense). Tableau ingestion stores embedded Snowflake datasource names in the form `database.schema.table_name`. Separately, Snowflake ingestion has a hierarchical structure of `database > schema > table_name`. My assumption is that a link between something like `Tableau Dataset: mydb.myschema.mytable` and `Snowflake Dataset: mydb > myschema > mytable` would not show up in lineage graphs automatically. For Tableau ingestion, I did see a section for `default_schema_map` in the YAML settings. I don't know if something would need to change there to make a scenario like this work.
source:
    type: tableau
    config:
        ingest_owner: true
        default_schema_map:
            mydatabase: public
            anotherdatabase: anotherschema
        connect_uri: 'https://tableau.site.com'
        password: '${tableau_password}'
        ingest_tags: true
        username: tableau_username
        projects: null
pipeline_name: 'blah_blah'
m
Have you tried ingesting Tableau? Lineage showed up fine for me between Tableau and various databases, including Snowflake.
You might want to include `ingest_tables_external: true`.
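A minimal sketch of where that option could sit in the recipe from the question, assuming it belongs under `config` next to the other Tableau settings. The URI, username, and schema map are the placeholder values already used above, not a tested setup:

```yaml
source:
    type: tableau
    config:
        connect_uri: 'https://tableau.site.com'
        username: tableau_username
        password: '${tableau_password}'
        ingest_owner: true
        ingest_tags: true
        # As I understand it, this also ingests the tables that live outside
        # Tableau (e.g. in Snowflake) as upstream entities for lineage.
        ingest_tables_external: true
        default_schema_map:
            mydatabase: public
            anotherdatabase: anotherschema
pipeline_name: 'blah_blah'
```

Since the option is not exposed in the UI form, it would have to be added by editing the YAML directly.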
a
In the UI, that option wasn’t available, so I didn’t know it could be added to the YAML properties. By default, the ingestion does provide lineage for charts > dashboards > datasources (either SQL tables or queries). So the extra setting provides more metadata for upstream lineage?
m
Yes
You should be able to get all the lineage.
a
OK, so from what I can tell, Tableau ingestion picks up all the lineage it can find. If an upstream node (say, a Snowflake table) hasn't already been ingested outside of the context of Tableau, it gets a default metadata title (`database.schema.table_name`). I assume this default title is used until the source is properly ingested through Snowflake? Once I ingested the Snowflake data itself (outside of Tableau's ingestion), the node showed up in the form I would expect (directory-style `database > schema > table_name`).
m
That is correct
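For anyone following along, the separate Snowflake ingestion mentioned above might look roughly like the recipe below. This is only a sketch: the account, user, warehouse, and database names are hypothetical placeholders, and the exact set of required fields can differ between DataHub versions, so check the Snowflake source docs before using it.

```yaml
source:
    type: snowflake
    config:
        account_id: my_account            # placeholder, not from the thread
        username: snowflake_username      # placeholder
        password: '${snowflake_password}'
        warehouse: my_warehouse           # placeholder
        # Limit ingestion to the database that Tableau reads from
        # (regex pattern; "mydb" is the example name used above).
        database_pattern:
            allow:
                - '^mydb$'
```

Once both recipes have run, the upstream table that Tableau referenced as `mydb.myschema.mytable` should resolve to the same dataset that the Snowflake ingestion populated, as long as both sides agree on the table name and environment.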