# ingestion
p
Any way to ingest data from Tableau, Metabase and Redash and bring in the lineage? I would love to collaborate, but will need pointers to start with…thanks
h
Charts and dashboards, and tracking their relationships to upstream datasets, are supported, but ingestion from the products you mention is not at the moment. The closest thing to an example of implementing that support would be to check how the Looker ingestion is implemented. The details are obviously going to be different because the APIs are different, but the DataHub-specific parts are going to be very similar.
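For a sense of what those DataHub-specific parts look like, here is a minimal, hedged sketch of emitting a chart and dashboard (with upstream dataset lineage) the way the Looker source ultimately does. The class names come from `datahub.metadata.schema_classes` and may shift between versions, the urns and table names are invented examples, and fetching the actual metadata from Metabase/Tableau/Redash is left out entirely:

```python
# Hedged sketch: the "DataHub-specific" half a Metabase/Tableau/Redash source would
# share with the Looker source -- turning API responses into chart/dashboard
# snapshots and emitting them. Urns and names below are made up.
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    ChangeAuditStampsClass,
    ChartInfoClass,
    ChartSnapshotClass,
    DashboardInfoClass,
    DashboardSnapshotClass,
    MetadataChangeEventClass,
)

audit = ChangeAuditStampsClass(
    created=AuditStampClass(time=0, actor="urn:li:corpuser:etl"),
    lastModified=AuditStampClass(time=0, actor="urn:li:corpuser:etl"),
)

# A chart, with its upstream dataset(s) recorded as lineage via `inputs`.
chart_urn = "urn:li:chart:(metabase,123)"
chart_mce = MetadataChangeEventClass(
    proposedSnapshot=ChartSnapshotClass(
        urn=chart_urn,
        aspects=[
            ChartInfoClass(
                title="Orders by day",
                description="",
                lastModified=audit,
                inputs=[
                    "urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.public.orders,PROD)"
                ],
            )
        ],
    )
)

# The dashboard that contains the chart.
dashboard_mce = MetadataChangeEventClass(
    proposedSnapshot=DashboardSnapshotClass(
        urn="urn:li:dashboard:(metabase,42)",
        aspects=[
            DashboardInfoClass(
                title="Sales overview",
                description="",
                charts=[chart_urn],
                lastModified=audit,
            )
        ],
    )
)

# Emit to a DataHub GMS instance (address is a placeholder).
emitter = DatahubRestEmitter("http://localhost:8080")
for mce in (chart_mce, dashboard_mce):
    emitter.emit_mce(mce)
```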
p
okay, makes sense…let me take a look at the Looker implementation
l
@gray-shoe-75895 ^ can you comment on the ideas around deriving lineage automatically using Airflow operator instrumentation?
@powerful-telephone-71997 it would be great to understand whether you are using Airflow to load into all of those dashboarding tools
g
So a couple of ideas re: lineage: (1) extracting lineage from the query logs - this would be an extension of the stuff that was announced at the town hall around BigQuery and Snowflake usage data, where we should be able to get table-level lineage for certain types of queries, and (2) similar to our LookML integration, using basic SQL parsing to determine lineage. Both of these would be applicable for Tableau/Metabase/Redash - I haven’t done too much digging here, but I know Metabase automatically prepends a small header to its SQL queries that makes them easy to identify in the query logs
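For idea (1), a rough sketch of what mining a Redshift query log could look like. This is hedged: the connection details, time window, and the FROM/JOIN regex are placeholder assumptions, and Metabase's comment header format varies by version, so the filter just looks for the word "metabase" in the query text:

```python
# Hedged sketch of idea (1): pull queries out of Redshift's stl_query log, keep the
# ones Metabase issued (it prepends an identifying comment header), then grab table
# names with a crude FROM/JOIN regex. Connection details are placeholders.
import re
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.com", port=5439,
    dbname="analytics", user="datahub_reader", password="...",
)

# Very rough: captures the identifier following FROM or JOIN.
TABLE_PATTERN = re.compile(r'\b(?:from|join)\s+([\w."]+)', re.IGNORECASE)

with conn, conn.cursor() as cur:
    # stl_query holds recent query text; querytxt is truncated, so very long
    # statements may need stl_querytext instead.
    cur.execute(
        """
        select query, trim(querytxt)
        from stl_query
        where querytxt ilike '%metabase%'
          and starttime > dateadd(day, -7, getdate())
        """
    )
    for query_id, sql_text in cur.fetchall():
        upstream_tables = sorted(set(TABLE_PATTERN.findall(sql_text)))
        # Each (dashboard query, upstream table) pair is a candidate lineage edge.
        print(query_id, upstream_tables)
```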
If you’re using Airflow to orchestrate any of these processes, you can also just use the inlets/outlets interface as part of DataHub’s Airflow lineage backend. We’re also thinking about adding wrappers around common operator types to automatically extract this lineage information.
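A minimal sketch of that inlets/outlets approach, assuming the `datahub_provider.entities.Dataset` helper described in the lineage backend docs linked later in the thread. Module paths and the exact inlet/outlet syntax differ across Airflow and DataHub versions, and the table names are made up:

```python
# Hedged sketch: dataset-level lineage via task inlets/outlets, picked up by
# DataHub's Airflow lineage backend (configured separately in airflow.cfg per the
# docs). Platform and table names below are example placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

from datahub_provider.entities import Dataset  # path as of the docs at the time

with DAG(
    dag_id="load_metabase_sources",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    build_orders = BashOperator(
        task_id="build_orders_mart",
        bash_command="echo 'run the transform that feeds the dashboard'",
        # The lineage backend reads these and emits upstream/downstream edges to DataHub.
        inlets=[Dataset("redshift", "analytics.public.raw_orders")],
        outlets=[Dataset("redshift", "analytics.public.orders_mart")],
    )
```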
p
@loud-island-88694 - Many of these dashboards were created by analysts in the past, and we have only just introduced Airflow to most of them, so adoption of Airflow is going to take some time…but that’s the ultimate vision, to schedule only with Airflow wherever possible…
@gray-shoe-75895 our team did read a Medium post about DataHub <-> Airflow integrations. If you have some articles/blogs on how to do this, please share them with me…thank you
That will be really helpful
@gray-shoe-75895 for (1) we might have to do this from the Redshift logs (kind of reverse engineering). On (2), any pointers/help would be great
g
@powerful-telephone-71997 you can take a look at the DataHub <-> Airflow docs and examples https://datahubproject.io/docs/metadata-ingestion/#lineage-with-airflow, and also read @green-football-43791’s writeup about lineage in DataHub more broadly https://medium.com/datahub-project/data-in-context-lineage-explorer-in-datahub-a53a9a476dc4. The LookML SQL parsing is quite simple, and you can take a look at the code here: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/lookml.py#L199-L216
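If the regex approach in lookml.py turns out to be too brittle for the SQL these tools generate, one hedged alternative is a parser-based library such as sqllineage (a separate open-source package, not part of DataHub); the query below is a made-up example:

```python
# Hedged alternative to regex-based parsing: let sqllineage extract source tables
# from the SQL a BI tool issued. The query is an invented example.
from sqllineage.runner import LineageRunner

sql = """
-- Metabase
select o.order_date, sum(o.amount)
from analytics.public.orders o
join analytics.public.customers c on c.id = o.customer_id
group by 1
"""

parser = LineageRunner(sql)
print(parser.source_tables())   # upstream tables -> candidate inlets / chart inputs
print(parser.target_tables())   # populated for INSERT / CREATE TABLE AS statements
```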
s
Hello @powerful-telephone-71997, hope you are doing well. I would like to ask if you were able to solve the Tableau connection. I am interested in a similar use case
p
Hi @some-microphone-33485, due to other priorities I haven’t done this yet; most likely next week
If you end up finishing it earlier - please share your wisdom 🙂