# integrate-tableau-datahub
f
Hello,
Our reporting setup is Tableau on top of Redshift, and we are ingesting metadata from both of these systems. I noticed that empty duplicate Redshift datasets got created in DataHub by the Tableau ingestion process. Because of this, our lineage is fragmented. Looking for your help on the following points:
• What could be causing this?
• What should we do to fix it?
• What needs to be done so that this does not happen again?
Example of the situation:
1. urn:li:dataPlatform:redshift,redshift_database_instance.hr_schema.hr_table,PROD
2. urn:li:dataPlatform:redshift,some_other_schema.hr_schema.hr_table,PROD
3. urn:li:dataPlatform:redshift,some_other_schema.hr_schema.hr_table,PROD
#1 is the correct one, which got ingested from the Redshift recipe. #2 and #3 are the empty ones that got created via Tableau ingestion.
Please let me know if you need any further details. Thank you for your help. CC: @swift-plastic-79414
That function is the one that makes the dataset URN, and sometimes it's not accurate. I ended up modifying the Tableau connector and maintaining our own.
g
Is there a generalizable way for the connector to get it right? E.g., where does the some_other_schema field come from, and how might we tell that it's actually supposed to be under redshift_database_instance instead?
f
@gray-shoe-75895 hmm... I am not sure. My best guess is that there is some issue during SQL parsing? I am assuming DataHub parses the custom SQL to extract the entities and build the lineage. Please correct me if I am wrong.
m
DataHub doesn't parse the custom SQL (at least as of some versions ago); it relies on the metadata returned by Tableau. Sometimes the metadata returned by Tableau is very messy. It would probably be better to parse the SQL to extract lineage.
g
@modern-artist-55754 that’s correct - the Tableau API returns results to us and we try to parse them. I suspect the issue is in this code block https://github.com/datahub-project/datahub/blob/96f782730ea33eafb58e10cf63e0fe8d45[…]8473/metadata-ingestion/src/datahub/ingestion/source/tableau.py or in the `make_table_urn` method that it calls. @full-engineer-98290 if you’d be open to another debugging session, I’d love to figure this out.
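
To illustrate why the lineage fragments, here is a minimal sketch of how a dataset URN could be assembled from the database, schema, and table names that an upstream API reports. This is not DataHub's actual `make_table_urn`; the `build_dataset_urn` helper and its `database_aliases` parameter are hypothetical names used only for illustration. It assumes Tableau reports `some_other_schema` in the database position, as in the example URNs earlier in the thread.

```python
from typing import Dict, Optional


def build_dataset_urn(
    platform: str,
    database: str,
    schema: str,
    table: str,
    env: str = "PROD",
    database_aliases: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical helper (not DataHub's make_table_urn).

    Joins database.schema.table into a dataset URN. If database_aliases
    is given, a reported database name is first mapped to its real name.
    """
    if database_aliases:
        database = database_aliases.get(database, database)
    name = ".".join([database, schema, table])
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# URN as produced from the Redshift recipe (names taken from the thread).
from_redshift = build_dataset_urn(
    "redshift", "redshift_database_instance", "hr_schema", "hr_table"
)

# URN built from Tableau metadata that reports a different first component.
from_tableau = build_dataset_urn(
    "redshift", "some_other_schema", "hr_schema", "hr_table"
)

# The two strings differ, so they are treated as two separate datasets.
print(from_redshift == from_tableau)

# Assumption: a user-supplied alias map could reconcile the two names.
fixed = build_dataset_urn(
    "redshift",
    "some_other_schema",
    "hr_schema",
    "hr_table",
    database_aliases={"some_other_schema": "redshift_database_instance"},
)
print(fixed == from_redshift)
```

Because the URN is just a string built from the reported names, any mismatch in the first component produces a second, empty dataset rather than attaching lineage to the existing one. Whether an alias map like this is the right fix depends on where Tableau gets the wrong name, which the thread has not yet established.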