# advice-data-governance
b
Hello! I have a data and analytics community that uses dbt for modeling and documentation. The data is in Snowflake. They currently feel they have all the 'cataloging' they need in dbt and are not that interested in DataHub's cataloging capabilities. For those who have ingested dbt into DataHub, I would be interested to hear how your dbt teams find value in DataHub and how they use it to enhance the metadata they capture in dbt and/or Snowflake. How is your dbt community using DataHub?
l
Hey there 👋 I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first: 1️⃣ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution? 2️⃣ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues?
a
People often use dbt to log assertions in DataHub and validate data!
b
Thanks @astonishing-answer-96712, this is good info if I can convince my dbt community that DataHub offers them value beyond the documentation they capture in dbt. I first need to win them over that the tool adds value for them; otherwise they'll continue using only dbt as their metadata source. I was hoping the community could share selling points on how their dbt teams use DataHub in addition to dbt.
m
dbt will not give you column-level lineage, and it will not give you lineage outside of the dbt project. If you have multiple dbt projects, or upstream systems plus a BI layer, dbt won't give you any of that. For small use cases dbt is probably OK, but if you are building a metadata platform, dbt's capabilities are very limited.
b
Thanks @modern-artist-55754 for sharing
e
@modern-artist-55754 DataHub doesn't provide column level lineage for dbt either - have you found a way around this?
a
@hundreds-photographer-13496 might be able to provide some insight here. I imagine there's some work with transformers or an API/SDK to get the desired effect
m
@early-hydrogen-27542 I have not tried any way to do that, but it would involve SQL parsing, I suppose. Like @astonishing-answer-96712 mentioned, you can create a transformer, then parse the SQL and create FineGrainedLineage on the columns. There are certain cases where it would not work, for example incremental tables in dbt. dbt uses a temp table for incremental models, so you will lose visibility into the correct upstream tables. If you use Snowflake + dbt you should have column-level lineage out of the box, I think
a
As @modern-artist-55754 said, if you use Snowflake + dbt, column-level lineage does work out of the box, as Snowflake provides column-level lineage for Enterprise Edition and above. If you are not using Snowflake, you can always write custom transformers or a Python script to process a table's or view's DDL and extract such lineage; however, that would involve SQL parsing and the challenges Steve highlighted earlier.
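To make the "parse the SQL and build column lineage" idea concrete, here is a minimal sketch. It is deliberately naive: it uses a regex instead of a real SQL parser, only handles a single `SELECT col [AS alias], ... FROM table`, and skips anything that is not a bare column name. The helper names (`dataset_urn`, `field_urn`, `column_lineage`) are made up for illustration and are not DataHub SDK calls. The output is plain dicts of upstream and downstream schema-field URNs, which a custom transformer or script would still have to turn into DataHub's fine-grained lineage objects.

```python
import re
from typing import Dict, List

# Hypothetical helpers: they only format URN strings and do not call the DataHub SDK.
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

def field_urn(dataset: str, column: str) -> str:
    return f"urn:li:schemaField:({dataset},{column})"

# Naive pattern: one SELECT list and one FROM table. No joins, CTEs, or subqueries.
SELECT_RE = re.compile(
    r"select\s+(?P<cols>.+?)\s+from\s+(?P<table>[\w.]+)",
    re.IGNORECASE | re.DOTALL,
)

def column_lineage(sql: str, downstream: str, platform: str = "snowflake") -> List[Dict[str, List[str]]]:
    """Map each plain selected column to the downstream column it feeds."""
    match = SELECT_RE.search(sql)
    if match is None:
        return []
    upstream = dataset_urn(platform, match.group("table"))
    target = dataset_urn(platform, downstream)
    edges = []
    for item in match.group("cols").split(","):
        parts = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)
        source_col = parts[0].strip()
        if not re.fullmatch(r"\w+", source_col):
            continue  # skip '*', expressions, and function calls
        target_col = parts[-1].strip()  # alias if present, else the column itself
        edges.append({
            "upstreams": [field_urn(upstream, source_col)],
            "downstreams": [field_urn(target, target_col)],
        })
    return edges

edges = column_lineage("SELECT id, amount AS total FROM raw.orders", "analytics.orders")
for edge in edges:
    print(edge["upstreams"][0], "->", edge["downstreams"][0])
```

Note that this inherits the incremental-model problem mentioned above: if the compiled SQL reads from a temp table, the `FROM` clause names that temp table rather than the real upstream, so the derived edges would point at the wrong dataset.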
h
Related to dbt, there is an upcoming downstream impact analysis GitHub Action you might be interested in, from today's town hall: https://datahubspace.slack.com/archives/CUMV92XRQ/p1682090187337049