#ingestion

bumpy-activity-74405

06/09/2022, 8:13 AM
Hey, I just started using BigQuery/dbt/Airflow, so forgive me if these questions sound stupid 🙂 I’ve somewhat successfully ingested metadata/usage/stats for BigQuery tables, and the ingestion also brought in lineage (from the BigQuery logs, I assume). I initially thought lineage would have to be ingested from the dbt/Airflow sources. Is there any reason to look into those sources? Any pros/cons of getting lineage from the bq source vs dbt/Airflow?

loud-island-88694

06/09/2022, 4:22 PM
@bumpy-activity-74405 @green-football-43791 is working on collapsing dbt and BigQuery nodes into a single logical node in the lineage UI. There are certainly tables in BigQuery that are managed outside of dbt (e.g., transformations on source tables), so it is good to have lineage from both. As for Airflow, we see it as more useful for transit points (e.g., loading data into BQ from an external system, or moving data from BQ into another system), so that you get full end-to-end visibility.
👍 1

bumpy-activity-74405

06/10/2022, 6:19 AM
I agree with your point about having both bq and dbt sources for lineage, but: I tried playing around with the jaffle_shop models and bq/dbt ingestion. What I’ve noticed is the two sources overwrite each other’s lineage in some cases.
This is the lineage after ingesting the dbt source: (screenshot not included)
This is the same lineage after ingesting the bq source afterwards: (screenshot not included)
Also, I was looking at your demo and how tests are shown in the UI. Currently I can see they are only shown as datasets (example). Is there a way to see whether a test failed or passed? Is this the only way to show tests right now, or am I just not seeing something?

loud-island-88694

06/10/2022, 2:24 PM
The lineage overwrite will get resolved through the merged logical entity (@green-football-43791 can confirm). As for dbt tests, @mammoth-bear-12532 is working on having them show up as assertions (similar to Great Expectations) in the validations tab instead of representing them as datasets.

bumpy-activity-74405

06/10/2022, 2:40 PM
Wow you guys are reading my thoughts before I even think them 😅

green-football-43791

06/10/2022, 2:58 PM
Yes, exactly! The idea is to take bigquery lineage, dbt lineage, and combine them to provide a consistent graph
all this feedback is very helpful Karolis, it helps confirm we’re on the right track 🙂
👍 1
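For readers following along, here is a toy sketch of the difference between the overwrite behaviour reported earlier in the thread and the "combine" idea described here. It is only an illustration in plain Python, not DataHub's actual implementation; the table names and the `ingest_overwrite` / `ingest_merge` helpers are made up for the example.

```python
# Toy model: each dataset has ONE upstream-lineage record, keyed by its name.
# All table names and both helper functions are hypothetical.

def ingest_overwrite(store, edges):
    """Last write wins: the incoming upstream set replaces the stored one."""
    for dataset, upstreams in edges.items():
        store[dataset] = set(upstreams)
    return store


def ingest_merge(store, edges):
    """Combine: union the incoming upstream set with what is already stored."""
    for dataset, upstreams in edges.items():
        store.setdefault(dataset, set()).update(upstreams)
    return store


# Lineage as each source might report it for the same table (assumed values).
dbt_edges = {"jaffle_shop.customers": {"jaffle_shop.stg_customers", "jaffle_shop.stg_orders"}}
bq_edges = {"jaffle_shop.customers": {"jaffle_shop.stg_orders", "jaffle_shop.raw_payments"}}

# Ingest dbt first, then BigQuery, under each strategy.
overwritten = ingest_overwrite(ingest_overwrite({}, dbt_edges), bq_edges)
merged = ingest_merge(ingest_merge({}, dbt_edges), bq_edges)

print(sorted(overwritten["jaffle_shop.customers"]))  # only what bq reported
print(sorted(merged["jaffle_shop.customers"]))       # edges from both sources
```

With last-write-wins, the edge that only dbt knew about disappears after the second ingestion; with a union, both sources contribute to one graph.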

bumpy-activity-74405

06/10/2022, 3:08 PM
One more question - say I have multiple dbt projects with models that have the same table as a source. This would probably mess up (overwrite) upstream lineage, right? Any ideas on how one would solve this outside of merging the manifest files from the different projects?
Just realised that upstream lineage belongs to a model and it has nothing to do with the source table so this should probably work fine

green-football-43791

06/10/2022, 3:19 PM
In this case we actually recommend you use the platform_instance property to differentiate your dbt projects
there is a bug in the latest release with this however
so please hold off until the next release!
should be out today
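A minimal sketch of what that could look like, assuming one ingestion recipe per dbt project, written here as Python dicts. The project names and file paths are placeholders, the `dbt_recipe` helper is hypothetical, and the config keys (`manifest_path`, `catalog_path`, `target_platform`, `platform_instance`) should be checked against the dbt source documentation for your DataHub version. The sink section is omitted.

```python
# Hypothetical helper: builds the source half of a recipe for one dbt project.
# Paths and instance names below are placeholders, not real values.

def dbt_recipe(project, manifest_path, catalog_path):
    return {
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": manifest_path,
                "catalog_path": catalog_path,
                "target_platform": "bigquery",
                # A distinct value per project keeps the projects apart.
                "platform_instance": project,
            },
        },
    }


recipes = [
    dbt_recipe("marketing", "./marketing/target/manifest.json", "./marketing/target/catalog.json"),
    dbt_recipe("finance", "./finance/target/manifest.json", "./finance/target/catalog.json"),
]

for recipe in recipes:
    print(recipe["source"]["config"]["platform_instance"])
```

The two recipes differ only in their paths and in platform_instance, which is the property meant to tell the projects apart.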