Mohammad Safari

10/07/2021, 11:28 AM
fyi we had a ~100X cost saving on BigQuery by writing a custom dbt transformation that only denormalizes and transfers newer data to BigQuery. To get the best result I needed to cluster the raw table on the _airbyte_emitted_at column (on a non-clustered table, select * where _airbyte_emitted_at > last-time is basically billed for the entire table size). my code here. Unfortunately we cannot ship it as is because it is not backward compatible, but we can probably add it as an option in the current BigQuery destination or as its own destination.
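A minimal sketch of the clustering idea described above, not the code linked in the message. It only builds the two SQL statements as strings: one that copies a raw table into a copy clustered on _airbyte_emitted_at, and one that reads only rows emitted after the previous run. The table names and both helper functions are hypothetical, and it assumes _airbyte_emitted_at is a TIMESTAMP column.

```python
from datetime import datetime, timezone

# Hypothetical table names, used only for illustration.
RAW_TABLE = "my_dataset._airbyte_raw_users"
CLUSTERED_TABLE = "my_dataset._airbyte_raw_users_clustered"


def clustered_copy_ddl(source: str, target: str) -> str:
    """Build DDL that copies `source` into a table clustered on _airbyte_emitted_at."""
    return (
        f"CREATE TABLE `{target}`\n"
        f"CLUSTER BY _airbyte_emitted_at\n"
        f"AS SELECT * FROM `{source}`"
    )


def incremental_select(table: str, last_emitted_at: datetime) -> str:
    """Build a query that reads only rows emitted after the previous run.

    `last_emitted_at` must be timezone-aware; it is rendered in UTC.
    """
    cutoff = last_emitted_at.astimezone(timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S+00:00"
    )
    return (
        f"SELECT * FROM `{table}`\n"
        f"WHERE _airbyte_emitted_at > TIMESTAMP('{cutoff}')"
    )


if __name__ == "__main__":
    print(clustered_copy_ddl(RAW_TABLE, CLUSTERED_TABLE))
    last_run = datetime(2021, 10, 7, 11, 28, tzinfo=timezone.utc)
    print(incremental_select(CLUSTERED_TABLE, last_run))
```

On the clustered table, the filter on _airbyte_emitted_at lets BigQuery skip blocks outside the requested range, which is where the saving comes from; the actual ratio depends on table size and how much of it is new.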

user

10/07/2021, 11:45 AM
nice!! I’m working on this soon, you can expect something out in airbyte in the next few weeks

user

10/07/2021, 11:46 AM
it’s still pretty useful to have the community try things like this and provide feedback, so when we get to it, it’s easier to spec!

Mohammad Safari

10/07/2021, 11:59 AM

user

10/07/2021, 12:19 PM
Question on your incremental strategy for BigQuery, did you tweak that on dbt side? it seems like using the _airbyte_emitted_at could benefit from setting on insert_overwrite? https://discourse.getdbt.com/t/bigquery-dbt-incremental-changes/982/8

user

10/07/2021, 3:13 PM
you mean using _airbyte_emitted_at as partition? that wouldn’t work here, as all decisions are based on the primary key (ID in our case).
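To illustrate what "decisions based on primary key" could look like, here is a rough sketch that builds a BigQuery MERGE statement keyed on the ID column. It is an assumption about the general approach, not the code from this thread: the function name, table names, and column list are hypothetical, and the source table is assumed to hold only the new rows, already reduced to one row per key.

```python
from typing import Sequence


def merge_by_primary_key(
    target: str, source: str, key: str, columns: Sequence[str]
) -> str:
    """Build a MERGE that upserts rows from `source` into `target` by `key`.

    Rows whose key already exists in the target are updated in place;
    rows with an unseen key are inserted.
    """
    non_key = [c for c in columns if c != key]
    set_clause = ", ".join(f"{c} = S.{c}" for c in non_key)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"S.{c}" for c in columns)
    return (
        f"MERGE `{target}` T\n"
        f"USING `{source}` S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


if __name__ == "__main__":
    # Hypothetical tables and columns, used only for illustration.
    print(
        merge_by_primary_key(
            target="my_dataset.users",
            source="my_dataset.users_new_rows",
            key="id",
            columns=["id", "name", "email"],
        )
    )
```

Because an updated row can carry any ID, the row it replaces may sit anywhere in the target table, so overwriting only the partitions for recent _airbyte_emitted_at values would not remove the stale version.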