Eka Pramudita
07/16/2024, 7:57 AM
kapa.ai
07/16/2024, 7:57 AM
[...] `_airbyte_extracted_at` at a daily granularity and clustered by `_airbyte_extracted_at` and the table Primary Keys. However, note that the key field (`id` in this case) should be listed first, along with `_airbyte_extracted_at`, for efficient deduplication during incremental syncs. Removing it could significantly increase the cost of the merge queries.

Here's an excerpt from the documentation:

The output tables in BigQuery are partitioned by the Time-unit column `_airbyte_extracted_at` at a daily granularity and clustered by `_airbyte_extracted_at` and the table Primary Keys. Partition boundaries are based on UTC time. This is useful to limit the number of partitions scanned when querying these partitioned tables, by using a predicate filter (a `WHERE` clause). Filters on the partitioning column are used to prune the partitions and reduce the query cost. (The parameter **Require partition filter** is not enabled by Airbyte, but you may toggle it by updating the produced tables.)
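To make the partition-pruning point concrete, here is a minimal sketch in Python that only builds the SQL text; it does not connect to BigQuery. The table name `my_project.my_dataset.users` and both helper functions are hypothetical and used only for illustration. The sketch assumes the partitioning column is `_airbyte_extracted_at` and that dates are interpreted as UTC, in line with the excerpt above.

```python
from datetime import date


def build_pruned_query(table: str, start: date, end: date) -> str:
    """Build a query whose WHERE clause filters on the partitioning column.

    Only the daily partitions in the half-open range [start, end) should be
    scanned, because the predicate compares the partitioning column against
    constant timestamps.
    """
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        f"WHERE _airbyte_extracted_at >= TIMESTAMP('{start.isoformat()}')\n"
        f"  AND _airbyte_extracted_at < TIMESTAMP('{end.isoformat()}')"
    )


def build_require_filter_ddl(table: str) -> str:
    """Build a DDL statement that turns on "Require partition filter".

    Airbyte leaves this option off; enabling it makes BigQuery reject
    queries that do not filter on the partitioning column.
    """
    return f"ALTER TABLE `{table}` SET OPTIONS (require_partition_filter = TRUE)"


if __name__ == "__main__":
    # Hypothetical table name, for illustration only.
    table = "my_project.my_dataset.users"
    print(build_pruned_query(table, date(2024, 7, 15), date(2024, 7, 16)))
    print(build_require_filter_ddl(table))
```

You could paste the generated statements into the BigQuery console or run them through whichever client you already use. The half-open date range is a design choice so that consecutive daily windows never overlap.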
Please note that these tables are generally optimized for write performance rather than read performance. For read-optimized versions of the data, it's recommended to use a modeling tool like dbt or Dataform.
Eka Pramudita
07/16/2024, 7:59 AM
kapa.ai
07/16/2024, 7:59 AM
kapa.ai
07/16/2024, 7:59 AM