# ask-ai
hi @kapa.ai I'm using BigQuery as the destination. My question: is it possible to create a clustering key on BigQuery tables via Airbyte?
Yes, it is possible to create a clustering key on BigQuery tables via Airbyte. According to the [Airbyte documentation](https://docs.airbyte.com/integrations/destinations/bigquery#final-table-schema), the output tables in BigQuery are partitioned by the time-unit column `_airbyte_extracted_at` at a daily granularity and clustered by `_airbyte_extracted_at` and the table's primary keys. However, it's important to note that the primary-key field (`id` in this case) should be listed first, along with `_airbyte_extracted_at`, for efficient deduplication during incremental syncs; removing it could significantly increase the cost of the merge queries. Here's an excerpt from the documentation:
```
The output tables in BigQuery are partitioned by the Time-unit column `_airbyte_extracted_at` at a daily granularity and clustered by `_airbyte_extracted_at` and the table Primary Keys. Partition boundaries are based on UTC time. This is useful to limit the number of partitions scanned when querying these partitioned tables, by using a predicate filter (a `WHERE` clause). Filters on the partitioning column are used to prune the partitions and reduce the query cost. (The parameter **Require partition filter** is not enabled by Airbyte, but you may toggle it by updating the produced tables.)
```
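To make that concrete, here's a sketch of what that layout looks like and how to take advantage of it. Airbyte creates and manages the table itself; the project, dataset, table, and non-`_airbyte_*` column names below are illustrative only:

```sql
-- Hypothetical shape of an Airbyte final table (Airbyte creates this;
-- shown only to make the partitioning/clustering layout concrete).
CREATE TABLE IF NOT EXISTS `my_project.my_dataset.users`
(
  _airbyte_raw_id       STRING,
  _airbyte_extracted_at TIMESTAMP,
  _airbyte_meta         JSON,
  id                    INT64,
  name                  STRING
)
PARTITION BY DATE(_airbyte_extracted_at)  -- daily granularity, UTC boundaries
CLUSTER BY _airbyte_extracted_at, id;     -- extracted_at plus the primary key

-- A read that prunes partitions with a predicate filter on the
-- partitioning column, reducing bytes scanned and query cost:
SELECT id, name
FROM `my_project.my_dataset.users`
WHERE _airbyte_extracted_at >= TIMESTAMP('2024-01-01');

-- Optionally enforce partition filters on the produced table
-- (Airbyte does not enable this; you may toggle it yourself):
ALTER TABLE `my_project.my_dataset.users`
SET OPTIONS (require_partition_filter = TRUE);
```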
Please note that these tables are generally optimized for write performance rather than read performance. For read-optimized versions of the data, it's recommended to use a modeling tool like dbt or Dataform.
@kapa.ai can you give an example of how to optimize reads on BigQuery tables via Airbyte?
To optimize read performance on BigQuery tables loaded by Airbyte, you can use a data transformation tool like dbt. Airbyte has a native integration with dbt that you can use to run transformations on your data. A general process you can follow (see the sketch below):

1. Import the existing generated dbt models for your connection.
2. Tweak these models to fit your requirements.
3. Declare them as a custom transformation.

You can find a guide on how to do this in the [Airbyte documentation](https://docs.airbyte.com/operator-guides/transformation-and-normalization/transformations-with-dbt).

Optimizing read performance usually means creating new models or tuning existing ones for your specific use case: building aggregated tables, optimizing joins, or choosing partitioning and clustering that match your query patterns, among other things. The goal is to structure your data so that your most common queries are as fast as possible, which often involves a trade-off between storage cost and query performance.

If you find a way to optimize the auto-generated SQL script, you can also contribute back to the Airbyte community by opening a PR on their repo, as suggested in a [forum post](https://discuss.airbyte.io/t/1515) on the Airbyte community forum.
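For illustration, here is a minimal dbt model along those lines, re-materializing an Airbyte output table with partitioning and clustering chosen for read patterns instead of `_airbyte_extracted_at`. The source name, column names, and partition/cluster choices are hypothetical and depend on your own queries:

```sql
-- models/marts/users_read_optimized.sql
-- Re-materialize the Airbyte output, partitioned and clustered on the
-- columns your queries actually filter on.
{{ config(
    materialized = 'table',
    partition_by = {
      'field': 'created_at',
      'data_type': 'timestamp',
      'granularity': 'day'
    },
    cluster_by = ['country', 'id']
) }}

SELECT
  id,
  name,
  country,
  created_at
FROM {{ source('airbyte', 'users') }}
```

A matching `source()` definition in your dbt project's sources file is assumed here.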