# ask-ai
d
Any reason why destination-databricks is writing with 24MB batch sizes?
k
The 24MB batch size in the `destination-databricks` connector is most likely tied to the connector's memory requirements and performance characteristics. According to the [Databricks Lakehouse destination documentation](https://docs.airbyte.com/integrations/destinations/databricks#overview), the connector currently requires 30+MB of memory for each stream, so syncing multiple streams can hit a performance bottleneck and may trigger out-of-memory errors if the allocated memory is too small. This bottleneck is tracked in [this issue](https://github.com/airbytehq/airbyte/issues/11424); once it is resolved, the connector should be able to sync an almost unlimited number of streams with less than 500MB of memory.

The relevant excerpt from the documentation:

> Currently, this connector requires 30+MB of memory for each stream. When syncing multiple streams, it may run into an out-of-memory error if the allocated memory is too small. This performance bottleneck is tracked in [this issue](https://github.com/airbytehq/airbyte/issues/11424). Once this issue is resolved, the connector should be able to sync an almost infinite number of streams with less than 500MB of memory.

Additionally, the logs in GitHub issue [#28502](https://github.com/airbytehq/airbyte/issues/28502) show consistent 24MB batches during the flushing process:
```
2023-07-20 01:19:01 destination > INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAllBuffers):85 Flushing chicago_food_inspections: 3650 records (24 MB)
...
2023-07-20 01:19:06 destination > INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAllBuffers):85 Flushing chicago_food_inspections: 3585 records (24 MB)
```
This indicates that the batch size is set to 24MB, likely to manage memory usage effectively and avoid out-of-memory errors during the sync process. For more details, you can refer to the [Databricks Lakehouse destination documentation](https://docs.airbyte.com/integrations/destinations/databricks#overview) and the related [GitHub issue](https://github.com/airbytehq/airbyte/issues/11424).
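To make the mechanism concrete, here is a minimal sketch of a size-capped record buffer of the kind those `InMemoryRecordBufferingStrategy` log lines suggest: records accumulate until a byte threshold is reached, then the whole batch is flushed. The class and method names below are illustrative only, not the actual Airbyte CDK API, and the 24MB cap is taken from the logs above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only -- names are hypothetical, not the Airbyte CDK API.
public class SizeCappedRecordBuffer {
    // Flush once the buffered payload reaches ~24 MB, mirroring the log output.
    private static final long MAX_BUFFER_BYTES = 24L * 1024 * 1024;

    private final List<String> records = new ArrayList<>();
    private long bufferedBytes = 0;

    /** Buffer one serialized record; flush when the size cap is hit. */
    public void accept(String serializedRecord) {
        records.add(serializedRecord);
        bufferedBytes += serializedRecord.getBytes(StandardCharsets.UTF_8).length;
        if (bufferedBytes >= MAX_BUFFER_BYTES) {
            flush();
        }
    }

    private void flush() {
        System.out.printf("Flushing %d records (%d MB)%n",
                records.size(), bufferedBytes / (1024 * 1024));
        // A real connector would write the batch to the destination here.
        records.clear();
        bufferedBytes = 0;
    }
}
```

On this reading, 24MB is a flush threshold that caps per-stream memory rather than a user-tunable batch-size setting, which lines up with the 30+MB-per-stream requirement quoted above.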
d
where would I ensure that I have configured the connector for 30MB+ for each stream?
@kapa.ai where would I ensure that I have configured the connector for 30MB for each stream?
k
Hmm, I don't know enough to give you a confident answer yet. You might find what you need in the [Airbyte documentation](https://docs.airbyte.com/); in particular, check your connector's configuration settings to see whether it exposes buffer sizes or other performance-related parameters.
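A hedged pointer beyond the bot's answer: the 30+MB per stream is the connector's internal requirement, not a knob in the connector's setup form. What you can typically control is how much memory Airbyte grants the connector container, for example through the `JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables. A sketch, assuming a `.env`-driven deployment; the values are illustrative and the exact quantity format depends on your platform, so check the deployment docs:

```
# Illustrative .env snippet for an Airbyte deployment -- example values only.
# Size these to your stream count (roughly 30MB+ per stream, per the docs).
JOB_MAIN_CONTAINER_MEMORY_REQUEST=1g
JOB_MAIN_CONTAINER_MEMORY_LIMIT=2g
```

On Kubernetes, the same variables are applied as resource requests/limits on the job pods that run the connector.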