# ask-community-for-troubleshooting
t
Hi All - has anyone had any success getting past OOMKilled errors for larger tables to a Postgres destination? An example problem stream I have is the initial load for a MySQL CDC source -> Postgres destination, where the table contains 3.7m records / 5.6GB of data. I have tried tuning the resources and confirmed my settings are applying, but my syncs reliably fail whenever I'm trying to sync a table where the entire dataset does not fit into memory. Logs show that it is getting past fetching data and the "typing and deduping" steps, and is moving on to copying data to the destination. According to the documentation, Airbyte is supposed to batch into 10,000-row sets - but if my math is correct, Airbyte should be batching the data in batches of approximately 15MB each. This seems contrary to what I'm actually seeing, where the only way to get a stream to succeed is to allocate at least 2x the data size in memory. I'm currently running Airbyte 2.0.0 (Helm chart 2.0.18), but this behavior has been consistently a problem in older versions too. Over the past weeks I have tested pretty much every version between 1.6.2 and 2.0.0. Happy to share more info if it is helpful.
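For reference, here's the back-of-the-envelope math I'm working from (a rough sketch; it assumes roughly uniform row sizes):

```python
# Rough sanity check on expected per-batch memory for my problem stream
total_rows = 3_700_000
total_bytes = 5.6e9          # ~5.6 GB of table data
batch_rows = 10_000          # batch size per the Airbyte docs

avg_row_bytes = total_bytes / total_rows        # ~1.5 KB per row
batch_mb = avg_row_bytes * batch_rows / 1e6     # ~15 MB per 10,000-row batch

print(f"avg row: {avg_row_bytes:.0f} B, expected batch: {batch_mb:.1f} MB")
```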
k
You're not alone in encountering OOMKilled (Out Of Memory) errors when syncing large tables to a Postgres destination with Airbyte, even when resource limits and batch sizes seem reasonable. This is a known challenge, and several users have reported similar issues across different connectors and deployment setups.

### Key Points from the Knowledge Sources

1. **Postgres as a Destination Has Limitations for Large Data Volumes**
   - The Airbyte documentation explicitly states that Postgres is not designed to be a data warehouse and is likely to perform poorly with large data volumes (e.g., >10GB). Even with increased memory and compute, large syncs can cause the destination to "lock up" or incur high memory usage, especially when using typing and deduplication features. The documentation recommends using a data warehouse like BigQuery, Snowflake, or Redshift for larger datasets. Also, scaling IOPS (disk throughput) is as important as memory and CPU for Postgres performance during large writes or updates [Postgres Troubleshooting](https://docs.airbyte.com/integrations/destinations/postgres/postgres-troubleshooting#connector-limitations).
2. **Batching and Memory Usage**
   - Airbyte's default behavior is to batch records in sets of 10,000, but this is based on record count, not byte size. If your rows are large, the memory required for a batch can be much higher than expected. For example, a table with an average row size of 0.5MB would require 5GB of RAM per batch. This can easily lead to OOM errors if the actual row size is underestimated or if the connector does not efficiently release memory between batches [Scaling Airbyte](https://discuss.airbyte.io/t/3401).
   - There is an open issue in Airbyte's GitHub repository suggesting that batching should be based on bytes read, not just record count, to avoid OOM errors for tables with large rows. However, as of the latest information, this has not been fully implemented [Batching on Bytes Issue](https://github.com/airbytehq/airbyte/issues/3439).
3. **Observed Behavior: Memory Usage Exceeds Data Size**
   - Multiple users have reported that Airbyte can use 2x or more the data size in RAM during syncs, especially for large tables. This has been observed with various connectors (e.g., CockroachDB, MySQL, Postgres) and is sometimes attributed to memory leaks or inefficient memory management in the connectors [Forum Example](https://discuss.airbyte.io/t/1261), [GitHub Issue](https://github.com/airbytehq/airbyte/issues/13421).
4. **Workarounds and Recommendations**
   - Some users have found that breaking up large tables into smaller chunks (e.g., syncing subsets via views or splitting tables) can help avoid OOM errors.
   - Upgrading to the latest connector versions may help, as some improvements have been made to dynamic batching and memory management, but fundamental limitations remain.
   - For very large tables, consider switching your destination to a data warehouse solution, as recommended by Airbyte documentation.

### Summary

- OOM errors for large tables to Postgres are a known limitation.
- Batching is by record count, not byte size, which can cause unexpectedly high memory usage.
- Allocating more memory sometimes helps, but often you need 2x or more the data size in RAM.
- Postgres is not recommended for large syncs; use a data warehouse if possible.
- No definitive fix exists as of now, but breaking up tables or using a different destination may help.
If you need to stick with Postgres, you may need to continue allocating significant memory or try to split your syncs into smaller chunks. If you can, consider moving to a data warehouse destination for large datasets. If you have specific logs or error messages, sharing them may help the community or Airbyte team provide more targeted advice.
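To make the record-count vs. byte-size distinction concrete, here is a minimal illustrative sketch (not Airbyte's actual implementation; the record sizes and the 25 MB cap are assumptions) showing why a byte-capped batcher keeps per-batch memory bounded while a pure row-count batcher does not:

```python
from typing import Iterable, Iterator, List

def batch_by_count(records: Iterable[bytes], max_rows: int = 10_000) -> Iterator[List[bytes]]:
    """Row-count batching: memory held per batch grows with row size."""
    batch: List[bytes] = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= max_rows:
            yield batch
            batch = []
    if batch:
        yield batch

def batch_by_bytes(records: Iterable[bytes], max_bytes: int = 25 * 1024**2) -> Iterator[List[bytes]]:
    """Byte-capped batching: flush whenever the buffered size reaches the cap."""
    batch, size = [], 0
    for rec in records:
        batch.append(rec)
        size += len(rec)
        if size >= max_bytes:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch

# With ~1.5 KB rows, a 10,000-row batch is ~15 MB either way; with 0.5 MB rows
# the count-based batch balloons to ~5 GB while the byte-capped one stays ~25 MB.
```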
t
@kapa.ai I don’t believe the Postgres server is the problem here because the server load is low. What seems to be happening is that Airbyte itself (the replication-job) is loading at least one full copy of the data into memory. This makes me suspicious that either the batching is not working, OR that memory is not being freed between batches. As I said, even at the max batch size of 10K we are looking at around 15MB batches of data. I'm allocating many GBs of memory to the Airbyte threads, and confirmed my settings are applying by checking logs, so this does not seem reasonable to me.
k
Your suspicion is well-founded and aligns with multiple reports from the Airbyte community and issue trackers. While Airbyte is designed to batch records (typically 10,000 at a time), several users have observed that, in practice, memory usage can far exceed what would be expected from the batch size alone, sometimes requiring as much as 2x the total data size in RAM for large tables. This suggests that either batching is not functioning as intended in all cases, or that memory is not being efficiently released between batches.

Key points from the knowledge sources:

- **Batching by Record Count, Not Bytes:** Airbyte batches by record count (default 10,000), not by total byte size. If your records are large, this can result in unexpectedly high memory usage. There is an open issue suggesting that batching should also consider byte size, but this is not yet implemented [Reads should batch on bytes](https://github.com/airbytehq/airbyte/issues/3439).
- **Memory Not Freed Between Batches:** Multiple users have reported that memory usage continues to grow during syncs, even when batching is in place. This has been observed across different connectors and is sometimes attributed to memory leaks or inefficient memory management. For example, in the CockroachDB and Shopify connectors, users have seen Airbyte consume all available RAM, regardless of batch size settings [Shopify connector memory build up](https://discuss.airbyte.io/t/3553), [CockroachDB OOM](https://discuss.airbyte.io/t/1261).
- **Resource Settings May Not Prevent OOM:** Even when memory limits are set and confirmed in logs, Airbyte jobs can still be killed due to OOM errors if the actual memory usage exceeds the available resources. This is a known limitation, especially for large tables [Scaling Airbyte](https://discuss.airbyte.io/t/3401), [Default 2GB memory limit not being observed](https://discuss.airbyte.io/t/3472).

Summary: Your experience matches known issues: Airbyte may load more data into memory than expected, and memory may not be released efficiently between batches. This can lead to OOM errors even when batch sizes and resource limits seem reasonable. As of now, there is no definitive fix, and this is a recognized limitation in Airbyte for large tables and large records.

If you need to work around this, consider splitting your syncs into smaller tables or streams, or use a destination designed for large-scale ingestion (like a data warehouse). If you are able, monitoring and reporting specific memory usage patterns to the Airbyte team may help improve future releases. Let me know if you need more details or have specific logs to review.
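To illustrate the "memory not freed between batches" failure mode, here is a toy sketch (not Airbyte's code; batch and row sizes are made up) contrasting a sync loop that drops each buffer after flushing with one that keeps every flushed batch referenced:

```python
# Toy illustration: peak memory depends on whether batch buffers are released.
def make_batches(n_batches: int = 10, rows_per_batch: int = 10_000, row_bytes: int = 1_500):
    for _ in range(n_batches):
        yield [b"x" * row_bytes for _ in range(rows_per_batch)]  # ~15 MB per batch

def flush(batch) -> None:
    pass  # stand-in for writing the batch to the destination

def sync_releasing(batches) -> None:
    """Each buffer goes out of scope after flushing; peak memory stays ~one batch."""
    for batch in batches:
        flush(batch)

def sync_accumulating(batches) -> list:
    """Keeping a reference to every flushed batch (e.g. a retry buffer that is
    never cleared) grows memory toward the full dataset size, which is what an
    OOMKilled replication pod would look like."""
    retained = []
    for batch in batches:
        flush(batch)
        retained.append(batch)  # nothing is ever freed
    return retained

if __name__ == "__main__":
    sync_releasing(make_batches())     # bounded memory
    sync_accumulating(make_batches())  # memory grows with every batch
```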