# ask-ai
I am running an ingestion from a Postgres DB which is intended to run in CDC mode after the initial sync. This initial sync takes 8+ hours and is taking up all the resources available to Airbyte, so other, smaller ingestions are showing up in a "pending" state in kubectl. I am running Airbyte in a VM using abctl. What can I do to overcome this?
Your issue is a common challenge when running large initial syncs in Airbyte, especially with CDC sources like Postgres. The initial sync is resource-intensive and can block other jobs, causing them to remain in a "pending" state. Here are concrete steps you can take, based on Airbyte documentation and community discussions:

### 1. Scale Up Resources or Limit Concurrency

- Increase VM Resources: If possible, allocate more CPU and memory to your Airbyte VM. The initial sync, especially for large tables, can consume significant resources, particularly on the source worker side, which may read up to 10,000 records into memory at once. For large row sizes, this can require substantial RAM. Consider starting with a larger instance and tuning as needed [[Scaling Airbyte](https://docs.airbyte.com/platform/1.6/operator-guides/scaling-airbyte#what-to-scale)].
- Limit Concurrent Syncs: Airbyte allows you to configure the maximum number of concurrent syncs. By reducing this number, you can prevent resource starvation and avoid jobs being stuck in "pending." In Kubernetes, this is typically set via environment variables or Helm values (e.g., `MAX_SYNC_WORKERS`). In abctl-managed VMs, check your deployment's Helm values for the equivalent setting [[Scaling Airbyte After Installation](https://docs.airbyte.com/platform/next/enterprise-setup/scaling-airbyte)]; a sketch is shown below.
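As a rough sketch only: this assumes your Airbyte version still honors the `MAX_SYNC_WORKERS` variable and that the chart exposes an `extraEnv` hook on the worker, which may differ in your chart version. With abctl, overrides go in through a custom `values.yaml`:

```bash
# Sketch: verify the key names against your Airbyte/Helm chart version.
cat > values.yaml <<'EOF'
worker:
  extraEnv:
    # Cap how many syncs run at once so one big initial sync
    # cannot starve the smaller connections (assumed variable name).
    - name: MAX_SYNC_WORKERS
      value: "2"
EOF

# abctl re-applies its Helm release with your overrides.
abctl local install --values values.yaml
```

Re-running `abctl local install` with `--values` is the supported way to change settings on an existing abctl deployment; it should upgrade the release in place rather than reinstalling from scratch.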
### 2. Adjust Connector Resource Requests and Limits

- Tune Resource Requests/Limits: You can set CPU and memory requests/limits for sync jobs at various levels (instance-wide, connector definition, connector-specific, or connection-specific). For your large initial sync, consider increasing the memory and CPU limits for the Postgres source connector or for that specific connection. This can be done by editing your `values.yaml` or via SQL updates to the Airbyte database [[Configuring Connector Resources](https://docs.airbyte.com/platform/1.6/operator-guides/configuring-connector-resources)]; see the sketch below.
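Here is a minimal sketch of the connection-level override described in that doc. The namespace `airbyte-abctl`, the pod name `airbyte-db-0`, and the user/database names are assumptions based on abctl defaults; pull the real credentials from the Airbyte secrets in that namespace and verify the column and JSON keys against your Airbyte version before running anything:

```bash
# Instance-wide defaults can instead be raised through the job env vars from
# "Configuring Connector Resources" (e.g. JOB_MAIN_CONTAINER_MEMORY_LIMIT),
# added to values.yaml the same way as MAX_SYNC_WORKERS above.

# Connection-specific override (sketch): bump resources only for the big
# CDC connection. Pod, namespace, user, and database names are assumptions.
kubectl exec -n airbyte-abctl -it airbyte-db-0 -- \
  psql -U airbyte -d db-airbyte -c "
    UPDATE connection
    SET resource_requirements = '{
      \"cpu_request\": \"1\", \"cpu_limit\": \"2\",
      \"memory_request\": \"2Gi\", \"memory_limit\": \"4Gi\"
    }'
    WHERE id = '<your-connection-id>';"
```

The connection id is visible in the URL when you open the connection in the Airbyte UI. A per-connection override only affects that connection, so the 8-hour CDC sync can get more headroom without inflating every other job's requests.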
### 3. Stagger or Schedule Large Syncs

- Schedule Large Syncs During Off-Hours: If possible, run the initial sync for large tables during periods of low activity to minimize the impact on other jobs.
- Split Large Tables: If your schema allows, consider splitting very large tables into smaller chunks or syncing them separately to reduce the load.

### 4. Monitor and Troubleshoot

- Monitor Resource Usage: Use infrastructure monitoring tools to track CPU and memory usage. If you see high requested but low used resources, you may be over-provisioning requests, which can block pod scheduling [[Scaling Airbyte After Installation](https://docs.airbyte.com/platform/next/enterprise-setup/scaling-airbyte)].
- Check for Stuck Jobs: If jobs remain pending due to unavailable resources, increasing the number of nodes or reducing resource requests per job can help. A quick `kubectl` inspection is sketched at the end of this answer.

### 5. Additional Tips

- Upgrade Airbyte and Connectors: Ensure you are running the latest versions, as performance improvements and bug fixes are released regularly.
- Reduce Number of Tables per Connection: For CDC, limiting the number of tables per connection (e.g., to 30) can help with throughput and resource management [[Forum: Postgres CDC connector stuck](https://discuss.airbyte.io/t/4373)].

---

Summary: To overcome resource contention during large initial syncs, increase your VM resources, limit concurrent syncs, tune connector resource requests/limits, and consider scheduling or splitting large jobs. Monitoring and adjusting these parameters should help prevent smaller jobs from being blocked and improve overall system throughput. If you need specific configuration examples or run into issues with abctl, please provide more details about your deployment setup.
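For the "Monitor and Troubleshoot" step, here is a minimal `kubectl` sketch. It assumes kubectl is pointed at the abctl-managed cluster and uses abctl's default namespace `airbyte-abctl`; `kubectl top` additionally needs a metrics server, so treat this as a starting point rather than a guaranteed recipe:

```bash
# List job pods and spot the ones stuck in Pending.
kubectl get pods -n airbyte-abctl --sort-by=.status.phase

# The Events section at the bottom usually names the blocker,
# e.g. "Insufficient cpu" or "Insufficient memory".
kubectl describe pod <pending-pod-name> -n airbyte-abctl

# Compare actual usage against requests/limits (needs metrics-server).
kubectl top pods -n airbyte-abctl
kubectl top nodes

# See how much of each node is already reserved by resource requests.
kubectl describe nodes | grep -A 8 "Allocated resources"
```

If the pending pods report "Insufficient cpu/memory" while `kubectl top` shows the node mostly idle, the requests (not actual usage) are what is blocking scheduling, which points back at lowering the per-job requests from step 2.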