Hello, I am running airbyte installed on an EC2 an...
# ask-community-for-troubleshooting
h
Hello, I am running Airbyte on an EC2 instance and trying to migrate data from MongoDB to Redshift. Sometimes the sync jobs work, sometimes they hang. I maxed the EC2 instance out at 1 TB of storage, 16 vCPUs, and 32 GB of RAM. I started loading, but it loaded 1k records and has now stopped again for 30-40 minutes. The CPU is constantly at around 2%, so it seems idle. I am not quite sure what is going on. Can I force the Kubernetes cluster inside the container to utilise more resources? For example, out of 6 syncs (all of which were considered successful), not all moved data: 450M records were loaded and moved, while the other syncs loaded and moved 0 after 3-4 hours. I am sure that new data was added to those records.
k
Based on your description, it sounds like your Airbyte sync jobs are intermittently hanging or stalling, despite having ample CPU and memory resources available on your EC2 instance. Here are some steps and considerations based on the official Airbyte documentation:

### 1. Resource Allocation and Configuration

Even if your EC2 instance has significant resources, Airbyte jobs (especially when running in Docker or Kubernetes) may not automatically utilize all available resources unless explicitly configured. You can control the CPU and memory allocated to Airbyte job containers by setting environment variables or updating your `values.yaml` file (if using Helm/Kubernetes):
```yaml
JOB_MAIN_CONTAINER_CPU_REQUEST: <value>
JOB_MAIN_CONTAINER_CPU_LIMIT: <value>
JOB_MAIN_CONTAINER_MEMORY_REQUEST: <value>
JOB_MAIN_CONTAINER_MEMORY_LIMIT: <value>
```
These settings ensure that each sync job container can request and use more CPU and memory, which may help if resource limits are the bottleneck. For Kubernetes deployments, you can also adjust pod resource requests and limits in the `values.yaml` file under the `jobs.resources` section:
```yaml
global:
  jobs:
    resources:
      limits:
        cpu: 2
        memory: 4096Mi
      requests:
        cpu: 2
        memory: 4096Mi
```
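To sanity-check request values like these against the instance size, a rough back-of-the-envelope calculation can help. The sketch below is illustrative only: `parse_mi` and `max_job_pods` are hypothetical helper names, not part of Airbyte, and it assumes a single node and ignores the CPU and memory reserved for the operating system, Kubernetes system pods, and Airbyte's own platform services, so real capacity will be lower.

```python
def parse_mi(quantity: str) -> int:
    """Convert a Kubernetes-style memory quantity ending in Mi or Gi to MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def max_job_pods(node_cpu: float, node_memory: str,
                 pod_cpu_request: float, pod_memory_request: str) -> int:
    """Upper bound on job pods whose requests fit on one node."""
    by_cpu = int(node_cpu // pod_cpu_request)
    by_memory = parse_mi(node_memory) // parse_mi(pod_memory_request)
    # The scarcer of the two resources decides how many pods fit.
    return min(by_cpu, by_memory)


# 16 vCPUs and 32 GB RAM from the question (treated as 32Gi here),
# with the example requests of cpu: 2 and memory: 4096Mi.
pods = max_job_pods(16, "32Gi", 2, "4096Mi")
print(pods)
```

With these example numbers, at most 8 such pods fit by request, before any system overhead is subtracted.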
Make sure these values are set high enough to utilize your EC2's capacity, but not so high that pods can't be scheduled due to cluster constraints. If you are running Airbyte in Docker, set the environment variables accordingly when starting the containers. [Configuring Connector Resources](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources)

### 2. Investigate Bottlenecks

- Low CPU Usage: If CPU usage is low and jobs are still hanging, the bottleneck may not be resource-related. It could be due to network latency, source/destination connector limitations, or issues with the underlying data (e.g., large documents, slow queries, or rate limits).
- Logs and Monitoring: Check Airbyte logs for errors or warnings during the stalled periods. This can help identify if the job is waiting on the source, destination, or experiencing internal errors.
- Connector-Specific Issues: Some connectors (like MongoDB) may have their own performance characteristics or limitations. For example, the source worker reads up to 10,000 records in memory, which can cause issues if records are large. If you have large documents, you may need to increase memory allocation for the source worker specifically [What To Scale](https://docs.airbyte.com/platform/operator-guides/scaling-airbyte#what-to-scale).

### 3. Scaling and Concurrency

- Concurrent Syncs: Each sync job requires multiple pods/containers. If you run many syncs at once, ensure your cluster can handle the total number of pods. The rule of thumb is to allow for `2 x <number-of-possible-concurrent-connections>` pods at once.
- Worker Limits: You can control the number of concurrent syncs by setting `MAX_SYNC_WORKERS` and `MAX_CHECK_WORKERS` environment variables. If too many jobs run at once, some may be queued or starved of resources [Scaling Airbyte After Installation](https://docs.airbyte.com/platform/enterprise-setup/scaling-airbyte).

### 4. Other Considerations

- Disk Space: While you have 1TB, ensure that disk I/O is not a bottleneck (e.g., EBS volume performance).
- Network: If MongoDB or Redshift are remote, network latency or throughput could impact sync speed.
- Connector Bugs: If only some syncs hang, there may be connector-specific bugs or issues with certain data sets.

### Summary

- Explicitly set resource requests/limits for Airbyte job containers to ensure they can use your EC2's resources.
- Monitor logs to identify where jobs are stalling.
- Check for connector-specific limitations or bugs.
- Adjust concurrency settings if running many syncs at once.

If after these steps the issue persists, consider sharing detailed logs from a stalled job for further diagnosis. For more details, see the official docs on [Configuring Connector Resources](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources) and [Scaling Airbyte](https://docs.airbyte.com/platform/operator-guides/scaling-airbyte#what-to-scale).
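
As a worked example of the pod-count rule of thumb from section 3, the arithmetic can be written out as a small sketch. The function names are hypothetical, and the pod budget of 8 is an assumed figure for illustration rather than a measured one; how many pods a cluster can actually schedule depends on its resource requests and node capacity.

```python
def pods_needed(concurrent_connections: int) -> int:
    """Rule of thumb: allow 2 pods per possible concurrent connection."""
    return 2 * concurrent_connections


def max_concurrent_syncs(pod_budget: int) -> int:
    """Largest number of concurrent connections whose pods fit in the budget."""
    return pod_budget // 2


# Six syncs, as in the question, all running at the same time.
print(pods_needed(6))           # 12
# Assumed budget of 8 schedulable job pods (illustrative only).
print(max_concurrent_syncs(8))  # 4
```

If the number of connections that may run together exceeds what the cluster can schedule, lowering the concurrency settings mentioned above is one option worth testing.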