Thomas Clavet
05/29/2024, 3:05 PM

kapa.ai
05/29/2024, 3:05 PM

1. Configure Resource Requests and Limits:
Ensure that your `values.yaml` file for the Helm chart includes the appropriate settings for resource requests and limits. This can help manage the resources allocated to your job pods.
```yaml
global:
  jobs:
    resources:
      requests:
        cpu: "1"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "2Gi"
```
These settings populate the `JOB_MAIN_CONTAINER_CPU_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_REQUEST` environment variables (and their `_LIMIT` counterparts) on the job pods. If these values are not being picked up, you might need to restart your deployment.
Source: [GitHub Issue #35749](https://github.com/airbytehq/airbyte/issues/35749)
2. Use Cron Jobs for Scaling:
Since Airbyte currently supports scaling using environment variables, you can use cron jobs to calculate and adjust the number of workers based on the load. This approach involves creating external applications or scripts that monitor the number of running and pending jobs and adjust the worker count accordingly.
Source: [GitHub Issue #33567](https://github.com/airbytehq/airbyte/issues/33567)
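For illustration, here is a rough sketch of such a scaler as a Kubernetes CronJob. The image name, the scaling script it would run, and the Airbyte API service URL are placeholders you would replace with your own, and the service account needs RBAC permission to patch the worker Deployment:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: airbyte-worker-scaler
spec:
  schedule: "*/5 * * * *"            # re-evaluate the job load every 5 minutes
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          # Service account with RBAC permission to get/patch the worker Deployment
          serviceAccountName: airbyte-worker-scaler
          restartPolicy: OnFailure
          containers:
            - name: scaler
              # Hypothetical image containing your own scaling script
              image: your-registry/airbyte-worker-scaler:latest
              env:
                - name: AIRBYTE_API_URL
                  value: "http://airbyte-airbyte-server-svc:8001/api"  # adjust to your release/service name
                - name: TARGET_DEPLOYMENT
                  value: "airbyte-worker"
                - name: MIN_REPLICAS
                  value: "1"
                - name: MAX_REPLICAS
                  value: "10"
```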
3. Expose Airbyte Metrics:
To set up autoscaling based on the number of running or pending jobs, you might need to expose Airbyte metrics. This can help you monitor the job load and trigger scaling actions.
Source: [GitHub Issue #30965](https://github.com/airbytehq/airbyte/issues/30965)
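As a rough sketch, Airbyte's metrics reporter can be pointed at an OpenTelemetry collector through environment variables such as `METRIC_CLIENT` and `OTEL_COLLECTOR_ENDPOINT`. Exactly where these go in `values.yaml` depends on your chart version, so treat the keys and the collector address below as assumptions to verify:
```yaml
# Sketch only: key names vary between Helm chart versions, so check your values.yaml.
worker:
  extraEnv:
    - name: METRIC_CLIENT
      value: "otel"
    - name: OTEL_COLLECTOR_ENDPOINT
      value: "http://otel-collector:4317"   # assumes an OpenTelemetry collector service in the cluster
```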
4. Horizontal Pod Autoscaler (HPA):
You can configure a Horizontal Pod Autoscaler (HPA) in Kubernetes to automatically scale the number of pods based on observed CPU utilization or other select metrics. However, for more specific metrics like the number of running or pending jobs, you might need to integrate custom metrics.
Here is an example of how you might configure an HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airbyte-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: airbyte-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
For custom metrics, such as the number of running or pending jobs, you would need a metrics adapter that serves the custom or external metrics API (for example, a Prometheus adapter) and an HPA configured to use those metrics, as in the sketch below.
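Here is a sketch of an HPA driven by an external metric. It assumes a metrics adapter is installed and that a metric named `airbyte_pending_jobs` is exposed; both the adapter setup and the metric name are assumptions, not something the Airbyte chart provides out of the box:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airbyte-worker-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: airbyte-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: airbyte_pending_jobs        # hypothetical metric exposed through a metrics adapter
        target:
          type: AverageValue
          averageValue: "5"                 # aim for roughly 5 pending jobs per worker replica
```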
5. Graceful Scaling:
Ensure that your scaling actions are graceful to avoid disruptions. This might involve setting up preStop hooks or other mechanisms to ensure that jobs are not interrupted during scaling events.
Source: [GitHub Issue #10748](https://github.com/airbytehq/airbyte/issues/10748)
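As an illustrative sketch (not something the Helm chart exposes directly), the worker Deployment could be given a longer termination grace period and a `preStop` hook so scale-down events leave running jobs time to finish. The container name and the drain command are placeholders you would adapt:
```yaml
# Deployment fragment; apply via your chart's override mechanism or a post-render patch.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 3600   # give in-flight jobs up to an hour to wind down
      containers:
        - name: airbyte-worker-container    # container name may differ in your chart
          lifecycle:
            preStop:
              exec:
                # Simple fixed delay; replace with a real check that the worker has drained its jobs
                command: ["/bin/sh", "-c", "sleep 60"]
```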
By following these steps, you can set up autoscaling for your Airbyte workers in an EKS cluster based on the load. If you encounter any specific issues or need further customization, you might need to refer to the Airbyte documentation or seek additional support from the community.

Thomas Clavet
05/29/2024, 3:10 PM

kapa.ai
05/29/2024, 3:10 PM

kapa.ai
05/29/2024, 3:11 PM

- `MAX_CHECK_WORKERS`
- `MAX_DISCOVER_WORKERS`
- `MAX_SYNC_WORKERS`
- `MAX_SPEC_WORKERS`
Example configuration in `values.yaml`:
```yaml
env:
  MAX_CHECK_WORKERS: 10
  MAX_DISCOVER_WORKERS: 10
  MAX_SYNC_WORKERS: 20
  MAX_SPEC_WORKERS: 10
```
Source: [GitHub Issue #29547](https://github.com/airbytehq/airbyte/issues/29547)
3. Handling More Jobs:
The container orchestration feature allows Airbyte to handle as many jobs as the resources and configurations permit. This means that with appropriate resource allocation and configuration, Airbyte can scale to handle a large number of concurrent jobs.
4. Resource Management:
Ensure that your Kubernetes cluster has sufficient resources (CPU, memory) to handle the increased number of jobs. You can configure resource requests and limits in your Helm chart to manage this effectively.
Example resource configuration:
```yaml
global:
  jobs:
    resources:
      requests:
        cpu: "1"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "2Gi"
```
Source: [GitHub Issue #35749](https://github.com/airbytehq/airbyte/issues/35749)
By leveraging the container orchestration feature and configuring the environment variables and resources appropriately, you can scale your Airbyte deployment to handle a large number of sync jobs efficiently.