Jian Zhu
05/27/2024, 10:04 PM
kapa.ai
05/27/2024, 10:04 PM
`MAX_SPEC_WORKERS`, `MAX_CHECK_WORKERS`, `MAX_DISCOVER_WORKERS`, and `MAX_SYNC_WORKERS`. The default value for each of these variables is 5.

2. Configuration Parameters:
- `MAX_*_WORKERS` variables define the maximum number of each type of worker that each Airbyte Worker container can support. For example: `MAX_SPEC_WORKERS: 5`, `MAX_CHECK_WORKERS: 5`, `MAX_DISCOVER_WORKERS: 5`, `MAX_SYNC_WORKERS: 5`.
- `SUBMITTER_NUM_THREADS` is another configuration parameter that defines the maximum number of concurrent jobs the Scheduler can schedule.

3. Kubernetes Deployment:
- In a Kubernetes deployment, a single worker is limited to 40 parallel jobs based on `TEMPORAL_WORKER_PORTS`. If you need to run more jobs, you can deploy multiple workers. For example, to run 100 sync jobs, you might configure: Number of Workers: 3, `SUBMITTER_NUM_THREADS: 200`, `MAX_SYNC_WORKERS: 100`.

4. Potential Issues:
- Be aware that even when jobs are placed on different queues, those queues are consumed by the same pool of workers, so operations can still affect each other.

For more detailed information, you can refer to the following documentation and discussions:
- [Configuring Jobs & Workers](https://docs.airbyte.com/understanding-airbyte/jobs#configuring-jobs--workers)
- [Configuring Airbyte](https://docs.airbyte.com/operator-guides/configuring-airbyte#worker)
- [Scaling Airbyte on Kubernetes](https://docs.airbyte.com/deploying-airbyte/on-kubernetes/#increasing-job-parallelism)

By properly configuring these parameters, you can effectively run multiple workers on the same machine and manage job concurrency in Airbyte.
Jian Zhu
05/27/2024, 10:06 PM
kapa.ai
05/27/2024, 10:06 PM
- `MAX_SPEC_WORKERS`
- `MAX_CHECK_WORKERS`
- `MAX_SYNC_WORKERS`
- `MAX_DISCOVER_WORKERS`
- These parameters define the maximum number of each type of worker that can run in parallel within a single Airbyte Worker container. The default value for each is 5. [Source](https://docs.airbyte.com/operator-guides/configuring-airbyte#worker)

2. Horizontal Scaling:
- If you need to handle more load than a single worker container can manage, you can scale horizontally by running multiple worker containers. This approach is particularly useful if you are deploying Airbyte on a platform that supports container orchestration, such as Kubernetes.
- Airbyte supports multiple workers on Kubernetes deployments, which can help distribute the load and avoid potential issues with multiple workers writing to the same database backend. [Source](https://github.com/airbytehq/airbyte/discussions/28588)

3. Platform Considerations:
- If you are using Docker for deployment, you can scale the number of worker containers by adjusting your Docker Compose configuration or by using a container orchestration tool.
- Note that AWS ECS is not officially supported by Airbyte, so you may encounter challenges if you choose to deploy on ECS. [Source](https://github.com/airbytehq/airbyte/discussions/31665)

4. Concurrency Configuration:
- The `SUBMITTER_NUM_THREADS` parameter controls the maximum number of concurrent jobs the Scheduler can schedule. It works in conjunction with the `MAX_*_WORKERS` parameters to manage job concurrency and worker parallelization. [Source](https://discuss.airbyte.io/t/1164)

In summary, running multiple worker containers can help you handle increased load and improve the efficiency of your data integration tasks. Ensure that you configure the `MAX_*_WORKERS` and `SUBMITTER_NUM_THREADS` parameters appropriately to manage concurrency and parallelization effectively. For more detailed information, you can refer to the [Airbyte documentation on configuring workers](https://docs.airbyte.com/operator-guides/configuring-airbyte#worker).