# ask-ai
j
can I run multiple workers on the same machine?
k
Yes, you can run multiple workers on the same machine in Airbyte. Here are some key points and configurations to consider:

1. Worker types and concurrency:
   - Airbyte supports different types of workers: Spec workers, Check connection workers, Discovery workers, and Sync workers.
   - You can configure the maximum number of each type of worker that can run in parallel using the environment variables `MAX_SPEC_WORKERS`, `MAX_CHECK_WORKERS`, `MAX_DISCOVER_WORKERS`, and `MAX_SYNC_WORKERS`. Each defaults to 5.
2. Configuration parameters:
   - The `MAX_*_WORKERS` variables define the maximum number of each type of worker that each Airbyte Worker container can support. For example:
     ```yaml
     MAX_SPEC_WORKERS: 5
     MAX_CHECK_WORKERS: 5
     MAX_DISCOVER_WORKERS: 5
     MAX_SYNC_WORKERS: 5
     ```
   - `SUBMITTER_NUM_THREADS` is another configuration parameter; it defines the maximum number of concurrent jobs the Scheduler can schedule.
3. Kubernetes deployment:
   - In a Kubernetes deployment, a single worker is limited to 40 parallel jobs based on `TEMPORAL_WORKER_PORTS`. If you need to run more jobs, you can deploy multiple workers. For example, to run 100 sync jobs, you might configure:
     ```yaml
     Number of Workers: 3
     SUBMITTER_NUM_THREADS: 200
     MAX_SYNC_WORKERS: 100
     ```
4. Potential issues:
   - Be aware that even if there are different queues, they are consumed by the same pool of workers, so operations on one queue can still affect operations on another.

For more detailed information, you can refer to the following documentation and discussions:

- [Configuring Jobs & Workers](https://docs.airbyte.com/understanding-airbyte/jobs#configuring-jobs--workers)
- [Configuring Airbyte](https://docs.airbyte.com/operator-guides/configuring-airbyte#worker)
- [Scaling Airbyte on Kubernetes](https://docs.airbyte.com/deploying-airbyte/on-kubernetes/#increasing-job-parallelism)

By properly configuring these parameters, you can run multiple workers on the same machine and manage job concurrency in Airbyte.
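The numbers in the Kubernetes example follow from simple capacity arithmetic. Below is a minimal Python sketch, assuming the 40-jobs-per-worker cap described above behaves as a hard limit and that a worker never runs more syncs than either that cap or `MAX_SYNC_WORKERS`; `workers_needed` and `sync_slots` are hypothetical helpers for illustration, not part of Airbyte.

```python
import math

# Per-worker ceiling on parallel jobs in a Kubernetes deployment, as described
# above (derived from TEMPORAL_WORKER_PORTS). Assumption: treated as a hard cap.
K8S_JOBS_PER_WORKER = 40


def workers_needed(target_jobs: int, jobs_per_worker: int = K8S_JOBS_PER_WORKER) -> int:
    """Hypothetical helper: smallest worker count whose combined cap covers target_jobs."""
    if target_jobs <= 0:
        return 0
    return math.ceil(target_jobs / jobs_per_worker)


def sync_slots(workers: int, max_sync_workers: int,
               jobs_per_worker: int = K8S_JOBS_PER_WORKER) -> int:
    """Hypothetical helper: upper bound on concurrent syncs, assuming each worker
    runs at most min(MAX_SYNC_WORKERS, per-worker cap) syncs."""
    return workers * min(max_sync_workers, jobs_per_worker)


if __name__ == "__main__":
    n = workers_needed(100)
    print(n)  # 3
    print(sync_slots(n, max_sync_workers=100))  # 120
```

With the default `MAX_SYNC_WORKERS` of 5, the same three workers would allow only 15 concurrent syncs under this model, which is why the example raises `MAX_SYNC_WORKERS` along with the worker count.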
j
@kapa.ai should I run multiple worker containers on the machine?
k
Running multiple worker containers on a machine can be beneficial for handling increased load and ensuring that your data integration tasks are processed efficiently. Here are some key points to consider based on the provided knowledge sources:

1. Worker types and parallelization:
   - Airbyte supports different types of workers: Spec workers, Check workers, Discover workers, and Sync workers.
   - Each Airbyte Worker container can support a maximum number of each type of worker, as defined by the following parameters:
     - `MAX_SPEC_WORKERS`
     - `MAX_CHECK_WORKERS`
     - `MAX_SYNC_WORKERS`
     - `MAX_DISCOVER_WORKERS`
   - These parameters define the maximum number of each type of worker that can run in parallel within a single Airbyte Worker container. The default value for each is 5. [Source](https://docs.airbyte.com/operator-guides/configuring-airbyte#worker)
2. Horizontal scaling:
   - If you need to handle more load than a single worker container can manage, you can scale horizontally by running multiple worker containers. This approach is particularly useful if you deploy Airbyte on a platform that supports container orchestration, such as Kubernetes.
   - Airbyte supports multiple workers on Kubernetes deployments, which can help distribute the load and avoid potential issues with multiple workers writing to the same database backend. [Source](https://github.com/airbytehq/airbyte/discussions/28588)
3. Platform considerations:
   - If you are using Docker for deployment, you can scale the number of worker containers by adjusting your Docker Compose configuration or by using a container orchestration tool.
   - Note that AWS ECS is not officially supported by Airbyte, so you may encounter challenges if you choose to deploy on ECS. [Source](https://github.com/airbytehq/airbyte/discussions/31665)
4. Concurrency configuration:
   - The `SUBMITTER_NUM_THREADS` parameter controls the maximum number of concurrent jobs the Scheduler can schedule. It works in conjunction with the `MAX_*_WORKERS` parameters to manage job concurrency and worker parallelization. [Source](https://discuss.airbyte.io/t/1164)

In summary, running multiple worker containers can help you handle increased load and improve the efficiency of your data integration tasks. Ensure that you configure the `MAX_*_WORKERS` and `SUBMITTER_NUM_THREADS` parameters appropriately to manage concurrency and parallelization. For more detailed information, you can refer to the [Airbyte documentation on configuring workers](https://docs.airbyte.com/operator-guides/configuring-airbyte#worker).
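To reason about how `SUBMITTER_NUM_THREADS` and the `MAX_*_WORKERS` limits interact as you add worker containers, here is a minimal Python sketch. It assumes a simplified model in which the scheduler's thread count and the total worker slots (containers × per-container limit) each act as an upper bound, so the ceiling for a job type, considered on its own, is the smaller of the two. This model is an illustration, not taken from Airbyte's code, and the function names are hypothetical.

```python
import os
from collections.abc import Mapping

# Per-container worker limits named above; each defaults to 5 when unset.
WORKER_LIMIT_VARS = (
    "MAX_SPEC_WORKERS",
    "MAX_CHECK_WORKERS",
    "MAX_DISCOVER_WORKERS",
    "MAX_SYNC_WORKERS",
)
DEFAULT_LIMIT = 5


def per_container_limits(env: Mapping[str, str]) -> dict[str, int]:
    """Read each MAX_*_WORKERS value from env, falling back to the default of 5."""
    return {name: int(env.get(name, DEFAULT_LIMIT)) for name in WORKER_LIMIT_VARS}


def effective_concurrency(env: Mapping[str, str], containers: int,
                          submitter_threads: int) -> dict[str, int]:
    """Hypothetical model: for each job type considered alone, concurrency is
    bounded by both the scheduler threads and the total worker slots."""
    limits = per_container_limits(env)
    return {
        name: min(submitter_threads, containers * limit)
        for name, limit in limits.items()
    }


if __name__ == "__main__":
    # Uses this process's environment; unset variables fall back to 5.
    print(effective_concurrency(os.environ, containers=2, submitter_threads=8))
```

For example, with `MAX_SYNC_WORKERS` set to 10, two containers, and 8 submitter threads, this model caps concurrent syncs at 8: adding containers alone does not help once the scheduler limit is the bottleneck.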