<@U04SE163WC8> is it worth it to deploy in kuberne...
# ask-ai
h
@kapa.ai is it worth it to deploy in kubernetes (eks aws) or scale vertically using ec2 is enough?
u
Whether to deploy Airbyte on Kubernetes (AWS EKS) or to scale vertically on EC2 depends on your use case and requirements.

For a production-ready deployment of Airbyte Self-Managed Enterprise, deploying to Amazon EKS or Google Kubernetes Engine is recommended, because Kubernetes allows for better performance and scalability. Airbyte workers, which do the heavy lifting within Airbyte, create Kubernetes pods that do the actual work, so scaling Airbyte becomes a matter of ensuring that the pods running the jobs have sufficient resources. In general, starting out with a mid-sized cloud instance (e.g. 4 or 8 cores) and gradually tuning the instance size to your workload is recommended [[source](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale)].

On the other hand, if you are more familiar with EC2 from an infrastructure standpoint, you might find it easier to scale vertically on EC2. Note, however, that while you can increase job parallelism on EC2, the linked forum thread reports that, judging from the metrics, more than 90% of the instance's memory and CPU went unused [[source](https://discuss.airbyte.io/t/1547)].

In conclusion, if you need better performance and scalability, deploying on Kubernetes (AWS EKS) is the better choice. If you are more comfortable with EC2 and your workload is not too heavy, scaling vertically on EC2 could be sufficient.
h
@kapa.ai say we deploy with 4 cores in ec2, what is the maximum load the airbyte can handle. Can you give me numbers for reference?
u
The maximum load that Airbyte can handle on an EC2 instance with 4 cores depends on several factors, including the number of concurrent syncs, the number of connections, and the specific resource requirements of the connectors you're using.

As a reference point, the typical Airbyte user has 5 - 20 connectors and 10 - 100 connections configured. Almost all of these connections are scheduled, either hourly or daily, resulting in at most 100 concurrent jobs [[source](https://docs.airbyte.com/operator-guides/scaling-airbyte)].

However, the primary driver of increased resource usage in Airbyte is the number of concurrent syncs running at any given time. Each concurrent sync requires at least 3 additional connector pods to be running at once (`orchestrator`, `read`, `write`). For example, 10 concurrent syncs require 30 additional pods in your namespace [[source](https://docs.airbyte.com/enterprise-setup/scaling-airbyte#concurrent-syncs)].

In terms of worker parallelization, Airbyte exposes environment variables to change the maximum number of each type of worker allowed to run in parallel. By default, the maximum number of Non-Sync workers allowed to run in parallel is 5, and the maximum number of Sync workers allowed to run in parallel is 10 [[source](https://docs.airbyte.com/understanding-airbyte/jobs#configuring-jobs--workers)].

Please note that these are general guidelines, and the actual load your Airbyte instance can handle may vary with your use case and the resource requirements of your connectors. It's recommended to monitor your resource usage and adjust your instance size and worker configuration as needed.
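The pod arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `additional_pods` and the constant `PODS_PER_SYNC` are hypothetical, and the factor of 3 is the minimum quoted from the docs, so a real deployment may run more pods per sync.

```python
# Minimum extra pods per concurrent sync, per the docs quoted above:
# orchestrator, read, and write. (Hypothetical constant name.)
PODS_PER_SYNC = 3


def additional_pods(concurrent_syncs: int, pods_per_sync: int = PODS_PER_SYNC) -> int:
    """Lower bound on the extra pods needed for a number of concurrent syncs."""
    if concurrent_syncs < 0:
        raise ValueError("concurrent_syncs must be non-negative")
    return concurrent_syncs * pods_per_sync


if __name__ == "__main__":
    for syncs in (1, 5, 10):
        print(f"{syncs} concurrent syncs -> at least {additional_pods(syncs)} extra pods")
```

Since the result is a lower bound, it is better suited to checking that a namespace has enough headroom than to predicting exact usage.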
h
@kapa.ai give me ideal load for 4 cores ec2 server
u
For an EC2 instance with 4 cores, the ideal load depends on your specific workload and the connectors you're using. As a general guideline, you can use the figures the docs give for a typical Airbyte user:
- Number of connectors: 5 - 20
- Number of connections: 10 - 100
- Concurrent jobs: up to 100

You can also configure the maximum number of workers that can run in parallel. For instance, you might set:
MAX_SYNC_WORKERS=10
MAX_SPEC_WORKERS=5
MAX_CHECK_WORKERS=5
MAX_DISCOVER_WORKERS=5
These settings mean that up to 10 sync jobs can run concurrently, along with up to 5 each of spec, check, and discover jobs.

Remember, these are just guidelines. The actual ideal load varies with factors like the specific connectors you're using and the size of the data you're syncing. It's recommended to monitor your resource usage and adjust your settings as needed [[source](https://docs.airbyte.com/operator-guides/scaling-airbyte), [source](https://docs.airbyte.com/understanding-airbyte/jobs#configuring-jobs--workers)].
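To reason about what a given set of worker limits allows, a small Python sketch like the one below may help. It is not part of Airbyte: the helper `worker_limits` and the `DEFAULT_LIMITS` table are hypothetical, the variable names are the ones used in this thread, and the fallback values are the ones quoted in this thread (10 for sync, 5 for the others), which may differ between Airbyte versions.

```python
import os

# Fallback values as quoted in this thread (assumption: they match your version).
DEFAULT_LIMITS = {
    "MAX_SYNC_WORKERS": 10,
    "MAX_SPEC_WORKERS": 5,
    "MAX_CHECK_WORKERS": 5,
    "MAX_DISCOVER_WORKERS": 5,
}


def worker_limits(env=None):
    """Return the effective limit per worker type, falling back to the defaults."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in DEFAULT_LIMITS.items()}


if __name__ == "__main__":
    limits = worker_limits()
    for name, value in limits.items():
        print(f"{name}={value}")
    # Upper bound on parallel workers if every pool is full at the same time.
    print("Ceiling on parallel workers:", sum(limits.values()))
```

Passing a plain dict instead of the real environment, for example `worker_limits({"MAX_SYNC_WORKERS": "4"})`, makes it easy to compare candidate settings before changing the deployment.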