# ask-ai
@kapa.ai I’m setting up a Kubernetes deployment using the Airbyte Helm charts. When customising the values.yaml file, what is the difference between jobs and workers? I can define resource requests in both global.jobs.resources and worker.resources; what’s the difference between these two, and which is more likely to become a bottleneck?
In Airbyte, all interactions with connectors are run as jobs performed by a worker. There are generally four types of workers: Spec worker, Check connection worker, Discovery worker, and Sync worker. Workers are responsible for executing Airbyte operations (e.g. Discover, Read, Sync) and are created on demand whenever these operations are requested; every job has a corresponding worker executing its work. In a Kubernetes deployment, an Airbyte worker creates at least one Kubernetes pod, and that pod does the actual work[^1^][^5^].

In your `values.yaml` file, `global.jobs.resources` and `worker.resources` define the resource requests for jobs and workers respectively. The `global.jobs.resources` configuration applies to all containers in a Sync[^9^], while `worker.resources` applies to the worker pods themselves.

Which of the two becomes the bottleneck depends on your workload. Workers initiate jobs and monitor their state, so insufficient worker resources can slow down job initiation and monitoring; jobs do the actual heavy lifting of data synchronization, so insufficient job resources slow down the syncs themselves[^1^]. In general, it is recommended to start with a mid-sized cloud instance (e.g. 4 or 8 cores) and gradually tune the instance size to your workload[^1^]. You can also limit the number of concurrent jobs via the `MAX_CHECK_WORKERS` and `MAX_SYNC_WORKERS` environment variables[^4^].

[^1^]: https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale
[^4^]: https://docs.airbyte.com/understanding-airbyte/jobs#workloads
[^5^]: https://docs.airbyte.com/understanding-airbyte/jobs
[^9^]: https://docs.airbyte.com/operator-guides/configuring-connector-resources
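To make this concrete, here is a minimal sketch of how the two sections might sit side by side in `values.yaml`. The two top-level keys are the ones named above; the nested request/limit layout and the example values are assumptions to check against your chart version and adapt to your workload:

```yaml
worker:
  # Resources for the long-running worker pods that orchestrate jobs
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

global:
  jobs:
    # Default resources applied to the containers launched for each sync job
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
```

Since the job containers do the data movement, they are usually the first place to raise memory when syncs struggle; the worker pods mostly coordinate and can typically stay comparatively small.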
@kapa.ai I’ve seen in the documentation that I should set these env variables to allocate more memory to the workers: JOB_MAIN_CONTAINER_MEMORY_REQUEST and JOB_MAIN_CONTAINER_MEMORY_LIMIT. Where in the values.yaml should I put these?
In the Airbyte Helm chart, environment variables can be set in the `values.yaml` file under the `env` field. Here is an example of how you can set the `JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables:

```yaml
env:
  JOB_MAIN_CONTAINER_MEMORY_REQUEST: "2Gi"
  JOB_MAIN_CONTAINER_MEMORY_LIMIT: "4Gi"
```

This sets the minimum memory request for the job's main container to 2Gi and the maximum memory limit to 4Gi. Adjust these values according to your needs and the resources available in your Kubernetes cluster. Note that these environment variables customize the memory usage of the Source worker, which reads up to 10,000 records into memory; if you are dealing with large row sizes, you may need to increase these values to prevent out-of-memory exceptions[^3^].

[^3^]: https://docs.airbyte.com/operator-guides/scaling-airbyte#memory
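As an aside, the same limits can likely also be expressed through the `global.jobs.resources` key discussed earlier in this thread, since the chart's job resource settings and these environment variables target the same job containers. A sketch, assuming your chart version maps these keys onto the job containers; verify the exact key names against your chart's default `values.yaml`:

```yaml
# Assumed equivalent of the two env vars above, expressed as
# Kubernetes-style resources on the job containers.
global:
  jobs:
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "4Gi"
```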
@kapa.ai I can see the values.yaml file has support for global.env_vars, so should I put those there?
Yes, you can set the `JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables in the `values.yaml` file under the `global.env_vars` section. Here is an example:

```yaml
global:
  env_vars:
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: "2Gi"
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: "4Gi"
```

This sets the minimum memory request for the job's main container to 2Gi and the maximum memory limit to 4Gi. Adjust these values according to your needs and the resources available in your Kubernetes cluster. Remember to apply the changes to your Kubernetes cluster after modifying the `values.yaml` file.
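As a closing note, with a standard Helm workflow "applying the changes" typically means re-running the release upgrade, for example `helm upgrade airbyte airbyte/airbyte -n airbyte -f values.yaml` (the release name, chart reference, and namespace here are assumptions; substitute your own), after which newly launched job pods should pick up the updated memory settings.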