# ask-ai
@kapa.ai I’m setting up a Kubernetes deployment using the Airbyte Helm charts. When customising the values.yaml file, what is the difference between jobs and workers? I can define resource requests in both global.jobs.resources and worker.resources; what’s the difference between these two, and which is more likely to become a bottleneck?
In Airbyte, all interactions with connectors are run as jobs performed by a worker. There are generally four types of workers: Spec worker, Check connection worker, Discovery worker, and Sync worker. Workers are responsible for executing Airbyte operations (e.g. Discover, Read, Sync) and are created on demand whenever these operations are requested; every job has a corresponding worker executing its work. In a Kubernetes deployment, an Airbyte worker creates at least one Kubernetes pod, and that pod does the actual work[^1^][^5^].

In your `values.yaml` file, `global.jobs.resources` and `worker.resources` define the resource requests for jobs and workers respectively. The `global.jobs.resources` configuration applies to all containers in a Sync[^9^], while `worker.resources` applies to the worker pods themselves.

Which of the two becomes the bottleneck depends on your workload. Workers initiate jobs and monitor their state, so insufficient worker resources can slow down job initiation and monitoring; jobs do the actual heavy lifting of data synchronization, so insufficient job resources slow down the syncs themselves[^1^]. In general, it is recommended to start with a mid-sized cloud instance (e.g. 4 or 8 cores) and gradually tune the instance size to your workload[^1^]. You can also limit the number of concurrent jobs via the `MAX_CHECK_WORKERS` and `MAX_SYNC_WORKERS` environment variables[^4^].

[^1^]: https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale
[^4^]: https://docs.airbyte.com/understanding-airbyte/jobs#workloads
[^5^]: https://docs.airbyte.com/understanding-airbyte/jobs
[^9^]: https://docs.airbyte.com/operator-guides/configuring-connector-resources
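To make this concrete, here is a minimal sketch of how the two sections might sit side by side in `values.yaml`. The two top-level keys are the ones named above; the nested request/limit layout and the example values are assumptions to check against your chart version and adapt to your workload:

```yaml
worker:
  # Resources for the long-running worker pods that orchestrate jobs
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

global:
  jobs:
    # Default resources applied to the containers launched for each sync job
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
```

Since the job containers do the data movement, they are usually the first place to raise memory when syncs struggle; the worker pods mostly coordinate and can typically stay comparatively small.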
@kapa.ai I’ve seen in the documentation that I should set these env variables to allocate more memory to the workers: JOB_MAIN_CONTAINER_MEMORY_REQUEST and JOB_MAIN_CONTAINER_MEMORY_LIMIT. Where in the values.yaml should I put these?
In the Airbyte Helm chart, environment variables can be set in the `values.yaml` file under the `env` field. Here is an example of how you can set the `JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables:

```yaml
env:
  JOB_MAIN_CONTAINER_MEMORY_REQUEST: "2Gi"
  JOB_MAIN_CONTAINER_MEMORY_LIMIT: "4Gi"
```

This sets the minimum memory request for the job's main container to 2Gi and the maximum memory limit to 4Gi. Adjust these values according to your needs and the resources available in your Kubernetes cluster. Note that these environment variables customize the memory usage of the Source worker, which reads up to 10,000 records into memory; if you are dealing with large row sizes, you may need to increase these values to prevent out-of-memory exceptions[^3^].

[^3^]: https://docs.airbyte.com/operator-guides/scaling-airbyte#memory
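As an aside, the same limits can likely also be expressed through the `global.jobs.resources` key discussed earlier in this thread, since the chart's job resource settings and these environment variables target the same job containers. A sketch, assuming your chart version maps these keys onto the job containers; verify the exact key names against your chart's default `values.yaml`:

```yaml
# Assumed equivalent of the two env vars above, expressed as
# Kubernetes-style resources on the job containers.
global:
  jobs:
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "4Gi"
```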
@kapa.ai I can see the values.yaml file has support for global.env_vars, so should I put those there?
Yes, you can set the `JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables in the `values.yaml` file under the `global.env_vars` section. Here is an example:

```yaml
global:
  env_vars:
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: "2Gi"
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: "4Gi"
```

This sets the minimum memory request for the job's main container to 2Gi and the maximum memory limit to 4Gi. Adjust these values according to your needs and the resources available in your Kubernetes cluster. Remember to apply the changes to your Kubernetes cluster after modifying the `values.yaml` file.
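As a closing note, with a standard Helm workflow "applying the changes" typically means re-running the release upgrade, for example `helm upgrade airbyte airbyte/airbyte -n airbyte -f values.yaml` (the release name, chart reference, and namespace here are assumptions; substitute your own), after which newly launched job pods should pick up the updated memory settings.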