# ask-ai
t
Hello! I'm trying to use the `nodeSelector` for my jobs (I'm deploying using Helm in Kubernetes on GCP), but it simply doesn't work.
k
A message from kapa.ai
To use node selectors for your jobs in Airbyte when deploying with Helm, you can set the `JOB_KUBE_NODE_SELECTORS` environment variable. This variable allows you to define one or more pod node selectors for the sync job and serves as the default pod node selectors fallback for other jobs. Here's an example of how to set the `JOB_KUBE_NODE_SELECTORS` variable in your `values.yaml` file:

```yaml
global:
  env:
    - name: JOB_KUBE_NODE_SELECTORS
      value: "key1=value1,key2=value2"
```

Replace `key1=value1,key2=value2` with your desired key-value pairs for node selectors. Additionally, you can set job-specific node selectors using the following environment variables:

- `SPEC_JOB_KUBE_NODE_SELECTORS`
- `CHECK_JOB_KUBE_NODE_SELECTORS`
- `DISCOVER_JOB_KUBE_NODE_SELECTORS`

For example, to set node selectors for the check job, you can add the following to your `values.yaml` file:

```yaml
global:
  env:
    - name: CHECK_JOB_KUBE_NODE_SELECTORS
      value: "key1=value1,key2=value2"
```

Replace `key1=value1,key2=value2` with your desired key-value pairs for node selectors. For more information, refer to the [Configuring Airbyte documentation](https://docs.airbyte.com/operator-guides/configuring-airbyte). If you're still having issues, please ensure that your Helm chart and Airbyte versions are up to date, as there might be bugs in older versions.
t
@kapa.ai and what about the tolerations? How do I configure them?
k
A message from kapa.ai
To configure tolerations in Airbyte, you can use the `JOB_KUBE_TOLERATIONS` environment variable. Tolerations are separated by `;`, and each toleration contains key-value pairs mentioning some or all of `key`, `effect`, `operator`, and `value`, separated by `,`. Here's the relevant extract from the documentation:

1. `JOB_KUBE_TOLERATIONS` - Define one or more Job pod tolerations. Tolerations are separated by `;`. Each toleration contains k=v pairs mentioning some/all of key, effect, operator and value, separated by `,`.

You can find more information on configuring Airbyte in Kubernetes [here](https://docs.airbyte.com/operator-guides/configuring-airbyte).
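To make the value format concrete, here is a small sketch of how a `JOB_KUBE_TOLERATIONS`-style string (tolerations separated by `;`, fields separated by `,`, per the documentation extract above) maps onto toleration objects. The taint keys and values in the example are made up for illustration:

```python
def parse_tolerations(raw: str) -> list[dict]:
    """Parse a JOB_KUBE_TOLERATIONS-style string into a list of dicts.

    Per the documented format: tolerations are separated by ';',
    and each toleration is a ','-separated list of k=v pairs
    (some or all of key, effect, operator, value).
    """
    tolerations = []
    for chunk in raw.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Each pair looks like "key=airbyte-jobs"; split on the first '='.
        fields = dict(pair.split("=", 1) for pair in chunk.split(","))
        tolerations.append(fields)
    return tolerations

# Hypothetical value with two tolerations:
value = "key=airbyte-jobs,operator=Equal,value=true,effect=NoSchedule;key=gpu,operator=Exists"
print(parse_tolerations(value))
```

This is just an illustration of the string format; Airbyte itself does the parsing when it creates job pods.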
t
@kapa.ai is passing the `env` key inside `global` the correct way to configure these environment variables? They're nowhere to be found inside my pods
k
A message from kapa.ai
It seems that the information provided in the extracts is not sufficient to answer your question about passing the env key inside global for configuring environment variables in your Kubernetes deployment. I recommend referring to the [Airbyte documentation](https://docs.airbyte.com/) or the [Airbyte community forum](https://discuss.airbyte.io) for more information and assistance.
t
@kapa.ai in the chart, what are the deployment modes and their differences ?
k
A message from kapa.ai
Airbyte offers two deployment types, and their underlying process implementations differ accordingly:

1. Docker deployment: Each process is a local process backed by a Docker container. As all processes are local, process communication is per standard Unix pipes.
2. Kubernetes deployment: Each process is backed by a Kubernetes pod. As Kubernetes does not make process-locality guarantees, Airbyte has implemented mechanisms to hide the remote process execution.

You can read more about scaling data pipelines with Kubernetes in [this blog post](https://airbyte.com/blog/scaling-data-pipelines-kubernetes).

Source: [Airbyte Documentation - Workers & Jobs](https://docs.airbyte.com/understanding-airbyte/jobs)
t
@kapa.ai I mean the `deploymentMode` key inside the Helm chart for Kubernetes
k
A message from kapa.ai
I apologize for the confusion earlier. Based on the information provided in the extracts, there isn't enough information about the `deploymentMode` key inside the Helm chart for Kubernetes. I recommend referring to the [Airbyte documentation](https://docs.airbyte.com/) or the [Airbyte community forum](https://discuss.airbyte.io) for more information and assistance.
t
@kapa.ai all my pods have the environment variables `JOB_KUBE_TOLERATIONS` and `JOB_KUBE_NODE_SELECTORS` set, yet the jobs are being created ignoring these configurations. What could be going on?
k
A message from kapa.ai
Based on the information provided in the extracts, it's difficult to determine the exact cause of the issue you're experiencing with the environment variables `JOB_KUBE_TOLERATIONS` and `JOB_KUBE_NODE_SELECTORS` being ignored in your Kubernetes deployment. I recommend checking the following:

1. Ensure that the environment variables are correctly set in your deployment configuration.
2. Verify that there are no typos or syntax errors in your configuration files.

If the issue persists, consider seeking help from the [Airbyte community forum](https://discuss.airbyte.io) or the [Airbyte Slack workspace](https://airbytehq.slack.com) for more specific guidance and troubleshooting assistance.
b
@Thiago Guimarães Did you ever get a resolution to this problem? We are experiencing a problem with the connection checker pod getting scheduled on the wrong node, a node that has no network connection to the rest of Airbyte. We've deployed with Helm, setting all the `values.yml` nodeSelector values.
t
hey @Brian Castelli! I actually did find an almost 100% working solution.
I've found a bug affecting only the spawning of the `normalization` pods; I've even opened a discussion in Airbyte's forum and a GitHub issue.
This is the issue regarding the normalization jobs (it's also related to nodeSelector problems): https://github.com/airbytehq/airbyte/issues/27133
I had the same issue: all jobs (except sync ones) were running on unexpected node pools.
Through trial and error, I found that setting `extraEnv` on the worker pods to configure the specific node pool for each type of job got it working (except for normalization jobs, which don't have a specific environment variable for this configuration).
My current solution is:
• in `values.yml`, set `global.jobs.nodeSelector` to your desired nodeSelector. It works for sync jobs at least (destination and source pods).
• in `values.yml`, set `global.worker.extraEnv` to include the array of your desired env variable configs per job type. Here is my example:

```yaml
- name: JOB_KUBE_NODE_SELECTORS
  value: "my/selector=airbyte-jobs"
- name: SPEC_JOB_KUBE_NODE_SELECTORS
  value: "my/selector=airbyte-jobs"
- name: CHECK_JOB_KUBE_NODE_SELECTORS
  value: "my/selector=airbyte-jobs"
- name: DISCOVER_JOB_KUBE_NODE_SELECTORS
  value: "my/selector=airbyte-jobs"
```
I'm also using `JOB_KUBE_TOLERATIONS`; it's the same idea if you're also having problems with it. To debug it, I had to run jobs and then run `kubectl describe` on the specific pods those jobs spawned, to see whether the node selector and tolerations were being configured correctly.
(I'm using k9s as my k8s UI right now; it's really useful for debugging this kind of stuff.)
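Putting the pieces from this thread together, a sketch of the relevant `values.yml` section might look like the following. The `my/selector=airbyte-jobs` label, the toleration value, and the exact nesting (`worker` under `global`, as described above) are assumptions based on this thread; verify them against your chart version's values schema:

```yaml
global:
  jobs:
    # Node selector applied to sync jobs (source/destination pods).
    nodeSelector:
      my/selector: airbyte-jobs
  worker:
    # Per-job-type env vars so spec/check/discover jobs also land
    # on the right node pool. Normalization jobs are the known gap
    # (see https://github.com/airbytehq/airbyte/issues/27133).
    extraEnv:
      - name: JOB_KUBE_NODE_SELECTORS
        value: "my/selector=airbyte-jobs"
      - name: SPEC_JOB_KUBE_NODE_SELECTORS
        value: "my/selector=airbyte-jobs"
      - name: CHECK_JOB_KUBE_NODE_SELECTORS
        value: "my/selector=airbyte-jobs"
      - name: DISCOVER_JOB_KUBE_NODE_SELECTORS
        value: "my/selector=airbyte-jobs"
      # Hypothetical toleration matching a taint on the jobs node pool:
      - name: JOB_KUBE_TOLERATIONS
        value: "key=airbyte-jobs,operator=Exists,effect=NoSchedule"
```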
b
Super! Thank you!
I tried your method, AND IT WORKED! Thank you. @Thiago Guimarães
🙏 1
t
hahaha you're welcome @Brian Castelli! If you could leave a thumbs up on this issue here https://github.com/airbytehq/airbyte/issues/27133 so the bug in the normalization jobs gets fixed, I'd be really glad 🙂
b
Done.
❤️ 1