# ask-ai
**User:**
I have deployed Airbyte OSS on Kubernetes using the official Helm chart. The `destination-postgres-check` and `source-postgres-check` pods do not honor the `nodeSelector` config set in the global variables in `values.yaml`. Is there some other property that can be used to manage consistent behavior of these two pods?
**kapa.ai:**
It appears that you are encountering an issue where the `destination-postgres-check` and `source-postgres-check` pods do not honor the `nodeSelector` configuration specified in the `values.yaml` file. This is a known issue with the Airbyte Helm chart deployment. Based on the information from the [GitHub issue](https://github.com/airbytehq/airbyte/issues/27133), the `nodeSelectors` configured for jobs in the Helm chart `values.yaml` are only respected for sync jobs, not for other job types such as check jobs. This issue has been reported by multiple users.

Here are some steps and considerations to address this issue:

1. **Environment variables for jobs**: Ensure that you have set up the environment variables for the jobs correctly. According to the [forum post](https://discuss.airbyte.io/t/4533), setting environment variables for the jobs can help in some cases. However, this might not be sufficient for all job types.
2. **Customizing `values.yaml`**: Make sure your `values.yaml` file is correctly configured. You can refer to the example [`values.yaml`](https://github.com/airbytehq/airbyte-platform/blob/main/charts/airbyte/values.yaml) in the Airbyte repository for guidance.
3. **Fallback mechanism**: The fallback mechanism for `nodeSelectors` might not be working as expected, as highlighted in the [GitHub issue](https://github.com/airbytehq/airbyte/issues/27133). The normalization jobs, for example, do not have a specific environment variable to set the `nodeSelector`.
4. **Using Open Policy Agent (OPA)**: As a temporary workaround, you can use tools like [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper) to enforce policies ensuring that the pods run on the correct nodes (see the hedged sketch at the end of this answer). This approach has been mentioned by users facing similar issues.
5. **Reviewing code references**: If you have the capability, review the code references mentioned in the GitHub issue to understand where the fallback might be failing. This requires familiarity with the Airbyte codebase.

Here is a snippet from the `values.yaml` file for setting up `nodeSelectors`:
```yaml
jobs:
  kube:
    nodeSelectors:
      key: "value"
```
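For context, here is a minimal hypothetical sketch of how a cluster-wide selector and the job-level selector might sit side by side. The `global.nodeSelector` key and the `airbyte-node` label are assumptions, not confirmed chart keys, so verify them against your chart version's `values.yaml`:

```yaml
# Hypothetical excerpt; key names vary between chart versions.
global:
  nodeSelector:          # assumed key covering long-running platform pods
    airbyte-node: "true"
jobs:
  kube:
    nodeSelectors:       # job pods launched by the worker (sync, check, ...)
      airbyte-node: "true"
```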
After making the necessary changes, you can upgrade your Helm release:
```bash
helm upgrade --install %RELEASE_NAME% airbyte/airbyte -n <NAMESPACE> --values /path/to/values.yaml --version 0.50.13
```
For more detailed information, you can refer to the [Airbyte documentation](https://docs.airbyte.com/deploying-airbyte/on-kubernetes-via-helm#overview) and the [GitHub issue](https://github.com/airbytehq/airbyte/issues/27133). If the issue persists, you might want to follow up on the GitHub issue or the forum post for any updates or potential fixes from the Airbyte team.
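If you try the Gatekeeper workaround from point 4, a minimal sketch of an `Assign` mutation that injects a `nodeSelector` into pods created in the Airbyte namespace could look like the following. The `airbyte` namespace and the `airbyte-node` label are placeholders, and it assumes Gatekeeper is installed with mutation support enabled:

```yaml
# Hypothetical Gatekeeper mutation -- adjust namespace and label to your cluster.
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: airbyte-pods-nodeselector
spec:
  applyTo:
    - groups: [""]
      kinds: ["Pod"]
      versions: ["v1"]
  match:
    scope: Namespaced
    namespaces: ["airbyte"]     # placeholder: your Airbyte namespace
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  location: "spec.nodeSelector" # field to set at pod-creation time
  parameters:
    assign:
      value:
        airbyte-node: "true"    # placeholder: label on your target nodes
```

As written, this mutates every pod in the namespace; in practice you would narrow the `match` block (for example, with a label selector) so that only the connector check pods are affected.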
**User:**
@kapa.ai what do these pods do? are they resource intensive?
**kapa.ai:**
Based on the information provided, here is a breakdown of what each pod does in an Airbyte Kubernetes deployment:

1. **db**: Runs a PostgreSQL database that Airbyte uses to store metadata and configuration data.
2. **minio**: Used for log storage. MinIO is an object storage server that Airbyte uses to store logs.
3. **pod-sweeper**: Clears successful or errored pods once they are done, helping to manage and clean up resources.
4. **server**: Holds all the API endpoints for Airbyte. It is the main server that handles requests and responses.
5. **temporal**: Used for orchestration. Temporal is a workflow orchestration engine that Airbyte uses to manage job execution.
6. **webapp**: Serves the Airbyte UI, allowing users to interact with Airbyte through a web interface.
7. **worker**: Creates and manages sync pods. Workers execute Airbyte operations such as Discover, Read, and Sync.

Regarding resource intensity, the requirements for these pods vary:

- **Workers**: Generally the most resource-intensive, as they handle the actual data synchronization tasks. Each sync job creates a worker pod, and these pods need sufficient CPU and memory to perform their tasks efficiently. You can configure their resource requirements using environment variables like `JOB_MAIN_CONTAINER_CPU_REQUEST`, `JOB_MAIN_CONTAINER_CPU_LIMIT`, `JOB_MAIN_CONTAINER_MEMORY_REQUEST`, and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` ([source](https://discuss.airbyte.io/t/3775)); see the sketch after this list.
- **Server, temporal, and webapp**: Less resource-intensive than worker pods, but they still require adequate resources to handle API requests, orchestrate workflows, and serve the UI, respectively.
- **db and minio**: Need resources to handle database operations and log storage, but are typically not as resource-intensive as worker pods.
- **pod-sweeper**: Generally lightweight, as its main task is to clean up completed or failed pods.

For scaling and ensuring that the pods have sufficient resources, it is recommended to start with a mid-sized cloud instance (e.g., 4 or 8 cores) and adjust based on the workload ([source](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale)). If you need more detailed information on resource usage and scaling, refer to the [Scaling Airbyte documentation](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale).
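As a hedged illustration of how those variables might be supplied through the Helm chart, the sketch below uses a `worker.extraEnv`-style list. Whether your chart version exposes this exact key (some versions use different keys for injecting environment variables) is an assumption to verify against the chart's own `values.yaml`:

```yaml
# Hypothetical values.yaml excerpt -- `worker.extraEnv` is an assumption;
# confirm the key against the chart version you actually deploy.
worker:
  extraEnv:
    - name: JOB_MAIN_CONTAINER_CPU_REQUEST
      value: "500m"
    - name: JOB_MAIN_CONTAINER_CPU_LIMIT
      value: "2"
    - name: JOB_MAIN_CONTAINER_MEMORY_REQUEST
      value: "1Gi"
    - name: JOB_MAIN_CONTAINER_MEMORY_LIMIT
      value: "4Gi"
```

These variables are set on the worker rather than on the job pods directly because the worker reads them and applies the corresponding requests and limits to the job pods it launches.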
**User:**
@kapa.ai i meant the `destination-postgres-check` and `source-postgres-check` pods
**kapa.ai:**
The `destination-postgres-check` and `source-postgres-check` pods are used in Airbyte to verify the connectivity and configuration of the respective connectors before running a sync job. Here is a detailed explanation of what these pods do and their resource requirements:

### `destination-postgres-check` Pod

This pod checks the configuration and connectivity of the PostgreSQL destination connector. It ensures that the destination database is reachable and that the provided credentials and configuration are correct.

#### Example Log Extract:
```plaintext
2024-01-03 20:49:14 platform > Creating docker container = destination-postgres-check-5143ad9e-0d2d-4469-b42e-9d0370663796-0-nsvog with resources io.airbyte.config.ResourceRequirements@6fac8953[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2024-01-03 20:49:14 platform > Preparing command: docker run --rm --init -i -w /data/5143ad9e-0d2d-4469-b42e-9d0370663796/0 --log-driver none --name destination-postgres-check-5143ad9e-0d2d-4469-b42e-9d0370663796-0-nsvog --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/destination-postgres:0.4.0 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.50.40 -e WORKER_JOB_ID=5143ad9e-0d2d-4469-b42e-9d0370663796 airbyte/destination-postgres:0.4.0 check --config source_config.json
```
[Source](https://github.com/airbytehq/airbyte/discussions/34181)

### `source-postgres-check` Pod

This pod checks the configuration and connectivity of the PostgreSQL source connector. It ensures that the source database is reachable and that the provided credentials and configuration are correct.

#### Example Log Extract:
```plaintext
2023-10-29 18:08:32 INFO i.a.w.p.DockerProcessFactory(create):140 - Creating docker container = source-postgres-check-2-0-lmzhc with resources io.airbyte.config.ResourceRequirements@852f8cd[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=4096Mi,additionalProperties={}] and allowedHosts io.airbyte.config.AllowedHosts@696787be[hosts=[10.0.5.200, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
```
[Source](https://github.com/airbytehq/airbyte/issues/31949)

### Resource Intensity

The resource requirements for these check pods can vary, but they are generally less resource-intensive than the actual sync pods. Here are some examples of resource configurations:

- **CPU and Memory