Taher Ajnawala
05/30/2024, 7:16 AM

kapa.ai
05/30/2024, 7:17 AM

The `destination-postgres-check` and `source-postgres-check` pods do not honor the `nodeSelector` configuration specified in the `values.yaml` file. This is a known issue with the Airbyte Helm chart deployment. Based on the information in the [GitHub issue](https://github.com/airbytehq/airbyte/issues/27133), the `nodeSelectors` configured in the Helm chart `values.yaml` for jobs are respected only for sync jobs, not for all job types. This issue has been reported by multiple users. Here are some steps and considerations to address it:

1. **Environment variables for jobs**: Ensure that you have set up the environment variables for the jobs correctly. According to the [forum post](https://discuss.airbyte.io/t/4533), setting environment variables for the jobs can help in some cases, though this may not be sufficient for all job types.
2. **Customizing `values.yaml`**: Make sure your `values.yaml` file is correctly configured. You can refer to the example [`values.yaml`](https://github.com/airbytehq/airbyte-platform/blob/main/charts/airbyte/values.yaml) in the Airbyte repository for guidance.
3. **Fallback mechanism**: The fallback mechanism for `nodeSelectors` might not be working as expected, as highlighted in the [GitHub issue](https://github.com/airbytehq/airbyte/issues/27133). The normalization jobs, for example, do not have a specific environment variable to set the `nodeSelector`.
4. **Using Open Policy Agent (OPA)**: As a temporary workaround, you can use tools like [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper) to enforce policies ensuring that the pods run on the correct nodes. This approach has been mentioned by users facing similar issues.
5. **Reviewing code references**: If you have the capability, you can review the code references mentioned in the GitHub issue to understand where the fallback fails. This requires familiarity with the Airbyte codebase.

Here is a snippet from the `values.yaml` file for setting up `nodeSelectors`:

```yaml
jobs:
  kube:
    nodeSelectors:
      key: "value"
```
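If you try the OPA Gatekeeper workaround mentioned above, a mutation policy can inject a `nodeSelector` into job pods that are missing one. The sketch below uses Gatekeeper's `Assign` mutator; the namespace, selector key, and value are placeholders you would need to adapt, and you should check your Gatekeeper version's mutation docs for the exact schema:

```yaml
# Sketch only: force a nodeSelector onto pods in the airbyte namespace.
# Adapt the namespace, key, and value to your environment.
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: airbyte-job-nodeselector
spec:
  applyTo:
    - groups: [""]
      versions: ["v1"]
      kinds: ["Pod"]
  match:
    scope: Namespaced
    namespaces: ["airbyte"]
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  location: "spec.nodeSelector.key"
  parameters:
    assign:
      value: "value"
```

Note that mutation webhooks apply cluster-wide rules outside Helm, so this also covers pods (like the check pods) that the chart's own `nodeSelectors` fallback misses.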
After making the necessary changes, you can upgrade your Helm release:

```bash
helm upgrade --install %RELEASE_NAME% airbyte/airbyte -n <NAMESPACE> --values /path/to/values.yaml --version 0.50.13
```
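After the upgrade, you can verify whether the check pods actually received the `nodeSelector` with plain `kubectl`; pod names vary per job attempt, so `<CHECK_POD_NAME>` is a placeholder:

```bash
# List pods together with the node each was scheduled on
kubectl get pods -n <NAMESPACE> -o wide

# Inspect the nodeSelector on a specific check pod
kubectl get pod <CHECK_POD_NAME> -n <NAMESPACE> -o jsonpath='{.spec.nodeSelector}'
```

An empty result from the second command means the pod spec carries no `nodeSelector` at all, which matches the behavior described in the GitHub issue.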
For more detailed information, you can refer to the [Airbyte documentation](https://docs.airbyte.com/deploying-airbyte/on-kubernetes-via-helm#overview) and the [GitHub issue](https://github.com/airbytehq/airbyte/issues/27133). If the issue persists, you might want to follow up on the GitHub issue or the forum post for any updates or potential fixes from the Airbyte team.

Taher Ajnawala
05/30/2024, 7:21 AM

kapa.ai
05/30/2024, 7:21 AM

The job containers' resources can be configured with the environment variables `JOB_MAIN_CONTAINER_CPU_REQUEST`, `JOB_MAIN_CONTAINER_CPU_LIMIT`, `JOB_MAIN_CONTAINER_MEMORY_REQUEST`, and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` ([source](https://discuss.airbyte.io/t/3775)).

- **Server, Temporal, and Webapp**: These pods are less resource-intensive than worker pods but still require adequate resources to handle API requests, orchestrate workflows, and serve the UI, respectively.
- **db and minio**: These pods also need resources to handle database operations and log storage but are typically not as resource-intensive as worker pods.
- **Pod-sweeper**: This pod is generally lightweight, as its main task is to clean up completed or failed pods.

For scaling and ensuring that the pods have sufficient resources, it is recommended to start with a mid-sized cloud instance (e.g., 4 or 8 cores) and adjust based on the workload ([source](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale)). If you need more detailed information on resource usage and scaling, refer to the [Scaling Airbyte documentation](https://docs.airbyte.com/operator-guides/scaling-airbyte#what-to-scale).

Taher Ajnawala
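One way to set the `JOB_MAIN_CONTAINER_*` variables is through the worker deployment's environment in `values.yaml`. This is a sketch that assumes your chart version exposes a `worker.extraEnv` list (check your chart's `values.yaml` for the exact key); the numeric values are illustrative, not recommendations:

```yaml
worker:
  extraEnv:
    - name: JOB_MAIN_CONTAINER_CPU_REQUEST
      value: "0.5"
    - name: JOB_MAIN_CONTAINER_CPU_LIMIT
      value: "2"
    - name: JOB_MAIN_CONTAINER_MEMORY_REQUEST
      value: "1Gi"
    - name: JOB_MAIN_CONTAINER_MEMORY_LIMIT
      value: "4Gi"
```

These variables control the resources of the job containers the worker launches, not the worker pod's own requests and limits.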
05/30/2024, 7:22 AM

kapa.ai
05/30/2024, 7:22 AM

The `destination-postgres-check` and `source-postgres-check` pods are used in Airbyte to verify the connectivity and configuration of the respective connectors before running a sync job. Here is a detailed explanation of what these pods do and their resource requirements:

### `destination-postgres-check` Pod

This pod is responsible for checking the configuration and connectivity of the PostgreSQL destination connector. It ensures that the destination database is reachable and that the provided credentials and configurations are correct.

#### Example Log Extract:

```plaintext
2024-01-03 20:49:14 platform > Creating docker container = destination-postgres-check-5143ad9e-0d2d-4469-b42e-9d0370663796-0-nsvog with resources io.airbyte.config.ResourceRequirements@6fac8953[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2024-01-03 20:49:14 platform > Preparing command: docker run --rm --init -i -w /data/5143ad9e-0d2d-4469-b42e-9d0370663796/0 --log-driver none --name destination-postgres-check-5143ad9e-0d2d-4469-b42e-9d0370663796-0-nsvog --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/destination-postgres:0.4.0 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.50.40 -e WORKER_JOB_ID=5143ad9e-0d2d-4469-b42e-9d0370663796 airbyte/destination-postgres:0.4.0 check --config source_config.json
```
[Source](https://github.com/airbytehq/airbyte/discussions/34181)

### `source-postgres-check` Pod

This pod is responsible for checking the configuration and connectivity of the PostgreSQL source connector. It ensures that the source database is reachable and that the provided credentials and configurations are correct.

#### Example Log Extract:

```plaintext
2023-10-29 18:08:32 INFO i.a.w.p.DockerProcessFactory(create):140 - Creating docker container = source-postgres-check-2-0-lmzhc with resources io.airbyte.config.ResourceRequirements@852f8cd[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=4096Mi,additionalProperties={}] and allowedHosts io.airbyte.config.AllowedHosts@696787be[hosts=[10.0.5.200, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
```
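The `memoryLimit=4096Mi` in the log above is a Kubernetes resource quantity. A small helper (not part of Airbyte, just a simplified sketch of the quantity grammar) converts such strings into base units, which is handy when comparing limits across pods:

```python
# Sketch: convert Kubernetes resource quantity strings into base units
# (bytes for memory, cores for CPU). Simplified; the real grammar also
# allows exponent notation like "1e3".
BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}

def parse_quantity(q: str) -> float:
    q = q.strip()
    for table in (BINARY, DECIMAL):
        for suffix, factor in table.items():
            if q.endswith(suffix):
                return float(q[: -len(suffix)]) * factor
    if q.endswith("m"):  # milli-units, used for CPU: "500m" = 0.5 cores
        return float(q[:-1]) / 1000
    return float(q)

print(parse_quantity("4096Mi"))  # 4294967296.0
```

With this, the source-check pod's `memoryLimit=4096Mi` works out to 4 GiB, while the destination-check pod's empty limits mean no limit was set at all.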
[Source](https://github.com/airbytehq/airbyte/issues/31949)

### Resource Intensity

The resource requirements for these check pods can vary, but they are generally less resource-intensive than the actual sync pods. Here are some examples of resource configurations:

- **CPU and Memory