# ask-community-for-troubleshooting
Hey team! šŸ‘‹ I've been struggling with OOMKilled errors on replication jobs using Airbyte Helm chart v2.0.18 (community edition). After extensive debugging, I discovered that `global.jobs.resources` in values.yaml is not being used by the chart templates.

Steps to reproduce:
1. Install Airbyte using the Helm v2 chart (2.0.18)
2. Set job resources in values.yaml:
```yaml
global:
  jobs:
    resources:
      requests:
        cpu: "1"
        memory: "8Gi"
      limits:
        cpu: "4"
        memory: "16Gi"
```
3. Apply with `helm upgrade`
4. Check the configmap: `kubectl get configmap airbyte-airbyte-env -n airbyte-v2 -o yaml | grep JOB_MAIN_CONTAINER`
5. Result: all `JOB_MAIN_CONTAINER_*` values are empty strings (`""`)
6. Replication job pods still use the default 2Gi memory limit and OOMKill on large syncs

Root cause: the chart templates in `templates/config/_workloads.tpl` read from `.Values.global.workloads.resources.mainContainer.*`, not from `.Values.global.jobs.resources.*`.
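For reference, the lookup in the template looks roughly like this (paraphrasing from memory, not the exact chart source, so treat the keys as illustrative):

```yaml
# Illustrative sketch only -- the real _workloads.tpl is more involved,
# but the values it renders into the configmap come from global.workloads.*:
JOB_MAIN_CONTAINER_CPU_REQUEST: {{ .Values.global.workloads.resources.mainContainer.cpu.request | quote }}
JOB_MAIN_CONTAINER_MEMORY_LIMIT: {{ .Values.global.workloads.resources.mainContainer.memory.limit | quote }}
```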
Workaround that works:
```yaml
global:
  workloads:
    resources:
      mainContainer:
        cpu:
          request: "1"
          limit: "4"
        memory:
          request: "8Gi"
          limit: "16Gi"
```
This properly populates the `JOB_MAIN_CONTAINER_*` environment variables, and the replication jobs respect the limits.

Questions:
1. Is `global.jobs.resources` supposed to work but there's a bug?
2. Or should the docs be updated to show `global.workloads.resources.mainContainer` as the correct path?
3. Should `global.jobs.resources` be deprecated/removed from the values schema if it's not used?

Happy to provide more details or open a GitHub issue if helpful!
A message from kapa.ai
Based on the knowledge sources, you've identified a real issue with resource configuration in Airbyte Helm chart v2. Let me clarify what's happening:

## The Correct Configuration Path

In Helm chart V2, the correct path for configuring job resources is indeed `global.workloads.resources.mainContainer.*`, not `global.jobs.resources.*`. This is documented in the [Helm chart V2 upgrade guide](https://docs.airbyte.com/platform/next/enterprise-setup/chart-v2-enterprise#update-your-valuesyaml-file). According to the migration documentation, the environment variables map as follows:

- `JOB_MAIN_CONTAINER_CPU_REQUEST` → `global.workloads.resources.mainContainer.cpu.request`
- `JOB_MAIN_CONTAINER_CPU_LIMIT` → `global.workloads.resources.mainContainer.cpu.limit`
- `JOB_MAIN_CONTAINER_MEMORY_REQUEST` → `global.workloads.resources.mainContainer.memory.request`
- `JOB_MAIN_CONTAINER_MEMORY_LIMIT` → `global.workloads.resources.mainContainer.memory.limit`
## Answering Your Questions

1. Is `global.jobs.resources` supposed to work? No, this path appears to be either deprecated or incorrectly documented. The chart templates read from `global.workloads.resources.mainContainer.*`, as you discovered.
2. Should docs be updated? Yes. If `global.jobs.resources` is still mentioned in the values.yaml schema or documentation without clarification, it should be updated to reflect that `global.workloads.resources.mainContainer` is the correct path for Helm chart V2.
3. Should it be deprecated/removed? If it's not being used by the templates, it should either be removed from the schema or clearly marked as deprecated to avoid confusion.

## Your Workaround is Correct

The configuration you're using is the proper way to set job resources in Helm chart V2:
```yaml
global:
  workloads:
    resources:
      mainContainer:
        cpu:
          request: "1"
          limit: "4"
        memory:
          request: "8Gi"
          limit: "16Gi"
```
This is consistent with similar issues reported in [GitHub issue #42921](https://github.com/airbytehq/airbyte/issues/42921) and [GitHub issue #48816](https://github.com/airbytehq/airbyte/issues/48816), where users experienced resource configuration not being respected until they used the correct configuration path or restarted worker pods. I'd recommend opening a GitHub issue to request clarification in the documentation and potentially removal of the unused `global.jobs.resources` path from the values schema to prevent future confusion.
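As a quick sanity check (a sketch, not from the docs -- the values shown assume the workaround yaml above), re-running the same grep you used earlier should return non-empty entries after `helm upgrade`:

```bash
kubectl get configmap airbyte-airbyte-env -n airbyte-v2 -o yaml | grep JOB_MAIN_CONTAINER
# Expected (approximately), given the workaround values:
#   JOB_MAIN_CONTAINER_CPU_REQUEST: "1"
#   JOB_MAIN_CONTAINER_CPU_LIMIT: "4"
#   JOB_MAIN_CONTAINER_MEMORY_REQUEST: "8Gi"
#   JOB_MAIN_CONTAINER_MEMORY_LIMIT: "16Gi"
```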
@kapa.ai, I guess I have to set
```yaml
useConnectorResourceDefaults: false  # This should work!
```
if I want to avoid the Postgres 2GB memory limit? It seems that's where it's running out of memory.
A message from kapa.ai
Based on the knowledge sources, setting `useConnectorResourceDefaults: false` is not the correct approach to override memory limits for your Postgres connector. Here's what you should do instead:

## Proper Way to Configure Connector Resources

Airbyte has a configuration hierarchy for resource allocation with four levels of precedence (from lowest to highest):

1. Instance-wide - applies to all containers
2. Connector definition - applies to all connectors of that type (e.g., all Postgres sources)
3. Connector-specific - applies to one specific connector instance
4. Connection-specific - applies to a specific source/destination combination

According to the [configuring connector resources documentation](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources), you have several options:

### Option 1: Instance-Wide Configuration (Recommended First Step)

Set these in your `values.yaml`:

```yaml
global:
  workloads:
    resources:
      mainContainer:
        memory:
          request: "4Gi"  # or higher
          limit: "8Gi"    # or higher
```
### Option 2: Connector Definition Level

If you only want to increase memory for Postgres connectors specifically, you can update the connector definition in the database:

1. Find the connector definition ID:

```sql
select * from actor_definition where actor_definition.docker_repository like '%postgres%';
```

2. Update the resource requirements:

```sql
update actor_definition
set resource_requirements = '{"jobSpecific": [{"jobType": "sync", "resourceRequirements": {"memory_limit": "8Gi", "memory_request": "4Gi"}}]}'
where id = '<id-from-step-1>';
```
## Important Notes

- The 2GB default memory limit is a common issue mentioned in [this forum discussion](https://discuss.airbyte.io/t/3472), where users experienced OOM errors with Postgres syncs
- As noted in [GitHub issue #48816](https://github.com/airbytehq/airbyte/issues/48816), you may also need to set resources for other job types (CHECK, DISCOVER, SPEC) if you're experiencing OOM during connector setup:

```yaml
global:
  env_vars:
    CHECK_JOB_MAIN_CONTAINER_MEMORY_REQUEST: "4Gi"
    CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT: "8Gi"
```
The `useConnectorResourceDefaults` flag is not mentioned in the knowledge sources as a way to override memory limits. Instead, use the configuration hierarchy described above.
@kapa.ai I don't want to update just that one connector, but it seems the global values won't affect it? Do I have to go into my Airbyte DB to make this config change?
A message from kapa.ai
Based on the knowledge sources, there's a known issue where global/instance-wide resource configurations are being ignored for connector-level resources in recent Airbyte versions. This is documented in several GitHub issues.

## The Problem

According to [GitHub issue #38178](https://github.com/airbytehq/airbyte/issues/38178), users report that syncs don't respect the `resource_requirements` set in the `actor_definition` table, and instead fall back to instance-wide defaults from the configmap. However, there are also reports in [GitHub issue #30814](https://github.com/airbytehq/airbyte/issues/30814) where users state that instance-wide settings are being ignored entirely. One user in [GitHub issue #38178](https://github.com/airbytehq/airbyte/issues/38178) using Airbyte v1.7.1 reported:

> "We self-host Airbyte on AKS and want to set specific CPU/memory requests and limits per source (type)... we adjusted the source connector actor definition in Airbyte's PostgreSQL database... but it is completely ignored. Instance-wide defaults defined in Helm chart are used instead."

## Current Workarounds

Based on the knowledge sources, here are your options:

### 1. Connection-Specific Configuration (Most Reliable)

According to [the documentation](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#configuring-connection-specific-requirements), connection-specific settings have the highest precedence and appear to work more reliably:

1. Extract the connection ID from the URL (format: `<base_url>/workspaces/<workspace-id>/connections/<connection-id>/status`)
2. Update the connection in the database:

```sql
update connection
set resource_requirements = '{"cpu_limit": "4", "cpu_request": "1", "memory_limit": "8Gi", "memory_request": "4Gi"}'
where id = '<connection-id>';
```
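If pulling the ID out of the URL is awkward, you can also try looking it up by connection name in the same config database -- a hedged sketch, assuming the `connection` table carries a `name` column in your Airbyte version (column names can differ between releases, so check your schema first):

```sql
-- Hypothetical lookup sketch; verify the column names against your schema.
select id, name
from connection
where name ilike '%postgres%';
```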
As noted in [GitHub issue #30814](https://github.com/airbytehq/airbyte/issues/30814): "works at the connection-level, but not at the instance-level or connector-level"

### 2. Connector Definition Configuration

You can try updating the `actor_definition` table for your Postgres connector:

```sql
update actor_definition
set resource_requirements = '{"jobSpecific": [{"jobType": "sync", "resourceRequirements": {"cpu_limit": "4", "cpu_request": "1", "memory_limit": "8Gi", "memory_request": "4Gi"}}]}'
where id = '<connector-definition-id>';
```
However, multiple users report this doesn't work reliably in recent versions.

## Answer to Your Question

Yes, unfortunately you likely need to make the configuration change directly in the Airbyte database at the connection level, as global/instance-wide settings appear to be ignored for connector resources in recent versions. This is a known issue that affects multiple users across different Airbyte versions (1.2.0, 1.7.1, and others). The connection-specific approach is currently the most reliable workaround according to the reported experiences in the knowledge sources.
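For completeness, one way to get a SQL shell against the bundled config database to run the updates above -- a sketch only, where the pod name, namespace, user, and database name are assumptions that depend on how your chart is installed (skip this if you use an external Postgres):

```bash
# Names below are placeholders/common defaults; adjust to your install.
kubectl exec -it airbyte-db-0 -n airbyte-v2 -- \
  psql -U airbyte -d db-airbyte
```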
@kapa.ai, ok, so I made the change in the DB, and now I see that one connector has increased max mem and it worked:

```
M1   orchestrator   ●   airbyte/container-orchestrator:2.0.0   false   Completed   0   off:off:off   0   2000:2000   0   0   0    2048:2048   0   0   0:0   7m
M2   source         ●   airbyte/source-postgres:3.7.0          false   Completed   0   off:off:off   0   1000:4000   0   0   0   8192:16384   0   0   0:0   7m
M3   destination    ●   airbyte/destination-snowflake:4.0.14   false   Completed   0   off:off:off   0   1000:2000   0   0   0    2048:2048   0   0   0:0   7m
```

I guess I would have to do it again for Snowflake if I wanted. What about the orchestrator, or others? It doesn't seem to have picked up my global max memory settings. I spun up a big machine with lots of memory, and am having a lot of trouble getting Airbyte to make use of it 🤷
A message from kapa.ai
Based on the knowledge sources, you're correct that the orchestrator container is not picking up your global memory settings. This is a known behavior in Airbyte.

## Orchestrator Container Resources

According to [this forum discussion](https://discuss.airbyte.io/t/3775), the orchestrator container **inherits its resource requirements from the `JOB_MAIN_CONTAINER_*` environment variables**. As stated in the official answer:

> "Yes, the orchestrator pods get their resource requirements from the `JOB_MAIN_CONTAINER_*` environment variables. These environment variables are used to configure the resource requirements for the job containers, which include the orchestrator pods."

So when you set:

```yaml
global:
  workloads:
    resources:
      mainContainer:
        memory:
          request: "8Gi"
          limit: "16Gi"
```
This should affect the orchestrator as well. However, based on your output showing the orchestrator still at `2048:2048` (2Gi), it appears these settings aren't being applied.

## Why Your Global Settings Aren't Working

From your earlier conversation, you discovered that `global.jobs.resources` doesn't work in Helm chart v2. You need to ensure you're using the correct path: `global.workloads.resources.mainContainer.*`

Additionally, according to [GitHub issue #48816](https://github.com/airbytehq/airbyte/issues/48816), you may need to set separate environment variables for different job types:

```yaml
global:
  env_vars:
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: "8Gi"
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: "16Gi"
    CHECK_JOB_MAIN_CONTAINER_MEMORY_REQUEST: "8Gi"
    CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT: "16Gi"
    DISCOVER_JOB_MAIN_CONTAINER_MEMORY_REQUEST: "8Gi"
    DISCOVER_JOB_MAIN_CONTAINER_MEMORY_LIMIT: "16Gi"
    SPEC_JOB_MAIN_CONTAINER_MEMORY_REQUEST: "8Gi"
    SPEC_JOB_MAIN_CONTAINER_MEMORY_LIMIT: "16Gi"
```
## Verifying Configuration

You can verify what resources are being applied by checking the job logs. According to the [debugging documentation](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#debugging), Airbyte logs the resource requirements when containers are created:

```
2024-10-28 23:58:10 platform > [orchestrator] image: airbyte/container-orchestrator:build-256f73c6c2-20488-master resources: ResourceRequirements(claims=[], limits={memory=2Gi, cpu=1}, requests={memory=2Gi, cpu=1}, additionalProperties={})
```
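If that line is hard to find in the UI logs, one way to pull it straight from the pod -- a sketch, with the pod name as a placeholder for one of your replication job pods:

```bash
# <replication-job-pod> is a placeholder; list pods first with `kubectl get pods -n airbyte-v2`.
kubectl logs <replication-job-pod> -n airbyte-v2 | grep -i "ResourceRequirements"
```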
## Next Steps

1. Verify your `values.yaml` is using `global.workloads.resources.mainContainer.*` (not `global.jobs.resources.*`)
2. After updating, run `helm upgrade` and restart your worker pods (this is often necessary for changes to take effect)
3. Check the configmap: `kubectl get configmap airbyte-airbyte-env -o yaml | grep JOB_MAIN_CONTAINER`
4. If the environment variables are still empty, you may need to set them explicitly as shown above

For the Snowflake destination, yes, you would need to update its connection-specific settings in the database as well if you want different limits than the global defaults.
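To confirm what a job pod actually received, you can also read the resources straight off the pod spec -- again a sketch, with placeholder pod name and namespace:

```bash
# Prints "<container> => <memory limit>" for each container in the job pod (placeholder names).
kubectl get pod <replication-job-pod> -n airbyte-v2 \
  -o jsonpath='{range .spec.containers[*]}{.name}{" => "}{.resources.limits.memory}{"\n"}{end}'
```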