Aasim ali
07/21/2025, 6:41 PM
kapa.ai
07/21/2025, 6:41 PM
1. `TransientErrorException` with the message `Some streams either received an INCOMPLETE stream status, or did not receive a stream status at all: null.feature_flags, null.persons, null.insights, null.events, null.projects`: This typically means that some streams failed to complete successfully, often due to upstream (source) errors or interruptions during the sync process. The error is marked as "transient," suggesting it might be recoverable on retry, but repeated failures indicate a persistent problem.
2. Source process exited with code 137: Exit code 137 means the source connector process was killed by the operating system (137 = 128 + 9, i.e., the process received SIGKILL), most commonly because it ran out of memory (OOM). This is a well-known issue in Airbyte when the source process consumes more memory than allocated, especially with large data volumes or complex queries. Similar issues have been reported where the source process is killed after roughly 400MB of data, leading to incomplete syncs and downstream errors in the destination connector. The stacktrace and error pattern match those described in [this issue](https://github.com/airbytehq/airbyte/issues/39128).
3. Destination process exited with code 1: The destination connector also reports a non-zero exit code (1), a generic error indicating that something went wrong during the write phase. This is often a consequence of the source process failing: the destination does not receive the expected data or stream status, leading to incomplete or failed writes.
4. Replication process error: The replication worker reports that the destination process exited with code 1, and notes that this warning is normal if the job was cancelled. In your case, however, the repeated failures and the final job failure after too many retries indicate a persistent underlying issue.

Summary of likely root cause and next steps:
- The primary issue appears to be the source connector running out of memory and being killed (exit code 137). This leads to incomplete data streams, which then cause the destination connector to fail with transient errors and incomplete statuses.
- The repeated retries and eventual job failure are consistent with Airbyte's retry logic, which backs off and eventually gives up after a set number of failures.

Recommended actions:
- Increase memory allocation for the source connector. If you're running Airbyte on Kubernetes or Docker, adjust the resource limits for the source container.
- Check the data volume and consider splitting large syncs into smaller batches or reducing the number of streams synced at once.
- Review source queries or configurations to optimize memory usage.

These recommendations are based on similar error patterns and resolutions discussed in the Airbyte community and GitHub issues, particularly regarding exit code 137 and incomplete stream statuses in the destination connector [source-aws datalake: exit code 137](https://github.com/airbytehq/airbyte/issues/39128). If you continue to experience issues after increasing memory, further investigation into the specific source and destination connector logs will be necessary to identify any connector-specific bugs or misconfigurations.
Aasim ali
07/21/2025, 6:45 PM
kapa.ai
07/21/2025, 6:45 PM
- Set the `JOB_MAIN_CONTAINER_MEMORY_REQUEST` and `JOB_MAIN_CONTAINER_MEMORY_LIMIT` environment variables to higher values [Scaling Airbyte](https://docs.airbyte.com/platform/operator-guides/scaling-airbyte#what-to-scale).
- Batch size impact: The source worker reads up to 10,000 records into memory. If your average row size is large (e.g., 0.5MB), you may need 5GB of RAM just for that batch [Scaling Airbyte](https://docs.airbyte.com/platform/operator-guides/scaling-airbyte#what-to-scale).

In summary:
- For most production workloads, start with 16GB RAM and increase as needed.
- For Java connectors, consider setting `JOB_MAIN_CONTAINER_MEMORY_LIMIT` to 4–8GB or higher if you see OOM errors.
- Monitor your sync logs and adjust based on actual memory usage and error patterns.

If you are still experiencing OOM errors after increasing memory, you may need to further tune your batch sizes or split your syncs into smaller jobs.
Aasim ali
07/21/2025, 6:48 PM
kapa.ai
07/21/2025, 6:48 PM
The relevant variables are:
- `JOB_MAIN_CONTAINER_MEMORY_REQUEST`
- `JOB_MAIN_CONTAINER_MEMORY_LIMIT`

These variables control the memory requested and the memory limit for the main sync job containers. For example, you might set them to `4g` or `8g`
depending on your workload. Example for Docker Compose: yaml environment: JOB_MAIN_CONTAINER_MEMORY_REQUEST: 4g JOB_MAIN_CONTAINER_MEMORY_LIMIT: 8g
For Kubernetes (Helm), you can set these in your `values.yaml`:
```yaml
global:
  env_vars:
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: 4g
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: 8g
```
Important:
- If you are running other types of jobs (like check, discover, or spec), you may also need to set the corresponding environment variables for those jobs (e.g., `CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT`), as Airbyte uses a different set of variables for each job type; see the sketch below. This was highlighted as a common source of confusion and OOM errors when not all the relevant variables are set [Cannot use Airbyte on Kubernetes due to OOM when setting up connector](https://github.com/airbytehq/airbyte/issues/48816).
- After changing these settings, restart your Airbyte services for the changes to take effect.

If you still encounter OOM errors after increasing the limits, you may need to further increase the values or review your sync configuration for optimization. If you need more detailed steps for your specific deployment type, let me know!
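For instance, a minimal sketch of the per-job-type variables in a Docker Compose override, assuming the standard `worker` service name (the variable names are the ones referenced above; the values are placeholders to adjust for your workload):

```yaml
# docker-compose.override.yaml (illustrative values)
services:
  worker:
    environment:
      CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 2g     # connection check jobs
      DISCOVER_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 2g  # schema discovery jobs
      SPEC_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 1g      # spec jobs
```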
Aasim ali
07/21/2025, 6:50 PM
kapa.ai
07/21/2025, 6:51 PM
Recommended starting values:
- CPU:
  - `cpu: 500m` (0.5 core) for requests
  - `cpu: 1000m` (1 core) for limits
- Memory:
  - `memory: 2Gi` (2 GB) for requests
  - `memory: 4Gi` (4 GB) for limits

Example `values.yaml`:
```yaml
global:
  env_vars:
    JOB_MAIN_CONTAINER_CPU_REQUEST: 500m
    JOB_MAIN_CONTAINER_CPU_LIMIT: 1000m
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: 2Gi
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: 4Gi
    CHECK_JOB_MAIN_CONTAINER_CPU_REQUEST: 500m
    CHECK_JOB_MAIN_CONTAINER_CPU_LIMIT: 1000m
    CHECK_JOB_MAIN_CONTAINER_MEMORY_REQUEST: 2Gi
    CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 4Gi
    DISCOVER_JOB_MAIN_CONTAINER_CPU_REQUEST: 500m
    DISCOVER_JOB_MAIN_CONTAINER_CPU_LIMIT: 1000m
    DISCOVER_JOB_MAIN_CONTAINER_MEMORY_REQUEST: 2Gi
    DISCOVER_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 4Gi
    SPEC_JOB_MAIN_CONTAINER_CPU_REQUEST: 500m
    SPEC_JOB_MAIN_CONTAINER_CPU_LIMIT: 1000m
    SPEC_JOB_MAIN_CONTAINER_MEMORY_REQUEST: 2Gi
    SPEC_JOB_MAIN_CONTAINER_MEMORY_LIMIT: 4Gi
    SIDECAR_MAIN_CONTAINER_CPU_REQUEST: 100m
    SIDECAR_MAIN_CONTAINER_CPU_LIMIT: 200m
    SIDECAR_MAIN_CONTAINER_MEMORY_REQUEST: 256Mi
    SIDECAR_MAIN_CONTAINER_MEMORY_LIMIT: 512Mi
```
Key points:
- These are starting points; monitor your jobs and adjust up or down as needed.
- If you have large data volumes or see OOMKilled errors, increase the memory limits.
- For smaller workloads, you can reduce these values.
- The units for memory are typically `Mi` (mebibytes) or `Gi` (gibibytes); for CPU, `m` means millicores (1000m = 1 core) [How to specify CPU,MEMORY limit in .env](https://discuss.airbyte.io/t/1675), [Scaling Airbyte After Installation](https://docs.airbyte.com/platform/next/enterprise-setup/scaling-airbyte).

If you need more aggressive scaling, you can further increase these values, but always base adjustments on observed resource usage and job performance.
Aasim ali
07/21/2025, 6:51 PM
`SIDECAR_MAIN_CONTAINER_MEMORY_LIMIT` @kapa.ai
kapa.ai
07/21/2025, 6:52 PM
`SIDECAR_MAIN_CONTAINER_MEMORY_LIMIT` is an environment variable used in Airbyte deployments to specify the maximum amount of memory that the sidecar container in a job pod is allowed to use. The sidecar container is a supporting process that runs alongside the main connector container during Airbyte jobs (such as sync, check, discover, or spec). Setting this variable helps ensure that the sidecar container does not exceed a certain memory usage, which helps prevent out-of-memory (OOM) errors and improves resource management in your Kubernetes or Docker environment.

You can set this variable (and related ones for CPU and memory requests/limits) in your `values.yaml` file when deploying Airbyte via Helm, as shown in the example you provided. It is part of a broader set of resource configuration options for different job types and containers in Airbyte, as discussed in [this GitHub issue](https://github.com/airbytehq/airbyte/issues/48816) and the official documentation on configuring connector resources [Configuring Connector Resources](https://docs.airbyte.com/platform/next/operator-guides/configuring-connector-resources).

If you need to control the memory usage of the sidecar container specifically, set `SIDECAR_MAIN_CONTAINER_MEMORY_LIMIT` to your desired value (e.g., `512Mi` or `1Gi`).
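For instance, a minimal sketch in Helm `values.yaml` (the values shown are illustrative, not recommendations):

```yaml
global:
  env_vars:
    SIDECAR_MAIN_CONTAINER_MEMORY_REQUEST: 256Mi  # guaranteed memory for the sidecar
    SIDECAR_MAIN_CONTAINER_MEMORY_LIMIT: 512Mi    # hard cap for the sidecar
```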
Aasim ali
07/21/2025, 7:01 PM
kapa.ai
07/21/2025, 7:01 PM
You can set these in your `values.yaml` file under the appropriate section for each component. The values below are based on best practices and examples from the knowledge sources, but you should adjust them based on your workload and monitoring.

```yaml
jobs:
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
checkJob:
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
discoverJob:
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
specJob:
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
sidecar:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 512Mi
```
Notes:
- These are starting points and should be tuned for your actual workload.
- The structure above is a common pattern for Kubernetes Helm charts, but the exact section names (`jobs`, `checkJob`, etc.) may vary depending on your Airbyte Helm chart version.
- If your chart uses a different structure, place the `resources` block under the relevant component (e.g., `worker`, `workload-launcher`, etc.) as shown in [this example](https://github.com/airbytehq/airbyte/issues/60915).
- If you need to set these for specific connectors or connections, refer to the [Configuring Connector Resources](https://docs.airbyte.com/platform/next/operator-guides/configuring-connector-resources) documentation for SQL-based overrides.

If you need the exact YAML structure for your specific Airbyte Helm chart version, please provide the version or a sample of your current `values.yaml` and I can tailor the answer further.
Aasim ali
07/21/2025, 7:14 PM
kapa.ai
07/21/2025, 7:14 PM
You can set this in your `values.yaml` file under the job resources section. For example:
```yaml
jobs:
  resources:
    requests:
      memory: 4Gi
    limits:
      memory: 8Gi
```
Or, if you are using environment variables (for Docker or Kubernetes), set:
- `JOB_MAIN_CONTAINER_MEMORY_REQUEST` (minimum memory)
- `JOB_MAIN_CONTAINER_MEMORY_LIMIT` (maximum memory)

Example:
```yaml
JOB_MAIN_CONTAINER_MEMORY_REQUEST: 4Gi
JOB_MAIN_CONTAINER_MEMORY_LIMIT: 8Gi
```
These settings apply to the containers running the replication (sync) jobs, including the orchestrator, source, and destination containers. This approach is documented in the official Airbyte resource configuration guides [Configuring Connector Resources](https://docs.airbyte.com/platform/next/operator-guides/configuring-connector-resources) and [Scaling Airbyte](https://docs.airbyte.com/platform/next/enterprise-setup/scaling-airbyte).

Note:
- If you want to set different memory limits for other job types (check, discover, spec), you need to set their respective resource configurations as well.
- After changing these settings, restart your Airbyte deployment for the changes to take effect.

If you need more granular control (e.g., only for a specific connection or connector), refer to the documentation for connection-specific or connector-specific resource overrides.
Aasim ali
07/21/2025, 7:22 PM
kapa.ai
07/21/2025, 7:22 PM
The error `Source process exited with non-zero exit code 137` means that the source connector process was killed by the operating system, most commonly due to running out of memory (OOM). This is a well-known issue in Airbyte when the source process consumes more memory than allocated, especially with large data volumes or complex queries. This pattern is confirmed in multiple Airbyte GitHub issues, where exit code 137 is associated with OOM situations and is resolved by increasing the memory limits for the job containers [Cluster 'airbyte-abctl' could not be created](https://github.com/airbytehq/airbyte/issues/53691), [source-salesforce process exited with non-zero exit code 137](https://github.com/airbytehq/airbyte/issues/37368), [source-aws datalake exit code 137](https://github.com/airbytehq/airbyte/issues/39128).

Recommended action: Increase the memory limit for your Airbyte job containers (source, destination, or both) to prevent the process from being killed due to OOM. This can be done by adjusting the relevant resource settings in your deployment configuration. If you need more details on how to do this, let me know your deployment type (Docker, Kubernetes, etc.) and I can provide a specific example.
Aasim ali
07/21/2025, 7:29 PM
kapa.ai
07/21/2025, 7:30 PM
There are a few possible reasons for this:

1. Known issue: configured limits are sometimes ignored. In some cases, even after setting higher limits (in `values.yaml` or environment variables), Airbyte jobs still use the default values (often 2Gi). This has been observed in both Kubernetes and Docker deployments. For example, one user noted that even after setting `JOB_MAIN_CONTAINER_MEMORY_LIMIT=6g`, the logs still showed a 2Gi limit being used [Airbyte is ignoring JOB_MAIN_CONTAINER_MEMORY_REQUEST in .env](https://github.com/airbytehq/airbyte/issues/30814).
2. Resource precedence and configuration hierarchy: Airbyte applies resource configurations in a specific order of precedence:
   - Instance-wide (lowest)
   - Connector definition
   - Connector-specific
   - Connection-specific (highest)
   If a more specific configuration (e.g., at the connector or connection level) is set to 2Gi, it will override your instance-wide or job-level setting. Check whether any connector- or connection-specific resource requirements set in your Airbyte database or UI might be overriding your global/job settings [Configuring Connector Resources](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources).
3. Pods may need to be restarted: In some cases, after changing resource limits, the worker pods need to be manually deleted or restarted for the new configuration to take effect. Users have reported that after deleting the worker pods, the new resource limits were correctly applied [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921).

What you should do:
- Double-check for any connector-specific or connection-specific resource overrides (see the sketch below).
- Make sure you have updated the correct section in your `values.yaml` or environment variables.
- After making changes, restart your Airbyte worker pods to ensure the new settings are picked up.
- Check the job logs for the actual resource requirements being used; Airbyte logs these at the start of each job, which helps you confirm which configuration is being applied [Configuring Connector Resources - Debugging](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#debugging).

If after these steps the issue persists, it may be a bug or a limitation in your current Airbyte version, as reported in the issues above.
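A quick way to look for such overrides is to query the Airbyte config database directly. A minimal sketch, assuming the standard `connection` and `actor_definition` tables referenced later in this thread (exact columns can vary by Airbyte version):

```sql
-- Connection-level overrides (highest precedence)
SELECT id, resource_requirements
FROM connection
WHERE resource_requirements IS NOT NULL;

-- Connector-definition-level overrides
SELECT id, name, resource_requirements
FROM actor_definition
WHERE resource_requirements IS NOT NULL;
```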
Aasim ali
07/21/2025, 7:42 PM
kapa.ai
07/21/2025, 7:43 PM
This matches the known issue described in [Airbyte is ignoring JOB_MAIN_CONTAINER_MEMORY_REQUEST in .env](https://github.com/airbytehq/airbyte/issues/30814). What you should do:
- Check for any connector-specific or connection-specific resource overrides.
- Make sure you have updated the correct section in your `values.yaml` or environment variables.
- After making changes, restart your Airbyte worker pods to ensure the new settings are picked up.
- Check the job logs for the actual resource requirements being used, as Airbyte logs these at the start of each job [Configuring Connector Resources - Debugging](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#debugging).

If after these steps the issue persists, it may be a bug or a limitation in your current Airbyte version, as reported in the issues above.
Aasim ali
07/21/2025, 7:53 PM
kapa.ai
07/21/2025, 7:53 PM
The configuration
```yaml
worker:
  jobDefaultEnv:
    - name: JAVA_OPTS
      value: "-XX:MaxRAMPercentage=75.0"
```
is syntactically correct for setting the `JAVA_OPTS` environment variable for the worker in a Kubernetes Helm deployment. However, based on the knowledge sources, this setting alone **does not control the actual Kubernetes pod memory limits**: it only tells the Java process inside the container to use up to 75% of the container's available memory.

Important notes:
- The actual memory available to the Java process is determined by the Kubernetes resource limits you set (e.g., `memory: 4Gi`).
- If you want to increase the memory available to your Airbyte jobs, you should set the memory requests and limits in your `values.yaml` under the appropriate section (such as `jobs.resources.limits.memory`), not just via `JAVA_OPTS` [Scaling Airbyte](https://docs.airbyte.com/platform/next/operator-guides/scaling-airbyte#what-to-scale).
- There have been reports that relying solely on `JAVA_OPTS` and not setting the Kubernetes resource limits can lead to OOM errors, as the container may be killed before Java reaches its internal limit [Airbyte Server Memory leak / Have a way to specify max Memory usage in GB and not percentage](https://github.com/airbytehq/airbyte/issues/27844).

Summary: Your configuration is correct for setting the Java option, but to actually control memory usage and avoid OOM errors, you must also set the Kubernetes resource limits for your jobs. The `JAVA_OPTS` setting alone is not sufficient.
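As a minimal sketch of the two settings together (assuming the `values.yaml` structure used earlier in this thread; the 75% figure and the 4Gi limit are illustrative, giving the JVM roughly a 3Gi ceiling inside a 4Gi container):

```yaml
worker:
  jobDefaultEnv:
    - name: JAVA_OPTS
      value: "-XX:MaxRAMPercentage=75.0"   # JVM may use up to 75% of container memory
global:
  env_vars:
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: 2Gi # Kubernetes request for job containers
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: 4Gi   # hard cap enforced by Kubernetes
```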
Aasim ali
07/21/2025, 8:04 PM
kapa.ai
07/21/2025, 8:05 PM
1. Increase the memory limits for your job containers in your `values.yaml` file. If you are already setting higher limits but the pods still use 2Gi, check for connector- or connection-specific overrides, and make sure to restart your worker pods after making changes [Configuring Connector Resources](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources), [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921).
2. Monitor your job logs after making changes. Airbyte logs the actual resource limits at the start of each job; confirm that your new settings are being picked up.
3. If the problem persists:
   - Double-check for any connector- or connection-level resource overrides.
   - Make sure you are editing the correct configuration section for your deployment type.
   - Restart the relevant pods or the entire Airbyte deployment to ensure new settings are applied.

Summary: Your sync is failing because the source connector is running out of memory (exit code 137), which causes the destination to fail with incomplete data. Increase the memory limits for the job containers and ensure the new settings are applied and picked up by the pods. This is a common and well-documented issue in Airbyte deployments.
Aasim ali
07/21/2025, 8:14 PM
kapa.ai
07/21/2025, 8:14 PM
If you have already set higher limits in your `values.yaml` or environment variables but still see 2Gi in the logs, this suggests that either:
- There is a more specific resource configuration (such as connector- or connection-specific) overriding your global/job settings, or
- The new configuration has not been picked up by the running pods (which may require a restart of the worker pods or the Airbyte deployment), or
- There is a known issue where Airbyte sometimes ignores the configured limits and uses defaults instead [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921), [Airbyte is ignoring JOB_MAIN_CONTAINER_MEMORY_REQUEST in .env](https://github.com/airbytehq/airbyte/issues/30814).

What you should do:
- Double-check for connector- or connection-specific resource overrides.
- Ensure you have updated the correct configuration section for your deployment.
- Restart your Airbyte worker pods after making changes.
- Check the job logs (as you did) to confirm the actual resource limits being applied [Configuring Connector Resources - Debugging](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#debugging).

If after these steps the resource limits are still not updated, it may be a bug or limitation in your Airbyte version.
Aasim ali
07/21/2025, 8:19 PM
kapa.ai
07/21/2025, 8:20 PM
You can set these in your `values.yaml` under the workload launcher section. By default, Airbyte recommends leaving these unset unless you have a specific need, as noted in the Helm chart comments:

> We usually recommend not to specify default resources and to leave this as a conscious choice for the user. This also increases chances charts run on environments with little resources, such as Minikube. If you do want to specify resources, uncomment the following lines, adjust them as necessary, and remove the curly braces after 'resources:'.

If your goal is to increase resources for the replication jobs (source/destination/orchestrator), focus on the job resource settings, not the workload launcher. If you want to tune the workload launcher itself, you can do so, but it is not required for most resource-related issues with sync jobs [see example in Helm chart](https://github.com/airbytehq/airbyte/issues/49843). If you are not seeing your job resource changes take effect, check for connector- or connection-specific overrides and ensure you restart the relevant pods after making changes [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921).
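If you do decide to set them, a minimal sketch, assuming the chart exposes a `workload-launcher` section with a standard Kubernetes `resources` block (the section name and values are illustrative and may differ in your chart version):

```yaml
workload-launcher:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
```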
Aasim ali
07/21/2025, 8:28 PM
kapa.ai
07/21/2025, 8:29 PM
If you have set `JOB_MAIN_CONTAINER_MEMORY_LIMIT=1Gi` and `JOB_MAIN_CONTAINER_MEMORY_REQUEST=1Gi` but your replication pod still shows a 2Gi memory limit, this is a known issue reported by several Airbyte users. The most common causes are:

1. Connector- or connection-specific overrides: Airbyte allows resource requirements to be set at multiple levels (global, connector, connection). If a more specific resource requirement (e.g., at the connector or connection level) is set to 2Gi, it will override your environment variable settings. Check whether any such overrides exist in your Airbyte UI or database [Configuring Connector Resources](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources).
2. Pods not restarted: After changing environment variables, you must restart the worker pods for the new settings to take effect. Users have reported that manually deleting the worker pods allows the new resource limits to be applied [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921).
3. Known issues in Airbyte: There have been reports that Airbyte sometimes ignores the environment variable settings and uses default values (often 2Gi) instead, even after configuration changes. This has been observed in both Docker and Kubernetes deployments [Airbyte is ignoring JOB_MAIN_CONTAINER_MEMORY_REQUEST in .env](https://github.com/airbytehq/airbyte/issues/30814).

What you should do:
- Check for connector- or connection-specific resource overrides.
- Make sure you have updated the correct environment variables.
- After making changes, restart your Airbyte worker pods.
- Check the job logs to confirm which resource limits are being applied.

If the issue persists after these steps, it may be a bug or limitation in your current Airbyte version.
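If you find a stale connection-level override forcing 2Gi, one way to remove it is a direct update against the `connection` table shown later in this thread (a sketch; back up the config database first):

```sql
-- Clear a connection-level override so global/job-level settings apply again
UPDATE connection
SET resource_requirements = NULL
WHERE id = '<your-connection-id>';
```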
Aasim ali
07/21/2025, 8:30 PM
kapa.ai
07/21/2025, 8:31 PM
You can set `JOB_MAIN_CONTAINER_CPU_LIMIT`, `JOB_MAIN_CONTAINER_CPU_REQUEST`, `JOB_MAIN_CONTAINER_MEMORY_LIMIT`, and `JOB_MAIN_CONTAINER_MEMORY_REQUEST` when you deploy Airbyte. However, if you are setting, for example, `JOB_MAIN_CONTAINER_MEMORY_LIMIT=4Gi` and `JOB_MAIN_CONTAINER_MEMORY_REQUEST=1Gi`, but your replication pod still shows a 2Gi max memory limit, this is a known issue. Possible reasons based on the knowledge sources:

1. Resource precedence and overrides: Airbyte applies resource configurations in a specific order of precedence:
   - Instance-wide (lowest)
   - Connector definition
   - Connector-specific
   - Connection-specific (highest)
   If a more specific configuration (such as connector- or connection-specific) is set to 2Gi, it will override your global/job-level setting. Check whether any connector- or connection-specific resource requirements in your Airbyte database or UI are overriding your global/job settings [Configuring Connector Resources](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources).
2. Pods need to be restarted: After changing resource limits, the worker pods may need to be manually deleted or restarted for the new configuration to take effect. Users have reported that after deleting the worker pods, the new resource limits were correctly applied [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921).
3. Known issues with resource application: There have been reports that Airbyte sometimes ignores the resource limits set in the configuration and uses default values instead. This has been observed in both Kubernetes and Docker deployments [Airbyte is ignoring JOB_MAIN_CONTAINER_MEMORY_REQUEST in .env](https://github.com/airbytehq/airbyte/issues/30814).

What you should do:
- Check for any connector-specific or connection-specific resource overrides.
- Make sure you have updated the correct section in your `values.yaml` or environment variables.
- After making changes, restart your Airbyte worker pods to ensure the new settings are picked up.
- Check the job logs for the actual resource requirements being used, as Airbyte logs these at the start of each job.

If after these steps the issue persists, it may be a bug or a limitation in your current Airbyte version, as reported in the issues above.
Aasim ali
07/21/2025, 9:39 PM
```
events, num records: 8730, num bytes: 139.75 MB, allocated bytes: 140.19 MB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2025-07-21 173659 destination INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 0
2025-07-21 173751 replication-orchestrator INFO Records read: 5000 (4 GB)
```
I have a 2 GB max; how is it reading this much? @kapa.ai
kapa.ai
07/21/2025, 9:40 PM
The "Records read: 5000 (4 GB)" figure is the cumulative amount of data read so far, not the memory in use. Actual memory usage appears in the `[ASYNC QUEUE INFO]` and `[ASYNC WORKER INFO]` logs, where you can see values like "allocated: 150.19 MB" and "max: 445.44 MB", which are well below your 2GiB memory limit. This behavior is expected and is described in Airbyte's scaling documentation: the source worker reads up to 10,000 records in memory, and the memory required depends on the average row size. The total data processed can therefore be much larger than the memory limit, because Airbyte does not load all the data into memory at once [Scaling Airbyte](https://docs.airbyte.com/platform/operator-guides/scaling-airbyte#what-to-scale).

Summary: Airbyte can process more data than the container's memory limit because it streams and batches records rather than loading everything into memory. The "4 GB" refers to total data read, not memory usage. If your job is not being OOMKilled, your memory settings are sufficient for your current workload.
Aasim ali
07/22/2025, 6:22 PM
kapa.ai
07/22/2025, 6:22 PM
In principle you would update the `resource_requirements` field for that source in the `actor` table. However, the knowledge sources provided do not explicitly document updating the `actor` table for resource requirements. Instead, they provide clear instructions for setting resource requirements at the connection or connector definition level.

If you want to set resource requirements for a specific connection (which includes a specific source and destination pair), you can do so with the following SQL command:
```sql
UPDATE connection
SET resource_requirements = '{"cpu_limit": "2", "cpu_request": "2", "memory_limit": "2048Mi", "memory_request": "2048Mi"}'
WHERE id = '<your-connection-id>';
```
You can get the connection ID from the Airbyte UI URL as described in the documentation [Configuring Connection-Specific Requirements](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#configuring-connection-specific-requirements).

If you want to set resource requirements for all connectors of a specific type (e.g., all instances of a source connector), you can update the `actor_definition` table:
```sql
UPDATE actor_definition
SET resource_requirements = '{"jobSpecific": [{"jobType": "sync", "resourceRequirements": {"cpu_limit": "2", "cpu_request": "2", "memory_limit": "2048Mi", "memory_request": "2048Mi"}}]}'
WHERE id = '<actor-definition-id>';
```
You can find the `actor_definition` ID by querying for the connector's image name [Configuring Connector Definition Requirements](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#configuring-connector-definition-requirements).

Summary:
- For a specific connection: update the `connection` table.
- For all instances of a connector: update the `actor_definition` table.

The knowledge sources do not provide a documented way to set resource requirements for a single source instance via the `actor` table directly. If you need per-source (not per-connection) granularity, you may need to use the connection-level override as the closest supported method.
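For the lookup step, a sketch of finding the definition ID by image name, assuming the image name lives in a `docker_repository` column on `actor_definition` (newer Airbyte versions may keep it in a separate `actor_definition_version` table instead):

```sql
-- Find the connector definition ID by its Docker image name
SELECT id, name, docker_repository
FROM actor_definition
WHERE docker_repository LIKE '%source-github%';
```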
Aasim ali
07/22/2025, 6:39 PM
```sql
UPDATE connection
SET resource_requirements = '{"cpu_limit": "100m", "cpu_request": "256m", "memory_limit": "6Gi", "memory_request": "2Gi"}'
WHERE id = 'f27cc58e-7120-4ed8-a627-c8559edd887d';
```
@kapa.ai is this correct
kapa.ai
07/22/2025, 6:40 PM
Yes, the statement is syntactically correct:
```sql
UPDATE connection
SET resource_requirements = '{"cpu_limit": "100m", "cpu_request": "256m", "memory_limit": "6Gi", "memory_request": "2Gi"}'
WHERE id = 'f27cc58e-7120-4ed8-a627-c8559edd887d';
```
This will set the CPU and memory requests and limits for the connection with the specified ID. The format and usage match the documented method for connection-level overrides [Configuring Connection-Specific Requirements](https://docs.airbyte.com/platform/operator-guides/configuring-connector-resources#configuring-connection-specific-requirements).

> Note:
> - The values for `cpu_limit`, `cpu_request`, `memory_limit`, and `memory_request` should be set according to your workload needs.
> - As written, `cpu_limit` (100m) is lower than `cpu_request` (256m); Kubernetes requires a container's limit to be greater than or equal to its request, so job pods with these values will fail validation. Raise the limit or lower the request.
> - After making this change, you may need to restart your Airbyte jobs or pods for the new settings to take effect.
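A corrected variant of the same statement, keeping your memory values and raising the CPU limit to at least the request (the 500m figure is illustrative):

```sql
UPDATE connection
SET resource_requirements = '{"cpu_limit": "500m", "cpu_request": "256m", "memory_limit": "6Gi", "memory_request": "2Gi"}'
WHERE id = 'f27cc58e-7120-4ed8-a627-c8559edd887d';
```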
Aasim ali
08/14/2025, 1:10 PM
kapa.ai
08/14/2025, 1:10 PM
Your sync is failing with a `UserDefinedBackoffException` while reading from the GitHub stream. This exception is typically raised when the connector hits a retryable error (such as rate limiting or a transient HTTP error) and the backoff handler is invoked to manage retries. Based on the knowledge sources, here's what's happening:
- Airbyte's CDK for HTTP streams uses a backoff strategy (by default, exponential backoff) when it encounters HTTP 429 (rate limit) or 5XX errors. It retries the request up to a certain number of times before giving up and raising an exception. This is governed by the `should_retry` and `backoff_time` methods, which can be customized in the connector code. If the error persists after the maximum number of retries, a `UserDefinedBackoffException` is raised and the sync fails [Rate Limiting in CDK](https://docs.airbyte.com/platform/connector-development/cdk-python/http-streams#rate-limiting).
- The stack trace shows the error being raised deep in the HTTP client's retry logic, which matches the documented behavior for handling rate limits and transient errors in Airbyte connectors.

What you can do:
- Check the logs for the specific HTTP response code and message (e.g., 429, 500, etc.) to determine whether you are hitting rate limits or server errors.
- If you are hitting rate limits, consider increasing the backoff time or reducing the frequency of syncs.
- If you are developing or customizing the connector, you can override the backoff strategy to better handle your API's rate limits [Backoff Strategies in Connector Builder](https://docs.airbyte.com/platform/connector-development/connector-builder-ui/error-handling#backoff-strategies); a sketch follows below.

If you need more specific troubleshooting steps, please provide the exact error message or HTTP response code that triggered the backoff exception. The knowledge sources do not provide further details for this specific stack trace.
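For connectors built with the declarative (Connector Builder / low-code) framework, a minimal sketch of a customized backoff on the requester in the YAML manifest (the retry count and factor are illustrative, not recommendations):

```yaml
requester:
  type: HttpRequester
  # ... url_base, path, authenticator ...
  error_handler:
    type: DefaultErrorHandler
    max_retries: 5
    backoff_strategies:
      - type: ExponentialBackoffStrategy
        factor: 10   # waits 10s, 20s, 40s, ... between retries
```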
Aasim ali
08/14/2025, 2:49 PM
kapa.ai
08/14/2025, 2:50 PM
The error `io.airbyte.commons.exceptions.TransientErrorException: Some streams were unsuccessful due to a source error. See logs for details.` indicates that the destination connector (in this case, S3) failed because one or more streams did not complete successfully due to an upstream (source) error. This is a common pattern in Airbyte: if the source connector encounters an error or is interrupted (for example, due to an out-of-memory issue or a network problem), the destination will not receive all the expected data and will throw a `TransientErrorException` when closing the stream consumer.

This error is not specific to the S3 destination; it is a generic error that can occur with any destination when the source fails or when there is an issue during the replication process. The knowledge sources confirm that this is a typical error pattern in Airbyte, and the root cause is usually found in the source connector logs or in the replication job logs, not in the destination itself. For example, similar error patterns are seen in [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921) and [source-aws datalake exit code 137](https://github.com/airbytehq/airbyte/issues/39128).

What you should do:
- Review the full Airbyte job logs, especially the source connector logs, to identify the underlying error that caused the stream(s) to fail.
- Common causes include source connector OOM (out-of-memory), API rate limits, network issues, or misconfiguration.
- Once the root cause in the source is resolved, the destination error should also be resolved.

The knowledge sources do not provide more specific troubleshooting steps for this exact error message, but the pattern and recommended next steps are consistent across multiple Airbyte issues.
Aasim ali
Aasim ali
08/14/2025, 5:06 PM
kapa.ai
08/14/2025, 5:07 PM
The error `io.airbyte.commons.exceptions.TransientErrorException: Some streams were unsuccessful due to a source error. See logs for details.` is a generic error indicating that the destination (S3 in your case) failed because the source connector encountered an error. The root cause is typically found in the source logs, not the destination. The knowledge sources show this pattern is most often related to issues such as out-of-memory errors, API rate limits, or network problems in the source connector, rather than permission issues on the destination side [source-aws datalake exit code 137](https://github.com/airbytehq/airbyte/issues/39128), [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921).

**However, permission issues can also cause source errors.** For example, if the source connector cannot access the data due to missing or incorrect credentials, it may fail and trigger this error downstream. That said, the knowledge sources do not explicitly mention permission errors as a common cause for this specific error pattern.

In summary: this error could be caused by a variety of issues, including but not limited to permissions. To determine whether it is a permission issue, review the detailed source connector logs for explicit permission-denied or access-denied messages. The knowledge sources do not provide enough information to confirm permissions as the root cause without those logs.
Aasim ali
Aasim ali
08/14/2025, 5:09 PM
kapa.ai
08/14/2025, 5:09 PM
Based on your logs, the sync is failing on the `pull_request_commits` stream. After several retries, it gives up and raises a `UserDefinedBackoffException: Conflict.`

Key points from the logs:
- No records extracted or loaded: the sync did not pull any data.
- Warning from source: "Not Found" indicates the API endpoint returned a 404.
- Max retry limit reached: Airbyte retried the request 6 times before failing.
- 404 Not Found: a common error when the requested resource (e.g., a pull request or commit) does not exist, or the API endpoint is incorrect.

### What does this mean?
404 Not Found errors in Airbyte's GitHub connector typically mean the connector is trying to access a resource that does not exist or is not accessible with the current credentials. This could be due to:
- The repository, pull request, or commit does not exist.
- The GitHub token used does not have permission to access the resource.
- The repository is private and the token lacks the necessary scopes.
- The API endpoint is being called with incorrect parameters.

This pattern is confirmed in the knowledge sources, where similar 404 errors are discussed as being caused by incorrect URLs or missing resources, and are not retried by default unless explicitly configured in the connector's error handler settings [Error handling](https://docs.airbyte.com/platform/connector-development/config-based/understanding-the-yaml-file/error-handling), [Sync worker failed](https://discuss.airbyte.io/t/581).

### What should you do?
- Check the GitHub token permissions: ensure your token has the required scopes (at least `repo` for private repositories) and access to the repository in question [GitHub connector permissions](https://docs.airbyte.com/integrations/sources/github#limitations--troubleshooting).
- Verify the resource exists: make sure the repository and pull requests you are syncing actually exist and are accessible.
- Review connector configuration: double-check the repository name and other parameters in your Airbyte connection settings.
- Handle 404s gracefully: if you expect some resources to be missing, you can configure the connector to ignore 404 errors using the error handler in the YAML manifest, as described in the documentation [Error handling](https://docs.airbyte.com/platform/connector-development/config-based/understanding-the-yaml-file/error-handling); a sketch follows below.

If after checking these points you still encounter issues, review the full logs for more details on which specific resource is returning 404, and adjust your configuration or permissions accordingly.

Summary: the error is most likely due to missing resources or insufficient permissions for the GitHub token. Double-check your token scopes and resource existence, and consider configuring the connector to ignore 404 errors if appropriate.
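As an illustration, a minimal sketch of ignoring 404s in a declarative YAML manifest, under the same assumptions as the declarative error-handler example earlier in this thread:

```yaml
error_handler:
  type: DefaultErrorHandler
  response_filters:
    - type: HttpResponseFilter
      action: IGNORE        # skip the missing resource instead of failing the sync
      http_codes: [404]
```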
Aasim ali
08/15/2025, 8:11 PM
kapa.ai
08/15/2025, 8:11 PM
Your sync failed with a `UserDefinedBackoffException` carrying the message "Conflict." This exception is raised during the HTTP request retry logic in the Airbyte CDK when a retryable error occurs and the maximum number of retries is reached. From the knowledge sources, we know:
- Airbyte's CDK for HTTP streams uses backoff strategies (such as exponential backoff) to handle retryable errors, including HTTP 409 (Conflict), 429 (Too Many Requests), and 5XX errors. If the error persists after the maximum number of retries, a `UserDefinedBackoffException` is raised and the sync fails [Rate Limiting in CDK](https://docs.airbyte.com/platform/connector-development/cdk-python/http-streams#rate-limiting).
- The error message "Conflict" typically corresponds to an HTTP 409 response, which means the request could not be completed due to a conflict with the current state of the resource on the server. This is not specific to Airbyte; it is standard HTTP behavior.

What you should do:
- Check the logs for the specific HTTP response code and message to confirm it is indeed a 409 Conflict.
- Review the API documentation for the source you are syncing from to understand what might cause a conflict in your use case.
- If you are developing or customizing the connector, you can adjust the backoff strategy or error handling logic to better handle this scenario [Backoff Strategies in Connector Builder](https://docs.airbyte.com/platform/connector-development/connector-builder-ui/error-handling#backoff-strategies).

The knowledge sources do not provide further details for this specific error, so if you need more targeted troubleshooting, consult the API documentation for your source or review the full Airbyte logs for more context.
Aasim ali
Aasim ali
08/15/2025, 8:13 PM
kapa.ai
08/15/2025, 8:14 PM
From your logs, the exception is `UserDefinedBackoffException: Conflict.`, and the final HTTP response code is 403 Forbidden. The relevant log lines are:

```
2025-08-15 15:02:20 platform ERROR Giving up _send(...) after 6 tries (airbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Conflict.)
2025-08-15 15:02:20 platform ERROR Max retry limit reached in 10.991083s. Request: <PreparedRequest [GET]>, Response: <Response [403]>
2025-08-15 15:02:20 platform ERROR Check failed
```

### What does this mean?
- 403 Forbidden indicates that the server understood the request but is refusing to act on it. This is almost always a permissions issue.
- The repeated `UserDefinedBackoffException: Conflict.` and retries show that Airbyte attempted the request several times, but each time received a response indicating a conflict or forbidden access.
- The check job ultimately fails because it cannot authenticate or is not authorized to access the resource.

### Is this a permissions issue?
Yes, based on the knowledge sources and the HTTP 403 response, this is most likely a permissions issue. This pattern is consistent with other Airbyte issues where a 403 error during source or destination checks is due to missing or incorrect credentials, insufficient API permissions, or access restrictions on the resource being requested. For example, similar 403 errors are discussed in [source-Confluence REST API encounters a 403 error](https://github.com/airbytehq/airbyte/issues/33737) and [prestashop 403 Forbidden](https://github.com/airbytehq/airbyte/issues/56963).

### What should you do?
- Verify your credentials: make sure the API key, token, or credentials you are using are correct and have the necessary permissions for the resource.
- Check resource access: ensure the user or service account has access to the specific endpoint or data you are trying to sync.
- Review the API documentation: some APIs require specific scopes or roles for certain endpoints.

If you continue to see 403 errors after verifying permissions, check whether any IP allowlists, organization policies, or other access controls could be blocking the request.

Summary: your Airbyte connection check is failing due to a permissions issue, as indicated by the repeated 403 Forbidden responses from the API. Double-check your credentials and permissions for the resource you are trying to access.
Aasim ali
Aasim ali
08/15/2025, 8:14 PM
kapa.ai
08/15/2025, 8:15 PM
The error `Failure in source: Checking source connection failed - please review this connection’s configuration to prevent future syncs from failing` means that Airbyte was unable to successfully connect to your source during the connection check. This is a generic error and can be caused by several issues, including:
- Invalid credentials or permissions: the most common cause is incorrect or insufficient credentials for the source system, or a user/account that lacks the permissions needed to access the data. For example, a 403 Forbidden or 404 Not Found error from the source API often indicates a permissions or configuration issue [source-Confluence REST API encounters a 403 error](https://github.com/airbytehq/airbyte/issues/33737), [Sync jobs do not appear to respect CPU / memory requests and limits](https://github.com/airbytehq/airbyte/issues/42921).
- Incorrect configuration: the source configuration (such as API endpoint, database host, or resource identifiers) may be incorrect or point to a non-existent resource, leading to connection failures [source-github connection doesn't sync when source is created via API is invalid](https://github.com/airbytehq/airbyte/issues/57023).
- Network or firewall issues: the Airbyte instance may not be able to reach the source due to network restrictions or firewall rules [airbyte abctl not connecting to the source DB](https://github.com/airbytehq/airbyte/issues/62463).
- API rate limits or transient errors: rate limiting or temporary errors from the source system can also cause the connection check to fail, though these are usually marked as transient or system errors [Connection troubleshooting](https://docs.airbyte.com/platform/cloud/managing-airbyte-cloud/review-connection-status#connection-troubleshooting).

What you should do:
1. Review and re-test your source configuration: double-check your credentials, permissions, and all configuration fields for accuracy.
2. Check the detailed error logs: Airbyte provides more specific error messages in the logs, which can help you identify the exact cause.
3. Re-test and save the source: after making corrections, use the "Retest saved source" button in the Airbyte UI to verify the connection before running another sync [Connection troubleshooting](https://docs.airbyte.com/platform/cloud/managing-airbyte-cloud/review-connection-status#connection-troubleshooting).

If the problem persists, consult the detailed error logs for more information or check for known issues with your specific connector.