# ask-community-for-troubleshooting
b
team, after upgrading to 1.8.5, even though replication completes successfully, jobs are marked as failed. The connection log stays stuck at “Returning output…”, after some time shows “Retry State: RetryManager() Backoff before next attempt: 4 minutes”, and finally it fails after too many retries. Any help here?
k
Based on the information in your question and the provided knowledge sources, this issue has been reported by other users after upgrading to Airbyte 1.8.x. The symptoms you describe (jobs marked as failed even though replication appears to complete, logs stuck at “Returning output…”, and repeated retries with increasing backoff) are consistent with a known problem related to storage, particularly with MinIO, which Airbyte uses for job logging and state management.

Key findings from user reports:
- The default MinIO persistent volume claim (PVC) is only 500MB, which can fill up quickly, especially in environments with frequent or large sync jobs.
- When MinIO storage is full, Airbyte jobs may hang or fail, and logs may not progress past “Returning output…”.
- Users have resolved the issue by either:
  - Deleting old logs from MinIO (e.g., from `/storage/airbyte-storage/job-logging/workspace` and `/storage/airbyte-storage/workload/output`), or
  - Increasing the MinIO PVC size in the `values.yaml` file, for example:
    ```yaml
    minio:
      storage:
        volumeClaimValue: 2Gi
    ```
- Setting the `TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable to a lower value (e.g., 7 days) can help prevent MinIO from filling up in the future:
  ```yaml
  env_vars:
    TEMPORAL_HISTORY_RETENTION_IN_DAYS: 7
  ```

Recommended actions:
1. Check MinIO storage usage and clear out old logs if the volume is full (see the sketch after this list).
2. Increase the MinIO PVC size if your workload requires more storage.
3. Adjust retention settings to prevent future overflows.

These steps have been confirmed by multiple users to resolve the issue you’re experiencing after upgrading to 1.8.5. For more details and user discussions, see the related [GitHub issue](https://github.com/airbytehq/airbyte/issues/55812). If you continue to experience issues after addressing storage, please provide additional logs or error messages for further troubleshooting.
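A minimal sketch for checking and clearing the MinIO volume directly; the namespace (`airbyte`) and pod name (`airbyte-minio-0`) are assumptions based on a default Helm install, so adjust them to match your deployment:

```bash
# Assumed names: namespace "airbyte", MinIO pod "airbyte-minio-0" (default Helm install).
# Check how full the MinIO volume is:
kubectl -n airbyte exec airbyte-minio-0 -- df -h /storage

# Clear old job logs and workload output (the paths reported above).
# The sh -c wrapper is needed so the glob expands inside the container:
kubectl -n airbyte exec airbyte-minio-0 -- sh -c \
  'rm -rf /storage/airbyte-storage/job-logging/workspace/* /storage/airbyte-storage/workload/output/*'
```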
b
@kapa.ai we use S3, not MinIO
k
Thank you for clarifying that you are using S3, not MinIO. Based on the knowledge sources provided, there are several known issues and configuration pitfalls when using S3 for Airbyte logs and state storage, especially after upgrades:

1. **S3 configuration issues after upgrade**
   - Users have reported that after upgrading Airbyte (including Helm chart upgrades), jobs may fail or hang if the S3 storage configuration is not set correctly. For example, Airbyte may default to using a bucket in the wrong region (e.g., `eu-west-1` instead of your intended `us-east-1`), leading to authorization or region mismatch errors.
   - The solution is to explicitly set all relevant S3 bucket values in your Helm `values.yaml` (or equivalent config) to ensure Airbyte uses the correct bucket and region:
     ```yaml
     global:
       storage:
         type: "S3"
         bucket:
           log: your-s3-bucket
           state: your-s3-bucket
           workloadOutput: your-s3-bucket
         s3:
           region: "us-east-1" # or your correct region
           authenticationType: credentials # or instanceProfile if using IAM roles
     ```
   - Make sure all these values point to your actual S3 bucket and region, and that your credentials are correct.
   - If they are not set, Airbyte may attempt to use a default bucket in a different region, causing jobs to fail or hang as you described. [See details in this discussion](https://github.com/airbytehq/airbyte/issues/48407) and the [official documentation](https://docs.airbyte.com/platform/1.8/enterprise-setup/implementation-guide#configuring-external-logging).

2. **Known issues with S3-compatible storage**
   - There are open issues regarding S3-compatible storage (such as Exoscale or other custom endpoints), where Airbyte does not allow specifying a custom S3 endpoint, or the implementation between S3 and MinIO is not unified. This can cause compatibility issues, but for standard AWS S3, setting the correct region and bucket should suffice. [See user reports](https://github.com/airbytehq/airbyte/issues/36896).

3. **Environment variable changes**
   - In Airbyte versions after 0.40.6, new environment variables were introduced for storage type. If you are using kustomize or another deployment method, ensure you are setting:
     ```
     WORKER_LOGS_STORAGE_TYPE=S3
     WORKER_STATE_STORAGE_TYPE=S3
     ```
   - If these are missing or set incorrectly, the worker process may fail to start or jobs may hang. [See the related issue](https://github.com/airbytehq/airbyte/issues/18016).

4. **Other troubleshooting steps**
   - If you see errors about malformed authorization headers or wrong regions, double-check your S3 region and bucket configuration.
   - If jobs hang at "Returning output..." and then retry, it may be because Airbyte cannot write logs or state to S3, often due to misconfiguration or permission issues.

Summary of actions:
- Double-check your S3 bucket, region, and credentials configuration in your deployment files.
- Ensure all relevant `global.storage.bucket.*` and `global.storage.s3.region` values are set and match your actual S3 setup (a quick verification sketch follows below).
- Make sure the correct environment variables are set for the storage type.
- Review your S3 IAM permissions to ensure Airbyte has full access to the specified buckets.

If the issue persists after these checks, please provide any specific error messages from the logs for further diagnosis. If you need more detailed troubleshooting for the S3 destination connector itself, see the [S3 troubleshooting guide](https://docs.airbyte.com/integrations/destinations/s3#limitations--troubleshooting). If you have already checked all of the above and the problem remains, the knowledge sources do not provide a definitive solution for your specific scenario, and you may need to open a new issue with detailed logs for the Airbyte team to investigate further.
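A quick way to verify, outside Airbyte, that the bucket region matches your configuration and that the credentials can write to it; `your-s3-bucket` is the placeholder from the config above:

```bash
# Confirm the bucket's actual region matches global.storage.s3.region
# (returns a null/empty LocationConstraint for us-east-1):
aws s3api get-bucket-location --bucket your-s3-bucket

# Verify the credentials Airbyte uses can write and delete objects:
echo connectivity-check | aws s3 cp - s3://your-s3-bucket/airbyte-connectivity-check.txt
aws s3 rm s3://your-s3-bucket/airbyte-connectivity-check.txt
```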
b
@kapa.ai S3 access is configured and working fine
k
Thank you for clarifying that S3 access is working fine. Based on the knowledge sources, if S3 is configured correctly and accessible, but you are still experiencing jobs that:
- complete replication,
- get stuck at “Returning output…”,
- enter repeated retries with increasing backoff,
- and ultimately fail after too many retries,

then this pattern has been reported by other users after upgrading Airbyte, especially when using S3 for logs and state storage. The most relevant findings are:

1. **Configuration mismatch or incomplete settings.** Even if S3 access is working, Airbyte may default to using a different bucket or region if all relevant configuration values are not explicitly set. This can cause jobs to hang or fail even though S3 itself is accessible. Users have found that explicitly setting all S3-related configuration values (the log, state, and workloadOutput buckets, plus the region) in your deployment (e.g., Helm values) is necessary to prevent Airbyte from defaulting to a bucket in the wrong region, which can cause silent failures or retries. [See user investigation and solution](https://github.com/airbytehq/airbyte/issues/48407).

2. **Environment variable changes in recent versions.** After Airbyte v0.40.6, new environment variables were introduced for storage type. If these are not set correctly (e.g., `WORKER_LOGS_STORAGE_TYPE=S3` and `WORKER_STATE_STORAGE_TYPE=S3`), the worker process may fail to start or jobs may hang, even if S3 is accessible. [See related issue](https://github.com/airbytehq/airbyte/issues/18016).

3. **No explicit root cause in the knowledge sources.** If you have confirmed that all S3 buckets and regions are set correctly in your configuration, the correct environment variables are set, S3 permissions are correct, and you are not seeing any S3-related errors in the logs, then the knowledge sources do not provide a definitive root cause or solution for jobs hanging at “Returning output…” and failing after retries while S3 is working fine.

Summary: If S3 is accessible and configured correctly but jobs still fail as described, the knowledge sources do not contain enough information to provide a confident, specific solution for this scenario. You may need to gather more detailed logs and open a new issue with the Airbyte team for further investigation (see the log-gathering sketch below). If you have additional error messages or logs, please share them for more targeted troubleshooting.
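A minimal sketch for collecting the logs most likely to show why jobs hang at “Returning output…”; the namespace and deployment names (`airbyte-worker`, `airbyte-workload-launcher`) are assumptions based on a default Helm install and may differ in your cluster:

```bash
# Assumed names from a default Helm install; adjust for your cluster.
NS=airbyte

# Worker and workload-launcher logs around the time a job hangs:
kubectl -n "$NS" logs deploy/airbyte-worker --since=1h > worker.log
kubectl -n "$NS" logs deploy/airbyte-workload-launcher --since=1h > workload-launcher.log

# Grep for storage- and retry-related errors to attach to a GitHub issue:
grep -iE 'retry|returning output|s3|storage|exception' worker.log workload-launcher.log
```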