Mohamed Akram Lahdir
07/22/2025, 2:02 PM

kapa.ai
07/22/2025, 2:03 PM
You can set this in your `values.yaml` file. For example, if you want Airbyte to use a folder named `my-folder` inside your bucket, you should specify it in the `bucket` section like this:

```yaml
global:
  storage:
    type: "S3"
    secretName: airbyte-config-secrets
    bucket:
      log: airbyte-bucket/my-folder
      state: airbyte-bucket/my-folder
      workloadOutput: airbyte-bucket/my-folder
    s3:
      region: "us-east-1"
      authenticationType: credentials
```

Replace `airbyte-bucket` with your actual bucket name and `my-folder` with your desired folder path. This will ensure that Airbyte writes logs, state, and workload output to the specified subdirectory within your S3 bucket rather than to the root of the bucket. This approach is supported in the Airbyte Helm chart configuration for both state and logging storage, as shown in the documentation examples for S3 storage configuration [State and Logging Storage](https://docs.airbyte.com/platform/next/deploying-airbyte/integrations/storage#values).

If you are configuring a destination connector (such as S3 as a destination for syncs), you can set the "S3 Bucket Path" field in the connector setup to your desired subdirectory. This field determines the subdirectory under the bucket where data will be written [Set up the S3 destination connector in Airbyte](https://docs.airbyte.com/integrations/destinations/s3#step-2-set-up-the-s3-destination-connector-in-airbyte).

Note: If you leave the path empty, Airbyte will use the root of the bucket. If you specify a folder, all data will be written under that folder. Multiple slashes are collapsed, so you don't need to worry about accidental double slashes in the path [Set up the S3 destination connector in Airbyte](https://docs.airbyte.com/integrations/destinations/s3#step-2-set-up-the-s3-destination-connector-in-airbyte).
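Before pointing Airbyte at the new folder, it can help to confirm that the credentials you will give Airbyte can actually write under that prefix. This is a minimal sketch using boto3; the bucket name, folder, and region are the placeholder values from the example above, not anything Airbyte requires:

```python
import boto3

# Placeholder values from the example above; replace with your own.
BUCKET = "airbyte-bucket"
PREFIX = "my-folder/"

s3 = boto3.client("s3", region_name="us-east-1")

# Write and then remove a small marker object under the prefix Airbyte will use.
key = PREFIX + "airbyte-write-test.txt"
s3.put_object(Bucket=BUCKET, Key=key, Body=b"ok")
s3.delete_object(Bucket=BUCKET, Key=key)
print(f"Credentials can write under s3://{BUCKET}/{PREFIX}")
```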
Mohamed Akram Lahdir
07/22/2025, 2:04 PM

Mohamed Akram Lahdir
07/22/2025, 2:15 PM

kapa.ai
07/22/2025, 2:15 PM
You can do this by setting the `TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable to `30`. This variable controls how long job history and related files are retained. After the specified period, Airbyte cleans up old files automatically.

How to configure in an abctl/Helm deployment: in your `values.yaml` file, add:

```yaml
global:
  env_vars:
    TEMPORAL_HISTORY_RETENTION_IN_DAYS: 30
```

This ensures that Airbyte retains job history and workspace files for 30 days, after which they are deleted to free up storage. This setting is supported and documented for both abctl and Helm-based deployments [Configuring Airbyte - Data Retention](https://docs.airbyte.com/platform/next/operator-guides/configuring-airbyte#data-retention) [Configure TEMPORAL_HISTORY_RETENTION_IN_DAYS in abctl deployment](https://github.com/airbytehq/airbyte/discussions/46355).

Note:
- This retention policy applies to job history and workspace files managed by Airbyte.
- For S3 destination data, Airbyte does not automatically delete synced data from your S3 bucket after a retention period. Use S3 lifecycle rules to manage retention and deletion of data files in your S3 bucket if needed.

If you need to sweep logs or workspace files more aggressively, make sure you are running Airbyte version 0.45.5 or later, as this feature was reimplemented in that version [Quickly run out of storage](https://discuss.airbyte.io/t/2417) [Airbyte log sweeping](https://discuss.airbyte.io/t/2714).

Summary:
- Set `TEMPORAL_HISTORY_RETENTION_IN_DAYS: 30` in your deployment configuration.
- For S3 data retention, use AWS S3 lifecycle policies.
Mohamed Akram Lahdir
07/22/2025, 2:17 PM

kapa.ai
07/22/2025, 2:17 PM
The `TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable controls how long Airbyte retains job history and workspace files managed by Airbyte itself (such as logs and internal state), not the data files written to your S3 destination bucket during syncs [Configuring Airbyte - Data Retention](https://docs.airbyte.com/platform/operator-guides/configuring-airbyte#data-retention).

Files written to your S3 bucket as part of syncs (i.e., your actual data) are not automatically deleted by Airbyte after 30 days. Airbyte does not manage retention or deletion of data in your S3 destination bucket based on this setting. If you see files older than 30 days in your S3 bucket, this is expected behavior.

To automatically delete or manage retention of data files in your S3 bucket, you need to set up an S3 Lifecycle Policy directly in AWS S3. Airbyte does not provide built-in retention or deletion for destination data files [Set up the S3 destination connector in Airbyte](https://docs.airbyte.com/integrations/destinations/s3#step-2-set-up-the-s3-destination-connector-in-airbyte).

Summary:
- `TEMPORAL_HISTORY_RETENTION_IN_DAYS` only affects Airbyte's internal job/workspace files, not your S3 destination data.
- For S3 data retention, use AWS S3 lifecycle rules.

If you need more details on how to set up S3 lifecycle policies, let me know!
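For reference, a minimal sketch of such a lifecycle rule using boto3 (the bucket name and prefix are placeholders; the same rule can be created from the S3 console or the AWS CLI):

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and prefix; scope the rule to the folder your syncs write to.
s3.put_bucket_lifecycle_configuration(
    Bucket="airbyte-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-airbyte-sync-data-after-30-days",
                "Filter": {"Prefix": "my-folder/"},  # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": 30},          # delete objects 30 days after creation
            }
        ]
    },
)
```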
Mohamed Akram Lahdir
07/22/2025, 2:18 PM

kapa.ai
07/22/2025, 2:19 PM
If you have set `TEMPORAL_HISTORY_RETENTION_IN_DAYS` to 30 (or are relying on the default) but still see internal files (such as logs or workspace files) older than 30 days in your S3 storage, you are not alone; several users have reported similar issues.

According to the knowledge sources, while the environment variable is intended to control the retention of job history in Temporal and related workspace files (defaulting to 30 days) [Configuring Airbyte - Data Retention](https://docs.airbyte.com/platform/operator-guides/configuring-airbyte#data-retention), there are multiple reports that the cleanup does not always work as expected, especially in Kubernetes/Helm or abctl deployments. Users have noted that even after setting this variable, old files are not always deleted automatically [Github Issue: Auto delete job history logs after specific period](https://github.com/airbytehq/airbyte/issues/39358). Additionally, maintainers and contributors have acknowledged that while `TEMPORAL_HISTORY_RETENTION_IN_DAYS` should clean up the Temporal database, there is no clear evidence or documentation confirming that it reliably cleans up all internal files or workspace data in every deployment scenario. Some users have had to manually intervene to delete files or prune tables [Purging of old syncs from Airbyte databases](https://github.com/airbytehq/airbyte/discussions/55233).

In summary:
- The variable is intended to control retention, but there are known issues where cleanup does not occur as expected.
- There is no official solution or workaround documented for this issue as of now.
- Manual cleanup may still be necessary in some cases.

If this is impacting your deployment, you may want to follow or comment on the related GitHub issues for updates, and consider manual cleanup as a temporary solution.
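If you do end up cleaning old files by hand, the sketch below shows one way to do it with boto3. The bucket name and prefix are placeholders, not paths Airbyte guarantees, and the script permanently deletes objects, so review it carefully before running:

```python
from datetime import datetime, timedelta, timezone

import boto3

# Placeholders: check which bucket and prefixes your Airbyte storage actually uses.
BUCKET = "airbyte-bucket"
PREFIX = "job-logging/"

s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

# List everything under the prefix and delete objects older than the cutoff.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["LastModified"] < cutoff:
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
            print("deleted", obj["Key"])
```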
Mohamed Akram Lahdir
07/22/2025, 2:32 PM

kapa.ai
07/22/2025, 2:33 PM
This kind of S3 signature error is usually caused by one of the following:

1. Invalid Bucket Path: If the bucket path is set to an invalid value (e.g., starting with `/` or an unsupported format), S3 may reject the request with a signature error. There is a known issue where an invalid bucket path causes misleading signature errors. Changing the bucket path to a valid value resolved the issue for other users. Make sure your bucket path does not start with a slash and follows the expected format (e.g., `my-folder/subfolder`) [Github Issue: invalid s3 bucket path gives cryptic/misleading error message](https://github.com/airbytehq/airbyte/issues/38338).
2. Credentials or Region Mismatch: If your credentials, region, or signing method do not match the bucket's configuration, you will get this error. Double-check that your AWS Access Key, Secret Key, and region are correct and match the bucket's actual region [Forum: Redshift destination setup fails with S3 staging](https://discuss.airbyte.io/t/1044).
3. S3-Compatible Storage: If you are using S3-compatible storage (like Minio), ensure the endpoint and path style are set correctly. An incorrect endpoint or path style can also cause signature errors [Github Issue: S3 Data Lake initial validation failure with Nessie/Minio](https://github.com/airbytehq/airbyte/issues/56411).

What to do:
- Check your bucket path: make sure it does not start with `/` and is not empty or just `/`. Use a valid subdirectory name.
- Review your credentials and region: ensure they are correct and match the bucket's configuration (a quick check is sketched below).
- If using S3-compatible storage: double-check your endpoint and path style settings.

If you correct the bucket path and still see the error, try reverting to the previous working configuration to confirm the issue is with the new path. If you need more help, please provide the exact bucket path you are using (without sharing sensitive information), and I can help you check if it matches the expected format.
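As a quick way to rule out a credentials or region mismatch, you can query the bucket directly with boto3. This is a minimal sketch with placeholder names; it only checks access and the bucket's real region:

```python
import boto3

BUCKET = "airbyte-bucket"  # placeholder: the bucket from your Airbyte storage config

# Use the same credentials and region you configured for Airbyte.
s3 = boto3.client("s3", region_name="us-east-1")

# Raises an error if the credentials cannot reach the bucket at all.
s3.head_bucket(Bucket=BUCKET)

# Returns the bucket's actual region (LocationConstraint is None for us-east-1).
location = s3.get_bucket_location(Bucket=BUCKET)
print("Bucket region:", location.get("LocationConstraint") or "us-east-1")
```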
Mohamed Akram Lahdir
07/22/2025, 2:34 PM

kapa.ai
07/22/2025, 2:34 PM
The most likely causes are the same as before:

1. Invalid Bucket Path: If the bucket path is invalid (e.g., starting with `/` or using just `/`), S3 may reject the request with a signature error. This has been reported by other users, and correcting the bucket path to a valid subdirectory (e.g., `my-folder` or `my-folder/subfolder`) resolved the issue. Do not start the path with a slash, and avoid using just `/` as the path [invalid s3 bucket path gives cryptic/misleading error message](https://github.com/airbytehq/airbyte/issues/38338).
2. Credentials or Region Mismatch: If your AWS credentials or region do not match the bucket's configuration, you will get this error. Double-check that your access key, secret key, and region are correct and match the actual S3 bucket settings [Redshift destination setup fails with S3 staging](https://discuss.airbyte.io/t/1044).
3. S3-Compatible Storage Issues: If you are using S3-compatible storage (like Minio), ensure the endpoint and path style are set correctly. Incorrect settings can also cause signature errors [S3 Data Lake initial validation failure with Nessie/Minio](https://github.com/airbytehq/airbyte/issues/56411).

What you should do:
- Double-check the bucket path in your Airbyte configuration. It should be a valid subdirectory and not start with `/` (a small check of these rules is sketched below).
- Verify your AWS credentials and region.
- If using S3-compatible storage, check your endpoint and path style settings.

If you correct the bucket path and still see the error, try reverting to the previous working configuration to confirm the issue is with the new path. If you need more specific help, please share the exact bucket path you are using (without sensitive details), and I can help verify if it matches the expected format.
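The path rules above can be summed up in a small check. This is only an illustration of the stated rules, not something Airbyte itself runs:

```python
def looks_like_valid_bucket_path(path: str) -> bool:
    """True if the path is non-empty, is not just '/', and does not start with '/'."""
    return bool(path) and path != "/" and not path.startswith("/")

print(looks_like_valid_bucket_path("my-folder/subfolder"))  # True
print(looks_like_valid_bucket_path("/my-folder"))           # False
print(looks_like_valid_bucket_path("/"))                    # False
```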
Mohamed Akram Lahdir
07/22/2025, 2:54 PM

```yaml
storage:
  type: "S3"
  storageSecretName: airbyte-config-secrets
  bucket:
    log: may-data-testing/airbyte_platform/
    state: may-data-testing/airbyte_platform/
    workloadOutput: may-data-testing/airbyte_platform/
  s3:
    region: "eu-west-3"
    authenticationType: credentials
```

kapa.ai
07/22/2025, 2:55 PM
The problem is in your `bucket` configuration: the values include a path, not just the bucket name. What you have:

```yaml
bucket:
  log: may-data-testing/airbyte_platform/
  state: may-data-testing/airbyte_platform/
  workloadOutput: may-data-testing/airbyte_platform/
```

What you should use:

```yaml
bucket:
  log: may-data-testing
  state: may-data-testing
  workloadOutput: may-data-testing
```

If you want Airbyte to store files under a subdirectory (e.g., `airbyte_platform/`), you should configure this in the connector or application settings that support a "path" or "prefix", not in the bucket name itself. The bucket name in the config must match the actual S3 bucket name, without any slashes or subdirectory paths [State and Logging Storage](https://docs.airbyte.com/platform/next/deploying-airbyte/integrations/storage#values).

Summary of what to fix:
- Set the bucket name to just `may-data-testing` (no slashes or subdirectories).
- If you need to use a subdirectory, configure it in the relevant connector or Airbyte UI field for "path" or "prefix" (not in the `values.yaml` bucket section).

Example fixed config:

```yaml
storage:
  type: "S3"
  storageSecretName: airbyte-config-secrets
  bucket:
    log: may-data-testing
    state: may-data-testing
    workloadOutput: may-data-testing
  s3:
    region: "eu-west-3"
    authenticationType: credentials
```

This should resolve the S3 signature error you are seeing [invalid s3 bucket path gives cryptic/misleading error message](https://github.com/airbytehq/airbyte/issues/38338).
Mohamed Akram Lahdir
07/22/2025, 2:56 PM

kapa.ai
07/22/2025, 2:56 PM
Based on the knowledge sources, the bucket fields in the Airbyte storage configuration (`log`, `state`, `workloadOutput`) should contain only the S3 bucket name, not a path or subdirectory. The documentation and examples consistently show the bucket name alone, such as:

```yaml
bucket:
  log: airbyte-bucket
  state: airbyte-bucket
  workloadOutput: airbyte-bucket
```

[State and Logging Storage](https://docs.airbyte.com/platform/next/deploying-airbyte/integrations/storage#values)

If you want to store Airbyte logs, state, or workload output in a subdirectory within your bucket (e.g., `airbyte_platform/`), the knowledge sources do not provide a supported way to specify a subdirectory or prefix for these internal files via the `values.yaml` or storage configuration. The bucket name must match the actual S3 bucket name, and adding a path (like `may-data-testing/airbyte_platform/`) will cause S3 signature errors, as you experienced.

Summary:
- The Airbyte storage configuration expects only the bucket name, not a path.
- There is no documented or supported way in the provided knowledge sources to set a subdirectory/prefix for internal logs, state, or workload output via the storage config.
- If you need to organize files in subdirectories, you may need to handle this outside of Airbyte (e.g., with S3 lifecycle rules or post-processing scripts; a sketch of the latter follows below).

If you are configuring an S3 destination connector (for sync data), you can specify a "bucket path" (subdirectory) in the connector settings, but this does not apply to internal Airbyte logs or state storage [Set up the S3 destination connector in Airbyte](https://docs.airbyte.com/integrations/destinations/s3#step-2-set-up-the-s3-destination-connector-in-airbyte).

If you need further clarification or if new documentation becomes available, please let me know!
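As a rough illustration of the "post-processing script" option mentioned above, the sketch below relocates objects from the bucket root into a subdirectory with boto3. The bucket and prefix come from your earlier config; since it copies and then deletes objects, test it on non-critical data first:

```python
import boto3

# Values taken from the config shared above; this is a sketch only.
BUCKET = "may-data-testing"
TARGET_PREFIX = "airbyte_platform/"

s3 = boto3.client("s3")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.startswith(TARGET_PREFIX):
            continue  # already under the target prefix
        # Copy the object under the prefix, then remove the original.
        s3.copy_object(
            Bucket=BUCKET,
            CopySource={"Bucket": BUCKET, "Key": key},
            Key=TARGET_PREFIX + key,
        )
        s3.delete_object(Bucket=BUCKET, Key=key)
        print(f"moved {key} -> {TARGET_PREFIX}{key}")
```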