# ask-ai
@kapa.ai When deploying Airbyte using the Helm chart, are the MinIO pods created through a StatefulSet?
Yes, when deploying Airbyte using the Helm chart, MinIO pods are indeed created through a StatefulSet. This is evident from several knowledge sources provided:

1. In the [Github issue about MinIO statefulset not working](https://github.com/airbytehq/airbyte/issues/36801), it's mentioned that the MinIO StatefulSet is included in the Airbyte deployment.
2. Another [Github issue discussing helm delete behavior](https://github.com/airbytehq/airbyte/issues/18591) shows that the MinIO StatefulSet remained after the Helm release was deleted and had to be deleted with kubectl:

   ```shell
   $ kubectl delete -l app.kubernetes.io/instance=airbyte pods,statefulset,configmap,secret,service
   ...
   statefulset.apps "airbyte-minio" deleted
   ...
   ```

3. In a [forum post about using StatefulSets](https://discuss.airbyte.io/t/1216), it's suggested that using StatefulSets for components like MinIO would help retain the state of connections and custom connectors.

These sources confirm that MinIO is deployed as a StatefulSet in the Airbyte Helm chart, which is a common practice for stateful applications in Kubernetes to ensure stable network identities and persistent storage.
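To check this on your own cluster, a small helper along these lines could show the MinIO StatefulSet and the PVCs in the namespace. This is a minimal sketch: the function name `show_minio_statefulset` is hypothetical, the StatefulSet name `airbyte-minio` is taken from the GitHub issue output above, and the `KUBECTL` override (set it to `echo` for a dry run) is an assumption added for illustration.

```shell
# Hypothetical helper: show the MinIO StatefulSet and the PVCs in a namespace.
# KUBECTL defaults to kubectl; set KUBECTL=echo to print the commands instead.
show_minio_statefulset() {
  local ns="${1:?usage: show_minio_statefulset <namespace>}"
  local k="${KUBECTL:-kubectl}"
  # Fails with NotFound if there is no StatefulSet named airbyte-minio.
  $k get statefulset airbyte-minio -n "$ns" -o wide
  # Claims listed here outlive the pods that mount them.
  $k get pvc -n "$ns"
}
```

For example, `show_minio_statefulset airbyte` on a cluster where Airbyte runs in the `airbyte` namespace.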
@kapa.ai What is the role of the MinIO pod in Airbyte? I currently have the storage `volumeClaimValue` for MinIO set to 500Mi. I would like to increase the storage for MinIO, or have a mechanism to delete the logs automatically 3 days after connection sync jobs complete.
A message from kapa.ai
Based on the knowledge sources provided, I can answer your questions about the role of MinIO in Airbyte and how to manage its storage:

1. Role of the MinIO pod in Airbyte: MinIO is primarily used for log storage in Airbyte. According to a [forum post explaining Airbyte pods](https://discuss.airbyte.io/t/1944), MinIO is described as "for logs storage". It stores logs from sync jobs and other Airbyte operations.
2. Increasing storage for MinIO: You can increase the storage for MinIO by modifying `volumeClaimValue` in your Helm chart's `values.yaml` file. However, be aware that simply changing this value may not immediately apply to an existing deployment. According to a [Github issue about ignored MinIO storage values](https://github.com/airbytehq/airbyte/issues/43357), you might need to delete the existing PersistentVolumeClaim (PVC) and allow it to be recreated with the new size.
3. Automatically deleting logs: Airbyte does have a mechanism for automatically cleaning up old logs. In version 0.40.5 and later, you can configure the retention period for logs using the `TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable. According to a [forum post about Airbyte log sweeping](https://discuss.airbyte.io/t/2714), you can set this to retain logs for only 3 days by adding the following to your environment configuration:

   ```
   TEMPORAL_HISTORY_RETENTION_IN_DAYS=3
   ```

   With this setting, Airbyte should automatically remove workspace files once they are older than 3 days, which should help manage your MinIO storage usage. If you're using an older version of Airbyte, you might need to upgrade to take advantage of this feature. Alternatively, for a manual approach, you can delete old logs directly from the MinIO storage as suggested in another [forum post about cleaning up MinIO storage](https://discuss.airbyte.io/t/3101). However, the automatic cleanup method is generally preferred for ongoing maintenance.

Remember to always back up your data before making changes to storage configurations or performing manual cleanups.
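As a sketch of the `volumeClaimValue` change, the helpers below write a small override file and pass it to `helm upgrade` after your existing values file, so that it takes precedence. Assumptions to verify against your chart's `values.yaml`: the key path `minio.storage.volumeClaimValue`, the release name `airbyte`, and the chart reference `airbyte/airbyte`. The function names and the `HELM` dry-run override are hypothetical.

```shell
# Hypothetical helpers. ASSUMPTION: the chart reads minio.storage.volumeClaimValue.
write_minio_storage_values() {
  local size="${1:?usage: write_minio_storage_values <size> <file>}"
  local file="${2:?usage: write_minio_storage_values <size> <file>}"
  # Minimal override file; YAML nesting uses two spaces per level.
  cat > "$file" <<EOF
minio:
  storage:
    volumeClaimValue: ${size}
EOF
}

upgrade_with_override() {
  local ns="${1:?usage: upgrade_with_override <namespace> <values> <override>}"
  local values="${2:?missing base values file}"
  local override="${3:?missing override file}"
  # Later -f files win, so the override replaces the size from the base values.
  # ASSUMPTION: release "airbyte", chart "airbyte/airbyte". HELM=echo prints instead.
  ${HELM:-helm} upgrade --install airbyte airbyte/airbyte -n "$ns" -f "$values" -f "$override"
}
```

For example, `write_minio_storage_values 2Gi minio-storage.yaml` followed by `upgrade_with_override airbyte values.yaml minio-storage.yaml`. As the GitHub issue above points out, an existing PVC may keep its original size, so the new value may only apply once the claim is recreated.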
@kapa.ai What is the difference between the airbyte-dev-logs and airbyte-storage folders within the storage folder of the MinIO pod? There is nothing within airbyte-dev-logs, whereas the airbyte-storage folder has job-logging and workload folders.
A message from kapa.ai
Based on the knowledge sources provided, I can explain the difference between the `airbyte-dev-logs` and `airbyte-storage` folders within the MinIO pod's storage folder:

1. `airbyte-dev-logs`: This folder is typically used for storing logs in the default MinIO configuration. However, you mentioned that this folder is empty in your case. This could be due to a custom configuration or a change in recent versions of Airbyte.
2. `airbyte-storage`: This folder seems to be the active storage location for Airbyte logs and workload data. You mentioned it contains `job-logging` and `workload` folders, which aligns with Airbyte's storage structure.

The difference in usage between these folders might be due to changes in Airbyte's configuration or version. According to a [forum post about MinIO storage](https://discuss.airbyte.io/t/3101), the job logs are typically stored in the path `/storage/airbyte-storage/job-logging/workspace/`.

It's worth noting that in some configurations, as mentioned in a [Github issue about worker configuration](https://github.com/airbytehq/airbyte/issues/18016), the storage buckets can be configured using environment variables:

```yaml
STORAGE_BUCKET_ACTIVITY_PAYLOAD: airbyte-storage
STORAGE_BUCKET_LOG: airbyte-storage
STORAGE_BUCKET_STATE: airbyte-storage
STORAGE_BUCKET_WORKLOAD_OUTPUT: airbyte-storage
```

This configuration suggests that `airbyte-storage` is indeed the intended location for various types of data, including logs and workload outputs. The empty `airbyte-dev-logs` folder might be a remnant from an older configuration or a placeholder for development environments. If you're not seeing any data there, it's likely that your current Airbyte setup is configured to use the `airbyte-storage` folder exclusively.

If you need to manage or clean up the logs, you should focus on the `airbyte-storage/job-logging` directory, as this is where the active log data is being stored in your setup.
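To confirm which bucket your deployment actually writes logs to, rather than inferring it from folder contents, you could read `STORAGE_BUCKET_LOG` from a running pod and list the top-level folders on the MinIO volume. A minimal sketch: the deployment name `airbyte-worker` is an assumption (adjust it to your release), it assumes the images ship `printenv` and `ls`, and the function names and `KUBECTL` override are hypothetical.

```shell
# Hypothetical helpers for inspecting MinIO bucket usage.
show_log_bucket() {
  local ns="${1:?usage: show_log_bucket <namespace> [deployment]}"
  local deploy="${2:-airbyte-worker}"   # ASSUMPTION: deployment name
  local k="${KUBECTL:-kubectl}"
  # Print the bucket configured for job logs in the running container.
  $k exec -n "$ns" "deploy/$deploy" -- printenv STORAGE_BUCKET_LOG
}

list_minio_buckets() {
  local ns="${1:?usage: list_minio_buckets <namespace>}"
  local k="${KUBECTL:-kubectl}"
  # Each visible top-level folder under /storage corresponds to a bucket.
  $k exec -n "$ns" airbyte-minio-0 -- ls /storage
}
```

If `show_log_bucket airbyte` prints `airbyte-storage`, that matches the folder where you are seeing `job-logging` data.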
@kapa.ai How do I clean up the airbyte-storage/job-logging directory after I `exec` into the MinIO pod?
A message from kapa.ai
Based on the knowledge sources provided, I can guide you on how to clean up the `airbyte-storage/job-logging` directory after executing into the MinIO pod:

1. First, exec into the MinIO pod with the following command:

   ```shell
   kubectl exec -it -n airbyte airbyte-minio-0 -- /bin/bash
   ```

   This command is derived from the [Github issue about S3 destination](https://github.com/airbytehq/airbyte/issues/21375).
2. Once you're inside the pod, navigate to the storage directory:

   ```shell
   cd /storage/airbyte-storage/job-logging
   ```

3. To clean up the directory, you can use standard Linux commands to delete files. For example, to delete all files older than 3 days, you could use:

   ```shell
   find . -type f -mtime +3 -delete
   ```

   This command finds and deletes all regular files whose age, rounded down to whole days, is greater than 3, so in practice files at least 4 full days old.

However, manually cleaning up logs is not the recommended approach for long-term maintenance. As mentioned in a [forum post about cleaning up MinIO storage](https://discuss.airbyte.io/t/3101), a more sustainable solution would be to configure Airbyte to automatically manage log retention. For automatic cleanup, consider setting the `TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable in your Airbyte configuration, which should let Airbyte clean up old logs without manual intervention.

Be cautious when deleting files manually, as you might accidentally remove important data. Always ensure you have backups before performing any cleanup operations.
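Because `-delete` is irreversible, it may help to run the same expression with `-print` first and check the list. The sketch below wraps both in hypothetical helper functions; it assumes the `find` in the image supports `-mtime` and `-delete` (GNU `find` does).

```shell
# Hypothetical helpers: preview, then delete, files older than N days.
# -mtime +N matches files whose age, rounded down to whole days, exceeds N.
preview_old_logs() {
  local dir="${1:?usage: preview_old_logs <dir> <days>}" days="${2:?missing days}"
  find "$dir" -type f -mtime +"$days" -print
}

delete_old_logs() {
  local dir="${1:?usage: delete_old_logs <dir> <days>}" days="${2:?missing days}"
  find "$dir" -type f -mtime +"$days" -delete
}
```

Inside the pod, run `preview_old_logs /storage/airbyte-storage/job-logging 3` first, and only if the list looks right, `delete_old_logs /storage/airbyte-storage/job-logging 3`.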
@kapa.ai The `find` command is not available in the MinIO pod.
A message from kapa.ai
I apologize for the confusion in my previous response. You're correct that the `find` command may not be available in the MinIO pod, as it typically runs a minimal container image. Based on the knowledge sources provided, here's a more appropriate way to clean up the `airbyte-storage/job-logging` directory after executing into the MinIO pod:

1. First, exec into the MinIO pod as before:

   ```shell
   kubectl exec -it -n airbyte airbyte-minio-0 -- /bin/bash
   ```

2. Once inside the pod, navigate to the storage directory:

   ```shell
   cd /storage/airbyte-storage/job-logging
   ```

3. Since MinIO uses a minimal container image, you'll need to use basic shell commands to clean up. You can use `ls` to list files and directories, and then `rm` to remove them. For example:

   ```shell
   ls -l
   rm -rf *
   ```

   The first command lists the contents of the directory; the second removes all files and subdirectories in it. Unlike the `find` example, this does not filter by age: it deletes everything in `job-logging`, not just old logs.

However, manually cleaning up logs like this is not recommended as a long-term solution. As mentioned in the [forum post about cleaning up MinIO storage](https://discuss.airbyte.io/t/3101), a better approach is to configure Airbyte to automatically manage log retention. For a more sustainable solution, consider setting the `TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable in your Airbyte configuration, which should let Airbyte clean up old logs automatically and reduce the need for manual intervention.

Be extremely cautious when manually deleting files, as you might accidentally remove important data. Always ensure you have backups before performing any cleanup operations.
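If the image has `stat` and `date` but no `find`, a recursive shell function can still limit the cleanup to old files instead of removing everything. This is a minimal sketch, not an Airbyte or MinIO tool: the function name is hypothetical, it assumes a `stat` that accepts `-c %Y` (as GNU coreutils does), it skips hidden entries, and it removes subdirectories that end up empty. Keep in mind that `/storage` is MinIO's backend directory, so changing it directly bypasses MinIO itself.

```shell
# Hypothetical helper: delete regular files older than N days under DIR
# using only shell globbing, stat and date (no find required).
delete_older_than() {
  local dir="${1:?usage: delete_older_than <dir> <days>}" days="${2:?missing days}"
  local cutoff entry mtime
  cutoff=$(( $(date +%s) - days * 86400 ))   # epoch seconds N days ago
  for entry in "$dir"/*; do
    [ -e "$entry" ] || continue              # glob matched nothing
    if [ -d "$entry" ] && [ ! -L "$entry" ]; then
      delete_older_than "$entry" "$days"     # recurse into subdirectories
      rmdir "$entry" 2>/dev/null || true     # prune it only if now empty
    elif [ -f "$entry" ]; then
      mtime=$(stat -c %Y "$entry")           # last modification, epoch seconds
      if [ "$mtime" -lt "$cutoff" ]; then
        rm -f -- "$entry"
      fi
    fi
  done
}
```

After pasting the function into the pod's shell session, run `delete_older_than /storage/airbyte-storage/job-logging 3`. The top-level directory you pass in is never removed, only its contents.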
@kapa.ai If I have set the `TEMPORAL_HISTORY_RETENTION_IN_DAYS` env variable within the temporal section of the global values.yaml, how will the MinIO pod clean up older files/logs? Do I need to restart the MinIO StatefulSet? Or delete the StatefulSet and PVC and recreate them? How does this change take effect if I just set `TEMPORAL_HISTORY_RETENTION_IN_DAYS` and run the `helm upgrade --install ..` command?
A message from kapa.ai
Based on the knowledge sources provided, I can address your questions about setting up `TEMPORAL_HISTORY_RETENTION_IN_DAYS` and its effect on MinIO log cleanup:

1. How the MinIO pod will clean up older files/logs: The `TEMPORAL_HISTORY_RETENTION_IN_DAYS` variable is not directly related to MinIO's operation. Instead, it's used by Airbyte's internal processes to manage log retention. Airbyte uses this setting to determine which logs should be deleted, and it handles the cleanup process internally.
2. Restarting MinIO or deleting the StatefulSet and PVC: You don't need to restart the MinIO StatefulSet or delete and recreate the PVC. The log retention setting is handled by Airbyte's internal processes, not by MinIO directly.
3. How the change takes effect: When you set `TEMPORAL_HISTORY_RETENTION_IN_DAYS` in your `values.yaml` file and run the `helm upgrade --install` command, the new configuration is applied to the Airbyte deployment. According to the [documentation on configuring Airbyte](https://docs.airbyte.com/operator-guides/configuring-airbyte#data-retention), this variable defines the retention period of the job history in Temporal, which defaults to 30 days if not set. The change should take effect after the Helm upgrade is complete and the Airbyte components have restarted with the new configuration. Airbyte's internal processes will then use this setting to manage log retention going forward.

It's important to note that this setting won't immediately delete old logs. Instead, it will prevent new logs from being retained beyond the specified period. The cleanup process will happen gradually as part of Airbyte's regular operations.

If you want to verify that the setting has been applied correctly, you can check the environment variables of the relevant Airbyte pods after the upgrade. You should see `TEMPORAL_HISTORY_RETENTION_IN_DAYS` set to your specified value.

Remember that while this setting helps manage log retention, it's still good practice to monitor your MinIO storage usage over time to ensure it aligns with your expectations and storage capacity.
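To check the environment variables as suggested above, you could read the variable from the running container, and restart the deployment yourself if the upgrade did not recreate its pods. A minimal sketch: it assumes the variable ends up on a deployment named `airbyte-temporal` (adjust to wherever your values place it) whose image ships `printenv`; the function names and the `KUBECTL` override are hypothetical.

```shell
# Hypothetical helpers for checking the retention setting after helm upgrade.
show_retention_days() {
  local ns="${1:?usage: show_retention_days <namespace> [deployment]}"
  local deploy="${2:-airbyte-temporal}"   # ASSUMPTION: deployment name
  local k="${KUBECTL:-kubectl}"
  # Prints the value the running container actually sees.
  $k exec -n "$ns" "deploy/$deploy" -- printenv TEMPORAL_HISTORY_RETENTION_IN_DAYS
}

restart_and_wait() {
  local ns="${1:?usage: restart_and_wait <namespace> [deployment]}"
  local deploy="${2:-airbyte-temporal}"   # ASSUMPTION: deployment name
  local k="${KUBECTL:-kubectl}"
  # Recreate the pods so they pick up changed environment variables.
  $k rollout restart "deployment/$deploy" -n "$ns"
  $k rollout status "deployment/$deploy" -n "$ns" --timeout=300s
}
```

If `show_retention_days airbyte` prints nothing after the upgrade, the pods may still be running with the old environment, and `restart_and_wait airbyte` would recreate them.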
@kapa.ai How can I create only the MinIO pod with a fresh PVC? I don't want to touch the other Airbyte components running on k8s.
A message from kapa.ai
Based on the knowledge sources provided, I don't have specific information on how to create only the MinIO pod with a fresh PVC without affecting other Airbyte components in Kubernetes. The available information doesn't cover this particular scenario. However, I can provide some general guidance based on Kubernetes best practices:

1. You would typically need to modify the Helm chart or create a separate Helm chart just for the MinIO deployment.
2. You'd need to create a new PersistentVolumeClaim (PVC) for MinIO.
3. You'd then need to update the MinIO StatefulSet to use this new PVC.
4. Finally, you'd need to update any references in other Airbyte components to point to this new MinIO instance.

Given the complexity of this operation and the potential for disrupting your existing Airbyte deployment, I would strongly recommend:

1. Backing up all your data before attempting any changes.
2. Testing this process in a non-production environment first.
3. Consulting Airbyte's official documentation or support channels for the most up-to-date and safe method to achieve this.

If you need specific steps or commands to perform this operation, I'm afraid I don't have enough information in the provided knowledge sources to give you accurate instructions. It would be best to seek advice from Airbyte's official support channels or documentation for such a specific operational task.
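Under general Kubernetes behavior rather than Airbyte documentation, one way to give only MinIO an empty volume is to scale its StatefulSet to zero, delete its PVC, and scale it back up. This only works if the StatefulSet creates its claim from a `volumeClaimTemplate`, in which case the controller provisions a new PVC when the pod returns. The StatefulSet name `airbyte-minio` and a replica count of 1 are assumptions, the function name and `KUBECTL` override are hypothetical, and deleting the PVC permanently removes every log stored in MinIO.

```shell
# Hypothetical helper: recreate only the MinIO pod with a fresh, empty PVC.
# ASSUMPTIONS: airbyte-minio uses a volumeClaimTemplate and normally runs 1 replica.
recreate_minio_pvc() {
  local ns="${1:?usage: recreate_minio_pvc <namespace> <pvc-name>}"
  local pvc="${2:?usage: recreate_minio_pvc <namespace> <pvc-name>}"
  local k="${KUBECTL:-kubectl}"
  $k scale statefulset airbyte-minio -n "$ns" --replicas=0  # stop the pod using the claim
  $k delete pvc "$pvc" -n "$ns"                            # blocks until the claim is removed
  $k scale statefulset airbyte-minio -n "$ns" --replicas=1  # pod returns with a new claim
  $k rollout status statefulset/airbyte-minio -n "$ns" --timeout=300s
}
```

Look up the actual claim name first with `kubectl get pvc -n <namespace>`, then run `recreate_minio_pvc <namespace> <minio-pvc-name>`. Other Airbyte deployments are not touched by these commands.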
@kapa.ai Can I delete the airbyte-minio pod directly using kubectl so that the StatefulSet launches a new pod?
A message from kapa.ai
Based on the knowledge sources provided, I can offer some insights on deleting the airbyte-minio pod:

Yes, you can delete the airbyte-minio pod directly using kubectl, and the StatefulSet will launch a new pod to replace it. This is because StatefulSets in Kubernetes are designed to maintain a stable set of pod identities and persistent storage. However, there are some important considerations:

1. Deleting the pod will not delete the underlying PersistentVolumeClaim (PVC) associated with it. As shown in a [Github issue about helm delete behavior](https://github.com/airbytehq/airbyte/issues/18591), deleting the pods (and even the StatefulSet) leaves the PVCs in place, since they are not among the deleted resources:

   ```shell
   $ kubectl delete -l app.kubernetes.io/instance=airbyte pods,statefulset,configmap,secret,service
   pod "airbyte-minio-0" deleted
   statefulset.apps "airbyte-minio" deleted
   ```

2. When the new pod is created by the StatefulSet, it will reattach to the existing PVC. This means that the data stored in MinIO persists across pod deletions.
3. If your goal is to clean up the data stored in MinIO, simply deleting the pod won't achieve this. You would need to delete the PVC as well, but be cautious, as this permanently deletes all stored data.
4. Deleting the MinIO pod might cause a temporary disruption to Airbyte operations that depend on MinIO for log storage.

If you decide to proceed with deleting the pod, you can use a command like this:

```shell
kubectl delete pod airbyte-minio-0 -n <your-namespace>
```

Replace `<your-namespace>` with the actual namespace where Airbyte is deployed. Remember, this action will cause a brief period of unavailability for MinIO storage. The StatefulSet will create a new pod, but there might be a short delay before it's fully operational. Always ensure you have backups of important data before performing operations that could potentially lead to data loss.
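Building on the `kubectl delete pod` command above, a small sketch that deletes the pod and then waits until the StatefulSet reports it ready again could look like this. The function name and `KUBECTL` override are hypothetical, and the StatefulSet name `airbyte-minio` is assumed from the pod name `airbyte-minio-0`.

```shell
# Hypothetical helper: delete the MinIO pod and wait for its replacement.
restart_minio_pod() {
  local ns="${1:?usage: restart_minio_pod <namespace>}"
  local k="${KUBECTL:-kubectl}"
  $k delete pod airbyte-minio-0 -n "$ns"   # the PVC is left untouched
  # Blocks until the StatefulSet reports its replicas ready again.
  $k rollout status statefulset/airbyte-minio -n "$ns" --timeout=180s
}
```

For example, `restart_minio_pod airbyte`. Because the PVC survives, the replacement pod comes back with the same stored logs.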