Ethan Stuart
10/25/2024, 10:04 PM
user
10/25/2024, 10:04 PM
`TEMPORAL_HISTORY_RETENTION_IN_DAYS` parameter to help clean up the temporal database [GitHub Issue](https://github.com/airbytehq/airbyte/issues/39358).
3. Manual Cleanup: For the Airbyte database (`db-airbyte`), there isn't an automatic cleanup process. Users have reported manually deleting logs folders to free up space [Forum Post](https://discuss.airbyte.io/t/3303).
4. Temporary Tables: It's okay to delete `_AIRBYTE_TMP` tables. These should ideally be deleted after a sync is completed, but if they persist, you can safely remove them [Forum Post](https://discuss.airbyte.io/t/1388).
5. Clearing Data: Airbyte provides a "Clear data" feature that allows you to remove all previously synced data from the destination for a connection. This can be done for all streams or a single stream [Clearing your data](https://docs.airbyte.com/operator-guides/clear#steps-to-clear-data).
6. Orphaned Tables: If you have any orphaned tables or files that are no longer being synced to, they should be cleaned up separately, as Airbyte will not clean them up automatically [Clear behavior](https://docs.airbyte.com/operator-guides/clear#clear-behavior).
It's important to note that there isn't a built-in automatic cleanup process for all aspects of the Airbyte database. Regular monitoring and manual intervention may be necessary to manage database growth effectively.
Ethan Stuart
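The temp-table cleanup in step 4 can be staged safely by generating the `DROP` statements for review instead of dropping blindly. This is a minimal sketch: the `_airbyte_tmp` name pattern and the `pg_tables` query in the comment are assumptions that may vary by destination and Airbyte version.

```shell
# Generate DROP statements for tables whose names contain the Airbyte temp
# marker. Feed it one table name per line, e.g. from (hypothetical query):
#   psql -Atc "SELECT tablename FROM pg_tables WHERE tablename ILIKE '%_airbyte_tmp%'"
drop_tmp_tables() {
  awk 'tolower($0) ~ /_airbyte_tmp/ { printf "DROP TABLE IF EXISTS \"%s\";\n", $0 }'
}

# Example: only the temp table produces a DROP statement.
printf 'users_AIRBYTE_TMP_abc\norders\n' | drop_tmp_tables
# → DROP TABLE IF EXISTS "users_AIRBYTE_TMP_abc";
```

Reviewing the generated statements before piping them back into `psql` avoids dropping a real table that happens to match the pattern.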
10/28/2024, 4:11 PM
user
10/28/2024, 4:11 PM
`TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable to control how long Airbyte keeps logs. For example, setting it to a lower value like 7 days instead of the default 30 can help reduce storage usage [Airbyte Rotation Logs](https://discuss.airbyte.io/t/332).
2. Clear data for specific streams or connections: Airbyte allows you to clear data for individual streams or entire connections, through the UI or the API. To clear data for a single stream, navigate to the Connection's status page, click the three grey dots next to the stream, and select "Clear data". To clear all streams in a connection, go to the connection's Settings tab and click "Clear data" [Clearing your data](https://docs.airbyte.com/operator-guides/clear#steps-to-clear-data).
3. Clean up Minio storage (if applicable): If you're running on Kubernetes, Minio stores logs that can accumulate over time. You can clean these up by exec'ing into the Minio pod:

```
kubectl exec -it -n airbyte airbyte-minio-0 -- /bin/bash
```

Then remove the files under `/storage/airbyte-dev-logs/job-logging/workspace/` [How to clean up airbyte-minio and what's stored in minio?](https://discuss.airbyte.io/t/3101).
4. Adjust workspace retention settings: Although this feature may have been removed in newer versions, check whether your version still supports these environment variables:
- `MINIMUM_WORKSPACE_RETENTION_DAYS`
- `MAXIMUM_WORKSPACE_RETENTION_DAYS`
- `MAXIMUM_WORKSPACE_SIZE_MB`
These can help manage workspace data retention [Airbyte Rotation Logs](https://discuss.airbyte.io/t/332).
5. Regular database maintenance: If you're using Postgres as a destination, remember that it doesn't automatically free up disk space. Consider running VACUUM operations regularly to reclaim space [GitHub Issue Comment](https://github.com/airbytehq/airbyte/issues/36095#issuecomment-1499330181).
Remember to be cautious when clearing data, as these operations cannot be undone. Always ensure you have the necessary backups before performing any data removal operations.
Ethan Stuart
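The VACUUM advice in step 5 can be wrapped in a small helper that composes the `docker exec`/`psql` invocation for a given table. The container name `my-postgres-dest`, user `postgres`, and database `mydb` below are placeholders (not Airbyte defaults); the helper prints the command so it can be reviewed before running.

```shell
# Compose (but do not run) a VACUUM command for one destination table.
# Container, user, and database names are hypothetical; substitute your own.
vacuum_cmd() {
  local table="$1"
  printf 'docker exec my-postgres-dest psql -U postgres -d mydb -c "VACUUM (VERBOSE, ANALYZE) %s;"\n' "$table"
}

# Print the command for review, then copy-paste it to execute:
vacuum_cmd "public.my_synced_table"
```

Printing rather than executing keeps the sketch safe to run anywhere; once verified, the output line can be executed directly or scheduled from cron.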
10/28/2024, 4:13 PM
Ethan Stuart
10/28/2024, 4:13 PM
user
10/28/2024, 4:13 PM
user
10/28/2024, 4:14 PM
`TEMPORAL_HISTORY_RETENTION_IN_DAYS` environment variable to a lower value. This can significantly reduce storage usage over time. According to the [Airbyte forum post](https://discuss.airbyte.io/t/2417), you can change this variable to control how long Airbyte keeps logs:

```
TEMPORAL_HISTORY_RETENTION_IN_DAYS=7
```

This will keep logs for 7 days instead of the default, which should help manage your storage usage.
2. Clear data for specific streams or connections: Regularly review and clear data for streams or connections you no longer need. This can be done through the Airbyte UI as described in the previous answer.
3. Manage disk space: According to the [Airbyte documentation on scaling](https://docs.airbyte.com/operator-guides/scaling-airbyte#disk-space), allocate a minimum of 30GB of disk space per node. Since you're running on EC2, you have the flexibility to increase your EBS volume size if needed. Be aware, however, that long-running syncs can produce a fair amount of logs, which consume disk space.
4. Consider external storage: If you're consistently running into storage issues, consider using external storage for state and logging information. Airbyte recommends an object storage solution such as S3 for this purpose, as mentioned in the [Airbyte documentation on state and logging storage](https://docs.airbyte.com/deploying-airbyte/integrations/storage). This can offload some of the storage requirements from your EC2 instance.
5. Regular maintenance: Periodically check the `/var/lib/docker/volumes/airbyte_workspace/_data` directory, as this is a common source of growing storage usage according to the [forum post](https://discuss.airbyte.io/t/2417). You may need to clean up this directory manually if it grows too large.
Remember to always back up your data before performing any cleanup operations, and monitor your EC2 instance's storage usage regularly to prevent unexpected outages due to full disks.
Ethan Stuart
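The manual cleanup in step 5 can be automated with a simple retention script (for example, run from cron). This is a sketch under the assumption that the Docker-volume path above is where your workspace lives; run it with `-print` output visible and verify the file list before trusting the `-delete`.

```shell
# Delete files older than a given number of days under a directory.
cleanup_workspace() {
  local dir="$1" days="$2"
  # -mtime +N matches files last modified more than N*24h ago;
  # -print lists each file as it is removed.
  find "$dir" -type f -mtime +"$days" -print -delete
}

# Example invocation (path from the answer above; adjust to your host):
# cleanup_workspace /var/lib/docker/volumes/airbyte_workspace/_data 7
```

A cron entry like `0 3 * * * cleanup_workspace … 7` would then enforce a rolling 7-day window on the workspace directory.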
10/28/2024, 4:27 PM
user
10/28/2024, 4:27 PM
user
10/28/2024, 4:28 PM
`TEMPORAL_HISTORY_RETENTION_IN_DAYS` variable to manage how long logs are kept [Forum Post: Quickly run out of storage](https://discuss.airbyte.io/t/2417). This helps automate log storage management.
2. Workspace cleanup: Airbyte schedules periodic jobs to delete logs from syncs. By default, these logs are deleted every month or when their size exceeds 5GB [Airbyte Rotation Logs](https://discuss.airbyte.io/t/332). Note that this information may be outdated, as the configuration mentioned in that post was removed along with the Scheduler.
3. Current approach: As of version 0.40.5, Airbyte cleans up files from the workspace by default every 30 days [Airbyte log sweeping](https://discuss.airbyte.io/t/2714). This is configurable, so you can retain logs for a shorter period by setting the environment variable:

```
TEMPORAL_HISTORY_RETENTION_IN_DAYS=1
```

While these features help manage storage to some extent, they may not completely solve all storage issues automatically. It's still recommended to monitor your storage usage and implement additional measures if needed, especially for long-running syncs or deployments with multiple connectors. For the most up-to-date information on storage management features in Airbyte 1.0+, check the latest official Airbyte documentation or reach out to Airbyte support directly.
Ethan Stuart
10/28/2024, 4:50 PM
user
10/28/2024, 4:50 PM
user
10/28/2024, 4:51 PM
`airbyte-db`. Here's how you can do it:
1. First, ensure you're using the default `airbyte-db` Postgres database. If you've set up an external database, these instructions may not apply.
2. The default credentials for the database are specified in the `.env` file used to run Airbyte. By default, they are:

```
DATABASE_USER=docker
DATABASE_PASSWORD=docker
DATABASE_DB=airbyte
```

If you've changed these defaults, use your custom values instead.
3. To access the database using `psql`, run the following command in your terminal:

```
docker exec -ti airbyte-db psql -U docker -d airbyte
```

This command connects you to the Postgres database inside the `airbyte-db` container [Accessing the default database located in docker airbyte-db](https://docs.airbyte.com/operator-guides/configuring-airbyte-db#accessing-the-default-database-located-in-docker-airbyte-db).
4. Once connected, you can run SQL queries against the various tables in the Airbyte database. Some of the important tables include:
- `workspace`
- `actor_definition`
- `actor`
- `actor_oauth_parameter`
- `operation`
- `connection`
- `connection_operation`
- `state`
5. To see all available tables, use the `\d` command in psql (or `\dt` for tables only).
6. To view the structure of a specific table, use `\d table_name`, for example: `\d actor`
Remember that this database contains important configuration and operational data for Airbyte. Be cautious when making any changes; it's recommended to perform only read operations unless you're absolutely sure about what you're doing. Also note that no actual data from your source or destination connectors is stored in this internal database [Documentation > Operator-guides > Configuring the Airbyte Database](https://docs.airbyte.com/operator-guides/configuring-airbyte-db).
Ethan Stuart
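The interactive session above can also be driven non-interactively for scripted, read-only checks. A sketch, assuming the default `.env` credentials quoted above; the example `connection` query is an assumption, since the config schema's columns can differ between Airbyte versions.

```shell
# Compose a non-interactive, read-only query against the internal config DB.
# Defaults match the .env values quoted above; override via arguments.
airbyte_query_cmd() {
  local sql="$1" user="${2:-docker}" db="${3:-airbyte}"
  # -A: unaligned output, -t: tuples only, -c: run one command and exit
  printf 'docker exec -i airbyte-db psql -U %s -d %s -Atc "%s"\n' "$user" "$db" "$sql"
}

# Example: count configured connections (run the printed command yourself).
airbyte_query_cmd "SELECT count(*) FROM connection;"
```

Composing and printing the command keeps the helper harmless to run on a machine without Airbyte; on the host itself, wrap it in `$( … )` or `eval` once verified.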
10/28/2024, 4:52 PM
user
10/28/2024, 4:52 PM
user
10/28/2024, 4:52 PM
`airbyte-db` container. This error suggests that the Docker daemon cannot find a container named `airbyte-db`. There could be a few reasons for this:
1. Airbyte is not currently running: Make sure you've started Airbyte using the appropriate command (usually `docker-compose up` or `docker compose up`, depending on your Docker version).
2. The container name has changed: In some Airbyte versions or custom setups, the container name might be different.
3. You're using a different deployment method: If you're using Kubernetes or a cloud-managed version of Airbyte, the approach to accessing the database would be different.
To troubleshoot this issue:
1. First, check whether Airbyte is running. You can list all running Docker containers with:

```
docker ps
```

Look for containers with names starting with `airbyte_`.
2. If you don't see any Airbyte containers, try starting Airbyte. Navigate to your Airbyte directory and run:

```
docker-compose up -d
```

3. If Airbyte is running but you still don't see the `airbyte-db` container, check the container names. The database container might have a slightly different name; look for a container with "db" in its name.
4. If you're using a newer version of Airbyte, the container names might have changed. You can try using the container ID instead of the name. First, list all containers (including stopped ones):

```
docker ps -a
```

Then look for the container ID of the database container and use it in place of `airbyte-db` in the command.
If you're still having issues, check your Airbyte version and deployment method, as the exact steps can vary. Also, ensure that you're running the command from the same environment where Airbyte is deployed. Remember, accessing the internal database directly should be done cautiously, as mentioned in the [Airbyte documentation](https://docs.airbyte.com/operator-guides/configuring-airbyte-db#accessing-the-default-database-located-in-docker-airbyte-db).
Ethan Stuart
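Step 3 above (finding a container with "db" in its name) can be scripted as a small filter over `docker ps` output. A sketch; the "db" substring match is a heuristic, not a guaranteed naming rule, so verify the match before using the ID.

```shell
# Print the ID of the first container whose name contains "db".
# Expects `docker ps --format '{{.ID}} {{.Names}}'` output on stdin.
find_db_container() {
  awk '$2 ~ /db/ { print $1; exit }'
}

# Usage (include stopped containers too, per step 4 above):
# docker ps -a --format '{{.ID}} {{.Names}}' | find_db_container
```

The returned ID can then be substituted for `airbyte-db` in the `docker exec … psql` command from the previous answer.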
10/28/2024, 4:54 PM
`airbyte-abctl-control-plane`
user
10/28/2024, 4:54 PM
user
10/28/2024, 4:54 PM
`abctl` tool to manage Airbyte, which uses a different deployment method than the standard Docker Compose setup. This explains why you're only seeing one container, named `airbyte-abctl-control-plane`.
The `abctl` tool uses Kubernetes to deploy Airbyte, even when running locally. This means that instead of individual Docker containers, Airbyte components run as Kubernetes pods within a local Kubernetes cluster (likely managed by Kind). To interact with this setup:
1. First, ensure you have `kubectl` installed and configured to work with the local Kubernetes cluster created by `abctl`.
2. You can list the Airbyte pods by running:

```
kubectl get pods -n airbyte-abctl
```

This should show you the various Airbyte components running as pods [Documentation > Contributing-to-airbyte > Developing Locally > Using abctl for Airbyte development](https://docs.airbyte.com/contributing-to-airbyte/developing-locally#using-abctl-for-airbyte-development).
3. To access the database in this setup, you'll need to find the database pod. It's likely named something like `airbyte-db-0`.
4. To connect to the database, you can use:

```
kubectl exec -it -n airbyte-abctl airbyte-db-0 -- psql -U airbyte
```

This command might need to be adjusted based on the exact pod name and database user.
5. If you need to view logs for a particular component, you can use:

```
kubectl logs -n airbyte-abctl <pod-name>
```

For example:

```
kubectl logs -n airbyte-abctl airbyte-abctl-server-74465db7fd-gk25q
```

Remember, when using `abctl`, you're working with a Kubernetes-based deployment, so standard Docker commands won't work directly; you'll need to use `kubectl` to interact with the Airbyte components. If you're having issues with the `abctl` installation, check the [GitHub issue about abctl local install failing](https://github.com/airbytehq/airbyte/issues/44931), which discusses common problems and solutions.
Ethan Stuart
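Step 3 above (locating the database pod) can be scripted the same way as the container lookup earlier. A sketch; the assumption that the pod's name contains `airbyte-db` may not hold in every release, so check the output of `kubectl get pods` if nothing matches.

```shell
# Print the name of the first pod matching "airbyte-db", from
# `kubectl get pods -n airbyte-abctl --no-headers` output on stdin.
find_db_pod() {
  awk '$1 ~ /airbyte-db/ { print $1; exit }'
}

# Usage:
# kubectl get pods -n airbyte-abctl --no-headers | find_db_pod
```

Matching on `airbyte-db` rather than just `db` avoids false hits on pod-name hash suffixes (e.g. `…-74465db7fd-…`), which routinely contain the letters "db".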
10/28/2024, 5:13 PM
user
10/28/2024, 5:13 PM
user
10/28/2024, 5:13 PM
`abctl` to manage Airbyte locally. Let's break down the problem and potential solutions:
1. The error suggests that the Kubernetes API server is not accessible at `localhost:8080`. This could be because:
- The local Kubernetes cluster (likely managed by Kind) is not running
- The API server is not exposed on the expected port
- There's a network issue preventing the connection
2. This error is similar to the one reported in the [GitHub issue about abctl local install failing](https://github.com/airbytehq/airbyte/issues/40231), which indicates it's a known issue with the `abctl` tool.
To resolve this issue, you can try the following steps:
1. Ensure that the local Kubernetes cluster is running. You can check this by running:

```
kind get clusters
```

If you don't see a cluster named `airbyte-abctl`, you may need to create it.
2. If the cluster exists, try to get the correct API server address and port:

```
kubectl cluster-info
```

This should show you the correct address and port for the Kubernetes API server.
3. Check whether the `abctl` kubeconfig file is correctly configured. It should be located at `~/.airbyte/abctl/abctl.kubeconfig`. You can try setting it explicitly:

```
export KUBECONFIG=~/.airbyte/abctl/abctl.kubeconfig
```

4. If the issue persists, you might need to recreate the local Kubernetes cluster:

```
abctl local uninstall
abctl local install
```

5. If you're still encountering issues, check your firewall settings to ensure nothing is blocking the connection to `localhost:8080`.
Remember, `abctl` uses a local Kubernetes setup, so troubleshooting steps for standard Docker-based Airbyte installations may not apply here. If these steps don't resolve the issue, consider reporting it as a bug to the Airbyte GitHub repository, as it appears to be a recurring issue with the `abctl` tool. [Documentation > Deploying-airbyte > Troubleshooting abctl > Common Errors > Connection Refused](https://docs.airbyte.com/deploying-airbyte/troubleshoot-deploy#connection-refused) provides similar error messages and context, confirming that this is a known issue with `abctl`.
Ethan Stuart
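Step 3 above can be turned into a small guard that fails loudly when the kubeconfig is missing, instead of silently exporting a path to a nonexistent file. A sketch; the default path is the one given in the answer.

```shell
# Point kubectl at the abctl kubeconfig, verifying the file exists first.
use_abctl_kubeconfig() {
  local cfg="${1:-$HOME/.airbyte/abctl/abctl.kubeconfig}"
  if [ -f "$cfg" ]; then
    export KUBECONFIG="$cfg"
    echo "KUBECONFIG set to $cfg"
  else
    echo "kubeconfig not found at $cfg" >&2
    return 1
  fi
}

# Usage:
# use_abctl_kubeconfig && kubectl cluster-info
```

Chaining with `&&` ensures `kubectl` only runs once the kubeconfig is confirmed present, which makes the "connection refused" case easier to diagnose.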
10/28/2024, 8:45 PM
user
10/28/2024, 8:45 PM
user
10/28/2024, 8:46 PM
`docker-compose.yml` file as `airbyte/db` [Documentation > Operator-guides > Configuring the Airbyte Database](https://docs.airbyte.com/operator-guides/configuring-airbyte-db). I don't have specific information about a "templates" database in the provided knowledge sources. If what you're seeing is `template0`/`template1`, those are standard template databases created by Postgres itself rather than by Airbyte. For more detailed or up-to-date information about these databases and their contents, check the latest Airbyte documentation or reach out to Airbyte support directly.