Eka Pramudita
09/26/2024, 9:04 AM
user
09/26/2024, 9:04 AM
The `airbyte_internal` dataset in BigQuery is used by Airbyte to store raw data from your data sources. This dataset can grow in size depending on the volume of data you sync. Here are a few strategies to manage its size:
1. Optimize your sync frequency and data selection: If you sync large volumes of data frequently, consider reducing the sync frequency or selecting only the data your use case needs. This reduces the amount of data stored in the `airbyte_internal` dataset.
2. Manage temporary tables: Airbyte creates temporary tables in the `airbyte_internal` dataset during the sync process. These tables should be deleted after the sync completes. In some cases, however, they are not deleted, which increases storage usage. There have been [issues reported](https://github.com/airbytehq/airbyte/issues/31660) about this, and Airbyte has made fixes in newer versions of the platform. Make sure you are using the latest version of Airbyte and the BigQuery connector.
3. Change the raw data dataset: By default, Airbyte uses the `airbyte_internal` dataset to store raw data. However, this can be [overridden by the user](https://docs.airbyte.com/integrations/destinations/bigquery#output-schema). You can specify a different dataset for storing raw data to manage storage usage.
4. Delete unused data: If there are tables or data in the `airbyte_internal` dataset that are no longer needed, consider deleting them to free up storage.

Remember that any changes to the `airbyte_internal` dataset should be made carefully, as they may affect your data sync processes. Always back up your data before making any changes.

Eka Pramudita
09/26/2024, 9:17 AM
user
09/26/2024, 9:17 AM
user
09/26/2024, 9:17 AM
An issue with temporary tables not being deleted from the `airbyte_internal` dataset after sync completion has been reported in the past. Temporary tables were created during the sync process but were not removed after the sync completed, which led to unnecessary storage usage. The Airbyte team has addressed this in newer versions of the platform. The fix ensures that all temporary tables in the `airbyte_internal` dataset are deleted at the end of a successful sync.

To resolve this issue, you should:
1. Confirm that all your syncs are successful. The cleanup process is triggered after a successful sync.
2. Update your Airbyte platform and BigQuery connector to the latest version. The fix for this issue is included in the newer versions. You can check the versions of your Airbyte platform and BigQuery connector in the Airbyte UI. To update, follow the [upgrade instructions](https://docs.airbyte.io/operator-guides/upgrading-airbyte) in the Airbyte documentation.

If you still face this issue after updating, consider reporting it on the [Airbyte GitHub](https://github.com/airbytehq/airbyte/issues) page. For more details, refer to the [GitHub issue](https://github.com/airbytehq/airbyte/issues/31660) where this problem was discussed and resolved.

Eka Pramudita
09/26/2024, 9:27 AM
user
09/26/2024, 9:28 AM
user
09/26/2024, 9:28 AM
Renaming the `airbyte_internal` dataset could cause issues. Airbyte uses the `airbyte_internal` dataset to store raw data during the sync process. If you rename this dataset, Airbyte might not be able to find it, which could disrupt the sync process. However, Airbyte does allow you to specify a different dataset for storing raw data in the destination settings of the BigQuery connector. If you have renamed the `airbyte_internal` dataset, make sure you have updated the destination settings in Airbyte to reflect the change. Here's how:
1. In Airbyte, navigate to the destination settings.
2. Expand the optional fields.
3. In the Raw Table Schema Name field, specify your custom dataset name.

Remember to replace all references to `airbyte_internal` with your custom dataset name in your configurations. If you still face issues after making these changes, consider reverting the dataset name to `airbyte_internal` and checking whether that resolves the issue. For more details, refer to the [GitHub issue](https://github.com/airbytehq/airbyte/issues/30101) where a similar problem was discussed and a workaround was provided.

Eka Pramudita
09/26/2024, 10:16 AM
user
09/26/2024, 10:16 AM
user
09/26/2024, 10:16 AM
If the temporary tables in the `airbyte_internal` dataset are still not being deleted after sync completion, it could be a bug. Here are a few steps you can take:
1. Check your sync logs: Look for any errors or warnings in your sync logs that might indicate why the tables are not being deleted.
2. Check your Airbyte and BigQuery connector versions: Make sure both your Airbyte platform and BigQuery connector are up to date. The fix for this issue was implemented in BigQuery connector version [2.3.24](https://github.com/airbytehq/airbyte/issues/31660#issuecomment-1001705017).
3. Report the issue: If you still face this issue, consider reporting it on the [Airbyte GitHub](https://github.com/airbytehq/airbyte/issues) page. Include as much detail as possible, such as your Airbyte and BigQuery connector versions, sync logs, and a description of the issue.

Remember, any changes to the `airbyte_internal` dataset should be made carefully, as they may affect your data sync processes. Always back up your data before making any changes.
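Steps 1 and 2 above can be partly automated with a small Python sketch. It is a rough triage helper, not an Airbyte tool. It compares a connector version string against the version quoted in this thread (2.3.24). It also filters exported log lines for error or warning levels. Several things here are assumptions rather than documented Airbyte formats: the plain `MAJOR.MINOR.PATCH` version strings, the severity keywords (`ERROR`, `WARN`, `WARNING`), and the helper names `parse_version`, `has_cleanup_fix` and `find_problem_lines`.

```python
import re
from typing import Iterable, List, Tuple

# Version quoted in this thread as containing the cleanup fix (not independently verified).
FIX_VERSION = "2.3.24"


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (optional leading 'v') into integers.

    Assumption: versions look like '2.3.24'; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def has_cleanup_fix(connector_version: str, fix_version: str = FIX_VERSION) -> bool:
    """Return True if the connector version is at or above the version said to contain the fix."""
    # Tuples compare element by element, so (2, 3, 20) < (2, 3, 24) < (2, 4, 0).
    return parse_version(connector_version) >= parse_version(fix_version)


# Assumption: a log line's severity appears as a standalone word such as ERROR or WARN.
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARN(ING)?)\b")


def find_problem_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the lines that contain an error or warning level, without trailing newlines."""
    return [line.rstrip("\n") for line in log_lines if LEVEL_PATTERN.search(line)]


if __name__ == "__main__":
    print(has_cleanup_fix("2.3.20"))  # False
    print(has_cleanup_fix("2.4.0"))   # True
    # Made-up sample lines for illustration; real Airbyte log lines will look different.
    sample_log = [
        "INFO sync started\n",
        "WARN example warning\n",
        "ERROR example error\n",
    ]
    for line in find_problem_lines(sample_log):
        print(line)
```

Comparing integer tuples rather than raw strings avoids the trap where `"2.3.9" > "2.3.24"` holds lexically. If your connector reports versions with suffixes, extend `parse_version` before relying on the result.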