<@U04SE163WC8> how to reduce airbyte_internal data...
# ask-ai
e
@kapa.ai How can I reduce the size of the airbyte_internal dataset in BigQuery?
u
The `airbyte_internal` dataset in BigQuery is used by Airbyte to store raw data from your data sources. This dataset can grow in size depending on the volume of data you are syncing. Here are a few strategies to manage the size of this dataset:

1. Optimize your sync frequency and data selection: If you're syncing large volumes of data frequently, consider reducing the sync frequency or selecting only the necessary data for your use case. This can help reduce the amount of data stored in the `airbyte_internal` dataset.
2. Manage temporary tables: Airbyte creates temporary tables in the `airbyte_internal` dataset during the sync process. These tables should be deleted after the sync is completed. However, in some cases, these tables may not be deleted, leading to increased storage usage. There have been [issues reported](https://github.com/airbytehq/airbyte/issues/31660) regarding this, and Airbyte has made fixes in newer versions of the platform. Ensure you're using the latest version of Airbyte and the BigQuery connector.
3. Change the raw data dataset: By default, Airbyte uses the `airbyte_internal` dataset to store raw data. However, this can be [overridden by the user](https://docs.airbyte.com/integrations/destinations/bigquery#output-schema). You can specify a different dataset for storing raw data to manage storage usage.
4. Delete unused data: If there are tables or data in the `airbyte_internal` dataset that are no longer needed, consider deleting them to free up storage.

Remember that any changes to the `airbyte_internal` dataset should be done carefully, as they may affect your data sync processes. Always back up your data before making any changes.
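Before deleting anything, it can help to see which tables actually account for the storage. The following is a minimal sketch, assuming the raw dataset is still named `airbyte_internal`; the project ID `my-project` and the function name `build_table_size_query` are placeholders made up for illustration. It only builds a query against BigQuery's `__TABLES__` metadata table. Running the query (in the console, with `bq`, or through a client library) is left to you.

```python
import re

# Conservative identifier check so names can be interpolated into SQL safely.
# It intentionally rejects anything beyond letters, digits, "_" and "-".
_IDENT = re.compile(r"^[A-Za-z0-9_\-]+$")


def build_table_size_query(project: str, dataset: str = "airbyte_internal") -> str:
    """Return SQL that lists the tables in `dataset`, largest first."""
    for name in (project, dataset):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected characters in identifier: {name!r}")
    return (
        "SELECT table_id, row_count, "
        "ROUND(size_bytes / POW(1024, 3), 2) AS size_gib, "
        "TIMESTAMP_MILLIS(last_modified_time) AS last_modified "
        f"FROM `{project}.{dataset}.__TABLES__` "
        "ORDER BY size_bytes DESC"
    )


# "my-project" is a placeholder project ID.
print(build_table_size_query("my-project"))
```

If the raw dataset has been renamed, pass the custom name as the second argument instead of relying on the default.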
e
@kapa.ai Tables in airbyte_internal are not deleted after the sync completes. I'm using BigQuery as the destination connector.
u
This issue of temporary tables not being deleted in the `airbyte_internal` dataset after sync completion has been reported in the past. The problem was that temporary tables were created during the sync process but were not removed after the sync was completed, leading to unnecessary storage usage. The Airbyte team has addressed this issue in newer versions of the platform. The fix ensures that all temporary tables in the `airbyte_internal` dataset are deleted at the conclusion of a successful sync. To resolve this issue, you should:

1. Confirm that all your syncs are successful. The cleanup process is triggered after a successful sync.
2. Update your Airbyte platform and BigQuery connector to the latest version. The fix for this issue is included in the newer versions.

You can check the version of your Airbyte platform and BigQuery connector in the Airbyte UI. To update, follow the [upgrade instructions](https://docs.airbyte.io/operator-guides/upgrading-airbyte) in the Airbyte documentation. If you're still facing this issue after updating, consider reporting it on the [Airbyte GitHub](https://github.com/airbytehq/airbyte/issues) page. For more details, refer to the [GitHub issue](https://github.com/airbytehq/airbyte/issues/31660) where this problem was discussed and resolved.
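To check whether leftover temporary tables are what is taking up space, one option is to list the tables in the raw dataset and flag those that look temporary and have not been modified for a while. The following is a minimal sketch under stated assumptions: the `_airbyte_tmp` suffix, the 24-hour threshold, the table names, and every function name below are illustrative guesses, not confirmed Airbyte behaviour, so check the naming in your own dataset first. The code only generates `DROP TABLE` statements for review and executes nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TableInfo:
    table_id: str
    last_modified: datetime  # timezone-aware timestamp


def stale_temp_tables(tables, now, suffix="_airbyte_tmp", max_age=timedelta(hours=24)):
    """Return tables whose name ends with `suffix` and that are older than `max_age`.

    The default suffix is an assumption; adjust it to match your dataset.
    """
    return [
        t for t in tables
        if t.table_id.endswith(suffix) and now - t.last_modified > max_age
    ]


def drop_statements(project, dataset, tables):
    """Build DROP TABLE statements for manual review (nothing is executed)."""
    return [f"DROP TABLE IF EXISTS `{project}.{dataset}.{t.table_id}`;" for t in tables]


# Hypothetical listing, e.g. collected from the dataset's table metadata.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
tables = [
    TableInfo("raw_users", now - timedelta(days=2)),            # not a temp table
    TableInfo("orders_airbyte_tmp", now - timedelta(days=3)),   # old temp table
    TableInfo("events_airbyte_tmp", now - timedelta(hours=1)),  # may belong to a running sync
]

for stmt in drop_statements("my-project", "airbyte_internal", stale_temp_tables(tables, now)):
    print(stmt)
```

The age threshold is there so that tables belonging to a sync that is still running are not flagged. Review the generated statements and back up anything you might need before running them.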
e
@kapa.ai In my case, I renamed airbyte_internal to a custom dataset name. Is that the issue?
u
Yes, renaming the `airbyte_internal` dataset could potentially cause issues. Airbyte uses the `airbyte_internal` dataset to store raw data during the sync process. If you rename this dataset, Airbyte might not be able to find it and this could disrupt the sync process. However, Airbyte does allow you to specify a different dataset for storing raw data. This can be done in the destination settings of the BigQuery connector. If you've renamed the `airbyte_internal` dataset, ensure that you've updated the destination settings in Airbyte to reflect this change. Here's how you can do it:

1. In Airbyte, navigate to the destination settings.
2. Expand the optional fields.
3. In the `Raw Table Schema Name` field, specify your custom dataset name.

Remember to replace all references to `airbyte_internal` with your custom dataset name in your configurations. If you're still facing issues after making these changes, consider reverting the dataset name back to `airbyte_internal` and see if that resolves the issue. For more details, refer to the [GitHub issue](https://github.com/airbytehq/airbyte/issues/30101) where a similar problem was discussed and a workaround was provided.
e
@kapa.ai I'm already using the latest version of Airbyte, but the tables in the airbyte_internal dataset are not deleted
u
If you're using the latest version of Airbyte and the temporary tables in the `airbyte_internal` dataset are still not being deleted after sync completion, it could be a bug. Here are a few steps you can take:

1. Check your sync logs: Look for any errors or warnings in your sync logs that might indicate why the tables are not being deleted.
2. Check your Airbyte and BigQuery connector versions: Ensure that both your Airbyte platform and BigQuery connector are up to date. The fix for this issue was implemented in the BigQuery connector version [2.3.24](https://github.com/airbytehq/airbyte/issues/31660#issuecomment-1001705017).
3. Report the issue: If you're still facing this issue, consider reporting it on the [Airbyte GitHub](https://github.com/airbytehq/airbyte/issues) page. Include as much detail as possible, such as your Airbyte and BigQuery connector versions, sync logs, and a description of the issue.

Remember, any changes to the `airbyte_internal` dataset should be done carefully, as they may affect your data sync processes. Always back up your data before making any changes.