Nanagoud
10/28/2024, 12:54 PMuser
10/28/2024, 12:55 PMNanagoud
10/28/2024, 1:06 PMuser
10/28/2024, 1:06 PMNormally under the CDC mode, the Postgres source will first run a full refresh sync to read the snapshot of all the existing data, and all subsequent runs will only be incremental syncs reading from the write-ahead logs (WAL). However, occasionally, you may see full refresh syncs after the initial run.When this happens, you might see the following log:
Saved offset is before Replication slot's confirmed_flush_lsn, Airbyte will trigger sync from scratch2. Specific logs or configurations to examine: To troubleshoot this issue, you should look for the log message mentioned above in your Airbyte logs. Additionally, you can check the following: a. WAL retention settings: Although you've set
max_slot_wal_keep_size
, max_wal_size
, and wal_keep_size
to 2GB, this might not be sufficient depending on your data volume and sync frequency. You may need to increase these values further.
b. Sync frequency: If your syncs are too infrequent, PostgreSQL might remove the necessary WAL files before Airbyte can process them. Try increasing the sync frequency.
c. Replication slot status: Check the status of your replication slot in PostgreSQL to ensure it's not falling behind.
d. CDC requirements: Ensure that your tables meet all the [CDC requirements](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#cdc-requirements), such as having primary keys and not being views.
To resolve this issue, you can try the following:
1. Increase WAL retention: Set higher values for wal_keep_size
and max_slot_wal_keep_size
. The documentation suggests:
sql
ALTER SYSTEM SET wal_keep_size = '10GB';
2. Increase sync frequency: Schedule more frequent syncs to prevent WAL files from being removed before Airbyte can process them.
3. Monitor replication slot: Regularly check the status of your replication slot to ensure it's not falling behind.
4. Check table structure: Ensure all tables you're syncing have primary keys and meet other CDC requirements.
5. Consider enabling heartbeat events: As mentioned in a [forum post](https://discuss.airbyte.io/t/2675), enabling heartbeat events might help prevent WAL backlog issues:user
10/28/2024, 1:06 PMmax_slot_wal_keep_size
, max_wal_size
, and wal_keep_size
to 2GB, this might not be sufficient depending on your data volume and sync frequency. You may need to increase these values further. b. Sync frequency: If your syncs are too infrequent, PostgreSQL might remove the necessary WAL files before Airbyte can process them. Try increasing the sync frequency. c. Replication slot status: Check the status of your replication slot in PostgreSQL to ensure it's not falling behind. d. CDC requirements: Ensure that your tables meet all the [CDC requirements](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#cdc-requirements), such as having primary keys and not being views. To resolve this issue, you can try the following: 1. Increase WAL retention: Set higher values for wal_keep_size
and max_slot_wal_keep_size
. The documentation suggests: sql ALTER SYSTEM SET wal_keep_size = '10GB';
2. Increase sync frequency: Schedule more frequent syncs to prevent WAL files from being removed before Airbyte can process them. 3. Monitor replication slot: Regularly check the status of your replication slot to ensure it's not falling behind. 4. Check table structure: Ensure all tables you're syncing have primary keys and meet other CDC requirements. 5. Consider enabling heartbeat events: As mentioned in a [forum post](https://discuss.airbyte.io/t/2675), enabling heartbeat events might help prevent WAL backlog issues: > Consider to either adjust your filter configuration or enable heartbeat events (via the heartbeat.interval.ms option) to avoid this situation. If these steps don't resolve the issue, you may need to reset the connection and perform a new initial sync. Also, ensure you're using the latest version of the Airbyte PostgreSQL source connector, as there have been performance improvements in recent versions.Nanagoud
10/28/2024, 1:26 PMuser
10/28/2024, 1:26 PMuser
10/28/2024, 1:27 PMsql CREATE TABLE airbyte_heartbeat ( id SERIAL PRIMARY KEY, text VARCHAR(255), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP );
2. Add this table to the Airbyte publication: sql ALTER PUBLICATION <your_publication_name> ADD TABLE airbyte_heartbeat;
3. When configuring your PostgreSQL source in Airbyte, you need to set the heartbeat_action_query
property. This query will be periodically executed by Airbyte on the airbyte_heartbeat
table. For example, you can set it to: sql INSERT INTO airbyte_heartbeat (text) VALUES ('heartbeat')
This setup is described in the [Airbyte documentation](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#advanced-wal-disk-consumption-and-heartbeat-action-query), which states: "A workaround for this situation is to artificially add events to a heartbeat table that the Airbyte use has write access to. This will ensure that Airbyte can process the WAL and prevent disk space to spike." The documentation provides these steps: 1. Create a table (e.g. airbyte_heartbeat
) in the database and schema being tracked. 2. Add this table to the airbyte publication. 3. Configure the heartbeat_action_query
property while setting up the source-postgres connector. This approach helps prevent issues with WAL disk consumption and ensures that Airbyte can process the WAL even when there are no changes to the tracked tables. It's particularly useful in situations where there are a large volume of changes, but only a small percentage of them are being made to the databases, schemas, and tables configured for capture. For more detailed information, you can refer to the [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-wal-disk-space) linked in the Airbyte docs.