Jake Duckworth
10/26/2024, 2:32 PM
I have a connection configured with the `incremental_deduped_history` sync mode. When the sync runs, it does a full refresh every time instead of processing the incremental changes. Do I have something misconfigured? Any idea why the incremental processing is not working?
Here is my current setup:
Airbyte version: 1.1.0
Postgres Source Version: 3.6.22
Postgres Destination Version: 2.4.0
Postgres config:
```sql
-- Role, permissions, publication, and logical replication slot used by the Airbyte source
CREATE ROLE airbyte_role;
ALTER ROLE airbyte_role REPLICATION LOGIN;
GRANT SELECT ON TABLE public.table_1 TO airbyte_role;
-- DEFAULT replica identity uses the table's primary key for CDC change records
ALTER TABLE public.table_1 REPLICA IDENTITY DEFAULT;
CREATE PUBLICATION airbyte_publication FOR TABLES IN SCHEMA public;
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
```
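A quick way to confirm the publication actually covers `public.table_1` and the slot exists (querying the standard catalogs with the names above) is:

```sql
-- Tables included in the publication
SELECT * FROM pg_publication_tables WHERE pubname = 'airbyte_publication';
-- Slot status: does it exist, and is anything currently consuming it?
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'airbyte_slot';
```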
user
10/26/2024, 2:32 PM
Under CDC incremental mode there are a few reasons why you can still see full refresh syncs:

1. WAL retention: If syncs do not run often enough, the offset Airbyte saved may no longer be available in the `pg_wal` directory. Postgres then purges or archives the older WAL files. When this occurs, you'll see a log message like:
> Saved offset is before Replication slot's confirmed_flush_lsn, Airbyte will trigger sync from scratch

To prevent this, you can:
- Sync the data source more frequently
- Increase the `wal_keep_size` setting in your Postgres configuration

[Source](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs)

2. Connector version: Ensure you're using the latest version of the Postgres source connector. There have been issues with CDC state messages in older versions that could cause this behavior.

3. Primary key requirement: Incremental sync is only supported for tables with primary keys. Ensure all your tables have primary keys defined.

4. Replication slot configuration: Make sure your replication slot is correctly configured and not being used by multiple destinations. An Airbyte database source using CDC replication can only be used with a single Airbyte destination.

To troubleshoot and potentially resolve the issue:

1. Check your Postgres logs for any messages related to WAL retention or replication slots.
2. Increase the frequency of your syncs to prevent WAL file removal.
3. Consider increasing the `wal_keep_size` in your Postgres configuration (see the example below).
4. Verify that all tables being synced have primary keys.
5. Ensure you're using the latest version of the Postgres source connector.
6. If the issue persists, you may need to reset the connection and start a new sync from scratch.

For more detailed troubleshooting steps and requirements for CDC, refer to the [Airbyte Postgres CDC documentation](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#cdc-requirements).
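As a rough sketch of step 3 above (the 1GB value is only a placeholder; size it to comfortably cover the interval between syncs), increasing retained WAL on Postgres 13+ could look like:

```sql
-- Check the current retention setting
SHOW wal_keep_size;
-- Keep extra WAL around for the replication slot (placeholder value; applied on config reload)
ALTER SYSTEM SET wal_keep_size = '1GB';
SELECT pg_reload_conf();
```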
Jake Duckworth
10/26/2024, 7:17 PM
I increased `wal_keep_size` and dropped / recreated the replication slot and Airbyte source, destination, and connection. Now the sync runs significantly faster (went from 20 min down to 1 min). The sync is still reloading all records in the table, however. There are 57 rows in the source table and the table size is 32K. Is there a threshold where Airbyte will always do a full refresh if the size of the table or number of records is below a certain value?
user
10/26/2024, 7:17 PM
Jake Duckworth
10/26/2024, 8:11 PM
I noticed that the `restart_lsn` is not advancing. What could be the cause of this?
user
10/26/2024, 8:12 PM
There are several reasons why the `restart_lsn` (Log Sequence Number) might not be advancing in your Postgres replication slot used by Airbyte. Here are some possible causes and suggestions:

1. Inactive Replication Slot: The replication slot might become inactive if the consumer (in this case, Airbyte) disconnects. You can check the status of your replication slot using the following query:
```sql
SELECT SLOT_NAME, PG_CURRENT_WAL_LSN(), REDO_LSN, RESTART_LSN,
       PG_SIZE_PRETTY(PG_WAL_LSN_DIFF(PG_CURRENT_WAL_LSN(), RESTART_LSN)) AS LAG,
       ACTIVE
FROM PG_CONTROL_CHECKPOINT(), PG_REPLICATION_SLOTS;
```
If the `ACTIVE` column shows `false`, it means the slot is inactive [Github Issue #42917](https://github.com/airbytehq/airbyte/issues/42917).

2. Filtered Events: There might be a large number of events being filtered out, preventing the offset from being committed. This can cause the replication slot to not acknowledge the processed WAL offsets. A warning in the logs might look like this:
> Received 10001 events which were all filtered out, so no offset could be committed. This prevents the replication slot from acknowledging the processed WAL offsets, causing a growing backlog of non-removeable WAL segments on the database server.

To address this, you might need to adjust your filter configuration or enable heartbeat events [Airbyte Forum](https://discuss.airbyte.io/t/2675) (a generic heartbeat sketch follows below).

3. WAL Purging: Your Postgres database might be purging WAL (Write-Ahead Logging) files between Airbyte jobs. To prevent this, try running Airbyte sync jobs more frequently and extend the auto-cancellation timeout [Airbyte Forum](https://discuss.airbyte.io/t/2675).

4. Incompatibility Issues: There have been known compatibility issues with certain versions of the Postgres source connector and Airbyte. Ensure you're using the latest version of both Airbyte and the Postgres source connector [Airbyte Forum](https://discuss.airbyte.io/t/1791).

5. Debezium Configuration: The issue might be related to Debezium, which Airbyte uses for CDC. Ensure you're using a version of the Postgres connector that includes the latest Debezium fixes [Airbyte Forum](https://discuss.airbyte.io/t/2675).

If none of these solutions work, you might need to reset the connection and potentially increase WAL retention and/or sync frequency. You can find more troubleshooting information in the [Postgres Troubleshooting Guide](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs).
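On the heartbeat suggestion in point 2: a common generic workaround (this is not a built-in Airbyte setting; the heartbeat table name below is hypothetical, and the role/publication names just reuse this thread's setup) is a small table that is written to on a schedule and included in the synced streams, so the slot always has consumed changes to acknowledge:

```sql
-- Hypothetical heartbeat table; with a publication created FOR TABLES IN SCHEMA public
-- it is picked up automatically. Include it in the connection's streams so its updates
-- are actually consumed rather than filtered out.
CREATE TABLE IF NOT EXISTS public.airbyte_heartbeat (id int PRIMARY KEY, beat_at timestamptz);
GRANT SELECT ON public.airbyte_heartbeat TO airbyte_role;

-- Run this on a schedule (cron, pg_cron, etc.) to keep restart_lsn moving:
INSERT INTO public.airbyte_heartbeat (id, beat_at) VALUES (1, now())
ON CONFLICT (id) DO UPDATE SET beat_at = EXCLUDED.beat_at;
```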
Nanagoud
10/28/2024, 12:54 PM
Jake Duckworth
10/28/2024, 1:55 PM
Nanagoud
10/28/2024, 2:01 PM
Jake Duckworth
10/28/2024, 2:12 PM
`public.table_1` should be incremental. There are others in these logs that are set to full refresh.
Jake Duckworth
10/28/2024, 7:22 PM
I granted additional privileges on `table_1` to the database role I am using for the connection.
`table_1`
- There are 6255 records in the table
- The table size is 2Mb

results
- The replication slot is now active when the sync runs.
- The `restart_lsn` is now advancing as expected.
- The sync is still doing a FULL OVERWRITE of the data instead of an incremental append as expected.
I attached the logs from my latest run. They should be cleaner than the earlier log I posted.
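For reference, two source-side checks that can rule out the primary-key / replica-identity requirement mentioned earlier (reusing `public.table_1` from this thread):

```sql
-- CDC incremental requires a primary key: is one defined on table_1?
SELECT conname FROM pg_constraint
WHERE conrelid = 'public.table_1'::regclass AND contype = 'p';
-- Replica identity: 'd' = default (primary key), 'n' = nothing, 'f' = full, 'i' = index
SELECT relreplident FROM pg_class WHERE oid = 'public.table_1'::regclass;
```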