I have a connection set up between 2 postgres data...
# ask-ai
j
I have a connection set up between 2 postgres databases. The source is configured to use CDC replication. The tables in the stream are configured to use the Incremental | Append + Deduped (
incremental_deduped_history
)
sync mode_._ When the sync runs, it does a
full refresh
every time instead of processing the incremental changes. Do I have something misconfigured? Any idea why the incremental processing is not working? Here is my current setup: Airbyte version: 1.1.0 Postgres Source Version: 3.6.22 Postgres Destination Version: 2.4.0 Postgres config:
Copy code
CREATE ROLE airbyte_role;
ALTER ROLE airbyte_role REPLICATION LOGIN;

GRANT SELECT ON TABLE public.table_1 TO airbyte_role;
ALTER TABLE public.table_1 REPLICA IDENTITY DEFAULT;
 
CREATE PUBLICATION airbyte_publication FOR TABLES IN SCHEMA public;
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
u
Based on the information provided in the knowledge sources, there are a few potential reasons why your CDC incremental sync is performing full refreshes instead of processing incremental changes: 1. WAL retention issue: The most likely cause is that the Write-Ahead Logs (WAL) needed for the incremental sync have been removed by Postgres. This can happen when there are many database updates, resulting in more WAL files than allowed in the
pg_wal
directory. Postgres then purges or archives the older WAL files. When this occurs, you'll see a log message like: > Saved offset is before Replication slot's confirmed_flush_lsn, Airbyte will trigger sync from scratch To prevent this, you can: - Sync the data source more frequently - Increase the
wal_keep_size
setting in your Postgres configuration [Source](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs) 2. Connector version: Ensure you're using the latest version of the Postgres source connector. There have been issues with CDC state messages in older versions that could cause this behavior. 3. Primary key requirement: Incremental sync is only supported for tables with primary keys. Ensure all your tables have primary keys defined. 4. Replication slot configuration: Make sure your replication slot is correctly configured and not being used by multiple destinations. An Airbyte database source using CDC replication can only be used with a single Airbyte destination. To troubleshoot and potentially resolve the issue: 1. Check your Postgres logs for any messages related to WAL retention or replication slots. 2. Increase the frequency of your syncs to prevent WAL file removal. 3. Consider increasing the
wal_keep_size
in your Postgres configuration. 4. Verify that all tables being synced have primary keys. 5. Ensure you're using the latest version of the Postgres source connector. 6. If the issue persists, you may need to reset the connection and start a new sync from scratch. For more detailed troubleshooting steps and requirements for CDC, refer to the [Airbyte Postgres CDC documentation](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#cdc-requirements).
j
@kapa.ai I adjusted the
wal_keep_size
and dropped / recreated the replication slot and Airbyte source, destination, and connection. Now the sync runs significantly faster (went form 20 min down to 1 min). The sync is still reloading all records in the table however. There are 57 rows in the source table and the table size is 32K. Is there a threshold where airbyte will always do a full refresh if the size of the table or number of records is below a certain value?
u
Based on the information provided in the knowledge sources, there doesn't appear to be a specific threshold where Airbyte will always perform a full refresh based on table size or number of records. However, I can provide some relevant information about full refresh synchronization in Airbyte: 1. Full refresh synchronization always retrieves all available data requested from the source and writes it to the destination, regardless of whether the data has been previously synced or not [Airbyte Tutorials](https://airbyte.com/tutorials/full-data-synchronization#step-2). 2. There are two variants of full refresh synchronization: - Full refresh | overwrite: This mode overwrites all data in the destination table with a new copy of the source data. - Full refresh | append: This mode appends the entire data set onto the data from previous syncs [Airbyte Tutorials](https://airbyte.com/tutorials/full-data-synchronization#step-2). 3. Full refresh synchronization is generally recommended for smaller datasets. For large amounts of data or frequent synchronizations, other replication modes like incremental sync might be more appropriate [Airbyte Tutorials](https://airbyte.com/tutorials/full-data-synchronization#step-2). 4. The sync mode you're using can affect how data is replicated. Make sure you're using the appropriate sync mode for your needs [Airbyte Forum](https://discuss.airbyte.io/t/1579). Given that your table has only 57 rows and is 32K in size, it's possible that Airbyte is defaulting to a full refresh because the dataset is relatively small. However, without more specific information about your Airbyte configuration and the sync mode you're using, it's difficult to say definitively why it's performing a full refresh each time. If you want to avoid full refreshes for this small table, you might want to explore using an incremental sync mode if it's supported for your source and destination. You can check the documentation for your specific source to see what sync modes are supported [Airbyte Docs](https://docs.airbyte.com/integrations/sources/adjust#supported-sync-modes).
j
@kapa.ai I verified the replication slot is correct and verified in the logs that Airbyte can find it, but the
restart_lsn
is not advancing. What could be the cause of this?
u
A message from kapa.ai
u
Based on the information provided in the knowledge sources, there are a few potential reasons why the
restart_lsn
(Log Sequence Number) might not be advancing in your Postgres replication slot used by Airbyte. Here are some possible causes and suggestions: 1. Inactive Replication Slot: The replication slot might become inactive if the consumer (in this case, Airbyte) disconnects. You can check the status of your replication slot using the following query:
Copy code
sql    SELECT SLOT_NAME, PG_CURRENT_WAL_LSN(), REDO_LSN, RESTART_LSN, PG_SIZE_PRETTY(PG_WAL_LSN_DIFF(PG_CURRENT_WAL_LSN(), RESTART_LSN)) AS LAG, ACTIVE    FROM PG_CONTROL_CHECKPOINT(), PG_REPLICATION_SLOTS;
If the
ACTIVE
column shows
false
, it means the slot is inactive [Github Issue #42917](https://github.com/airbytehq/airbyte/issues/42917). 2. Filtered Events: There might be a large number of events being filtered out, preventing the offset from being committed. This can cause the replication slot to not acknowledge the processed WAL offsets. A warning in the logs might look like this:
Copy code
Received 10001 events which were all filtered out, so no offset could be committed. This prevents the replication slot from acknowledging the processed WAL offsets, causing a growing backlog of non-removeable WAL segments on the database server.
To address this, you might need to adjust your filter configuration or enable heartbeat events [Airbyte Forum](https://discuss.airbyte.io/t/2675). 3. WAL Purging: Your Postgres database might be purging WAL (Write-Ahead Logging) files between Airbyte jobs. To prevent this, try running Airbyte sync jobs more frequently and extend the auto-cancellation timeout [Airbyte Forum](https://discuss.airbyte.io/t/2675). 4. Incompatibility Issues: There have been known compatibility issues with certain versions of the Postgres source connector and Airbyte. Ensure you're using the latest version of both Airbyte and the Postgres source connector [Airbyte Forum](https://discuss.airbyte.io/t/1791). 5. Debezium Configuration: The issue might be related to Debezium, which Airbyte uses for CDC. Ensure you're using a version of the Postgres connector that includes the latest Debezium fixes [Airbyte Forum](https://discuss.airbyte.io/t/2675). If none of these solutions work, you might need to reset the connection and potentially increase WAL retention and/or sync frequency. You can find more troubleshooting information in the [Postgres Troubleshooting Guide](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs).
n
hi even i have encountered the same issue did you solved it i have created issue https://github.com/airbytehq/airbyte/issues/47547 please if u found the issue help me also that would be much appreciated
j
@Nanagoud Thanks for creating the ticket for this issue. I have yet to find a solution to the problem. I added some additional context from my troubleshooting to the ticket. Hopefully we can some help with this issue. It is a pretty big problem if incremental replication doesn't work.
n
@Jake Duckworth Yes indeed correct i am working on deadline here not single thing helped
j
Here are my logs. Note that
public.table_1
should be incremental. There are others in these logs that are set to full.
I rebuilt my Airbyte installation, source, destination, and connection. I dropped and recreated my PG publication and replication slot. I granted access to 1 table
table_1
to the database role I am using for the connection. table_1 - There are 6255 records in the table - The table size is 2Mb results - The replication slot is now
active
when the sync runs. - The
restart_lsn
is now advancing as expected. - The sync is still doing a
FULL OVERWRITE
of the data instead of an incremental append as expected. I attached the logs from my latest run. They should be cleaner then the earlier log I posted.