<@U04S8PLJWMT> I have a postgres RDS source using ...
# ask-ai
s
@kapa.ai I have a postgres RDS source using CDC. I am trying to sync a table that is 2TB of data. It successfully syncs the table initially, but if fails to read any updates after that. Every following sync will update 0 records, even though there are 20 million updates per hour to that table. We confirmed the replication slot is receiving these events. With all the same settings, we can sync a smaller table with only 60 updates per hour. What can we try to help resolve this issue to successfully keep this table updated?
k
Based on the information provided in the knowledge sources, it appears that you're experiencing a common issue with large PostgreSQL databases using CDC (Change Data Capture) in Airbyte. Here are some potential causes and solutions to try: 1. Initial Waiting Time: The Postgres Source Connector might be timing out while looking for the WAL (Write-Ahead Log) resume position. Try increasing the "Initial Waiting Time in Seconds (Advanced)" setting in the Connector Settings page. According to a user comment in [this GitHub issue](https://github.com/airbytehq/airbyte/issues/50435), increasing this value to 2400 seconds (40 minutes) resolved a similar issue. 2. WAL Retention: Your database might be purging WAL files before Airbyte can process them, especially with such a high update rate. Try increasing the
wal_keep_size
parameter in your PostgreSQL configuration. As mentioned in the [Postgres Troubleshooting documentation](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs), you can set this to a higher value, e.g., '1GB' or more. 3. Sync Frequency: With 20 million updates per hour, you might need to run syncs more frequently to prevent WAL purging. Try scheduling more frequent syncs to keep up with the high update rate. 4. Check Replication Slot Status: Verify that the replication slot is active and the
confirmed_flush_lsn
is advancing as expected. If it's not, there might be an issue with the replication slot registration. 5. Invalid CDC Position Behavior: In the advanced settings of your source connector, there's an option called "Invalid CDC position behavior". Try setting this to "Fail sync" to get more detailed error messages about what might be going wrong. 6. Debezium Heartbeat: Enable the Debezium heartbeat option in the advanced settings of your source connector. This can help prevent issues with growing WAL backlogs, as mentioned in [this forum post](https://discuss.airbyte.io/t/2675). 7. Version Check: Ensure you're using the latest versions of Airbyte and the Postgres source connector. Some users have reported issues with specific versions, so upgrading might help. If none of these solutions work, you might need to consider alternative strategies for syncing such a large and frequently updated table. This could include partitioning the table, using a different replication method, or consulting with Airbyte support for a more tailored solution to your specific use case.