Hello Airbyte Community folks, I’ve run into a pr...
# ask-community-for-troubleshooting
g
Hello Airbyte Community folks, I’ve run into a problem where it looks as though an incremental: deduped + history sync may not be deduping. The connection is from Postgres to Snowflake and is syncing one table. For the source table, the cursor field is a timestamp with tz, `updated_at`, and the primary key, `id`, is a unique bigint generated from a sequence. There are ~286M records in the table. After the destination tables are generated, the `_scd` table has ~759M rows and the final table has ~450M rows. If I query the final table for `id`s having a count > 1, I get ~1.8M results. A similar query on the source table returns 0 results, as you might expect. There don’t seem to be any big errors in the initial table population or subsequent hourly syncs. Any suggestions would be happily received. Thanks!
u
@[DEPRECATED] Marcos Marx turned this message into Zendesk ticket 2628 to ensure timely resolution!
u
Hi! Sorry to hear the deduped + history isn't working correctly. Let me look into this and I hope to have some info for you tomorrow!
g
Thank you! Greatly appreciated
A little more information on this: the original connection was set up to do a full table overwrite and was later switched to incremental: deduped + history. Is it possible the reset that ran when the sync mode changed didn’t clear data in some of the temp tables?
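For anyone wanting to reproduce the duplicate check described at the top of the thread, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for Snowflake, and the table name `final_table`, the helper `find_duplicate_ids`, and the toy rows are all hypothetical, not taken from the actual connection. The `GROUP BY ... HAVING COUNT(*) > 1` query itself is standard SQL and should carry over to the real final table with the names adjusted.

```python
import sqlite3


def find_duplicate_ids(conn, table, key="id"):
    """Return (key, count) pairs for key values that appear more than once.

    `table` and `key` are interpolated directly, so only pass trusted names.
    """
    query = (
        f"SELECT {key}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return conn.execute(query).fetchall()


# Toy stand-in for the destination's final table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_table (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO final_table VALUES (?, ?)",
    [
        (1, "2022-01-01T00:00:00+00:00"),
        (2, "2022-01-01T00:00:00+00:00"),
        (2, "2022-01-02T00:00:00+00:00"),  # same id twice: not deduped
        (3, "2022-01-01T00:00:00+00:00"),
    ],
)

duplicates = find_duplicate_ids(conn, "final_table")
print(duplicates)  # [(2, 2)]
```

Looking at the `updated_at` values of a handful of duplicated ids may help tell whether the extra rows predate the switch in sync mode, which is the scenario raised in the last message.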