Facing an issue - my use case - Incremental sync f...
# ask-community-for-troubleshooting
n
Facing an issue - my use case - Incremental sync from db1 to db2 (cursor based) and use DB2 for some analytics. (so indexign and primary keys on db2 tables is needed) When I incremental append + dedupe sync (cursor based) a table from Db1 to db2 , 1. It doesnt create primary keys on destination - which is ok . I can do manually. 2. I create the primary key then next time when i do incremental - append + dedup. It starts giving duplicate id issues. I am surprised if this is not supported by airbyte as this is a pretty common use case.
k
Your observation is correct and highlights a key aspect of how Airbyte’s Incremental Sync - Append + Deduped mode works: 1. Primary Key Creation: Airbyte does not automatically create primary key constraints on the destination tables, even when you specify a primary key for deduplication during sync. This is by design, as Airbyte manages deduplication at the data level, not by enforcing database constraints. You are expected to manage database-level primary keys or indexes yourself if needed for analytics or performance. 2. Duplicate Key Issues After Manual PK Creation: When you manually add a primary key constraint to the destination table, subsequent syncs can fail with duplicate key errors. This happens because Airbyte’s sync process, even in deduped mode, may insert multiple versions of a record (with the same primary key) into the raw or history tables before deduplication is applied. The deduplication logic is handled by Airbyte’s normalization process, not by the database’s primary key constraint. If you add a primary key constraint at the database level, it will conflict with Airbyte’s insertions, leading to the errors you’re seeing. Why this happens: - Airbyte’s deduplication ensures that the final, normalized table contains only the latest version of each record (per primary key), but the raw and history tables may still contain duplicates. - The deduplication is not enforced by a database primary key constraint, but by Airbyte’s own logic during normalization and sync processing. - Adding a primary key constraint to the destination table can cause sync failures if Airbyte tries to insert a duplicate before deduplication occurs. Documentation references: - “A primary key field is used by Airbyte to select only the most recent version of a given record from the history table for inclusion in the final deduplicated table, and all other historical versions of that record are not included. … [Incremental Sync - Append] does not do any deduplication, it does not require a primary key to be defined.” [Explore Airbyte’s incremental data synchronization](https://airbyte.com/tutorials/incremental-data-synchronization#step-2) - “In the data warehouse, just like in incremental - Append, multiple rows for the same primary key can be found in the history table. The unique records per primary key behavior is mirrored in the final table with incremental deduped sync mode.” [Incremental Sync - Append + Deduped](https://docs.airbyte.com/platform/using-airbyte/core-concepts/sync-modes/incremental-append-deduped) What can you do? - Do not add a primary key constraint to the raw or history tables created by Airbyte. - If you need a table with a primary key constraint for analytics, use the final deduplicated table produced by Airbyte’s normalization process, or create a separate view/table from the deduplicated data. - If you must have a primary key constraint, ensure it is only applied after deduplication, not on tables where Airbyte is still inserting raw or historical data. Summary: This is a known limitation of Airbyte’s current sync and normalization process. The deduplication is handled by Airbyte, not by the database’s primary key constraints. Adding such constraints to Airbyte-managed tables can cause sync failures due to duplicate records being inserted before deduplication. This is a common scenario and is documented in Airbyte’s official resources. If you need further guidance on best practices for analytics on deduplicated data, consider creating downstream tables or views based on Airbyte’s normalized output, rather than modifying Airbyte-managed tables directly.
n
you say it is documented. can you share soem artiicles where it is documented ?
@kapa.ai you say it is documented. can you share soem artiicles where it is documented ?
k
A message from kapa.ai
Based on the provided knowledge sources, Airbyte documentation explains that in Incremental Append + Deduped mode, deduplication is handled by Airbyte’s logic and not by enforcing primary key constraints at the database level. The documentation states: > "A primary key is one or multiple (called composite primary keys) fields or columns that is used to identify the unique entities of a table. Only one row per primary key value is permitted in a database table. In the data warehouse, just like in incremental - Append, multiple rows for the same primary key can be found in the history table. The unique records per primary key behavior is mirrored in the final table with incremental deduped sync mode. The primary key is then used to refer to the entity which values should be updated." [Incremental Sync - Append + Deduped](https://docs.airbyte.com/platform/using-airbyte/core-concepts/sync-modes/incremental-append-deduped) Additionally, the tutorial clarifies: > "If an Airbyte user has selected the Incremental Sync - Deduped History sync mode, then the data must be deduplicated so that a single entry in the source table only results in a single corresponding entry in the final normalized deduplicated destination table, even though multiple versions corresponding to that record may appear in the raw and historic data tables. A primary key field is used by Airbyte to select only the most recent version of a given record from the history table for inclusion in the final deduplicated table, and all other historical versions of that record are not included." [Explore Airbyte’s incremental data synchronization](https://airbyte.com/tutorials/incremental-data-synchronization#step-2) However, the documentation does not explicitly state that adding a primary key constraint to the destination table will cause sync failures due to duplicates. It does make clear that deduplication is managed by Airbyte and that multiple versions of a record may exist in the raw/history tables, but it does not directly warn against adding primary key constraints yourself. In summary, while the documentation describes how deduplication works and the role of primary keys in Airbyte’s process, it does not explicitly document the issue of sync failures when adding primary key constraints to destination tables. If you need more explicit guidance, you may need to consult Airbyte community forums or GitHub issues for real-world experiences.