<@U04SE163WC8> how does incremental+dedupe replica...
# ask-ai
r
@kapa.ai how does incremental+dedupe replication mode's deduping work? I'm using cdc on mssql, and would like to understand how the deduping is done when there are multiple updates for the same record.
k
In Airbyte, the incremental dedupe replication mode ensures that the final data table contains only a single copy of each record corresponding to each unique ID in the source system. This is particularly useful when there are multiple updates for the same record. When using Change Data Capture (CDC) on MSSQL, the deduping process works as follows: 1. Raw and History Tables: When a record is modified in the source system and a sync is performed, the updated record is copied to a raw table in the destination database and appended to the history table. This means that both the raw and history (Slowly Changing Dimension or SCD) destination tables will contain multiple versions of the same record. 2. Final Data Table: The final data table is deduped, meaning it only contains a single copy of each record. This is achieved by using SQL logic that ensures only the most recent version of each record is included in the final table. 3. Deduplication Logic: The deduplication is typically achieved by copying records from the history table into the final table only when a specific condition is met. For example, in the provided extract, the deduplication is done by copying records from
table_two_scd
into
table_two
only when the
_airbyte_active_row
has a value of 1. This ensures that only the active (most recent) version of each record is included in the final table. Here is an example of how this might look in SQL:
Copy code
sql INSERT INTO table_two SELECT * FROM table_two_scd WHERE _airbyte_active_row = 1;
This SQL statement ensures that only the records marked as active (with
_airbyte_active_row = 1
) are copied into the final deduped table. For more detailed information, you can refer to the following sources: - [Incremental Data Synchronization Tutorial](https://airbyte.com/tutorials/incremental-data-synchronization#step-7) - [Change Data Capture (CDC) Synchronization Tutorial](https://airbyte.com/tutorials/incremental-change-data-capture-cdc-replication) If you have any further questions or need additional clarification, feel free to ask!
r
@kapa.ai how can i query the history table (table_scd) where i can find the column __airbyte_active_row_
k
A message from kapa.ai
To query the history table (
table_scd
) and find the column
_airbyte_active_row
, you can execute a simple SQL
SELECT
statement. This column is used to indicate whether a record is the most recent (active) version of that record. Here is an example of how you can query the history table to retrieve all columns, including `_airbyte_active_row`:
Copy code
sql SELECT * FROM table_two_scd;
This query will return all records from the
table_two_scd
table, including the
_airbyte_active_row
column. The
_airbyte_active_row
column will have a value of
1
for the most recent (active) version of each record and
0
for older versions. For example, the result might look like this: ``` _airbyte_unique_key | _airbyte_unique_key_scd | id | name | updated_at | _airbyte_start_at | _airbyte_end_at | _airbyte_active_row | _airbyte_ab_id | _airbyte_emitted_at | _airbyte_normalized_at | _airbyte_table_two_hashid ----------------------------------+----------------------------------+----+------------+----------------------------+----------------------------+----------------------------+---------------------+--------------------------------------+----------------------------+-------------------------------+---------------------------------- c4ca4238a0b923820dcc509a6f75849b | 96a17eccedd409b3b3de3b411d431ab8 | 1 | Eg1 DD+Hst | 2022-09-01 161807.569818 | 2022-09-01 161807.569818 | | 1 | 3bd474b8-0329-4bce-bde7-aee7c5d30cc8 | 2022-09-01 165244.103+00 | 2022-09-01 170337.521659+00 | 78812b56ae516cb4bb4278c595c3b20e c81e728d9d4c2f636f067f89cc14862c | 064c6bc6ac9c9956a7188495284d8c07 | 2 | Eg2a DD+Hs | 2022-09-01 163013.93903 | 2022-09-01 163013.93903 | 2022-09-01 170214.841419 | 0 | 4282344a-62c3-4634-a91a-e6dafb9b253a | 2022-09-01 165244.103+00 | 2022-09-01 170337.521659+00 | d89c24a3d20ae9663d6f9a40f023149c c81e728d9d4c2f636f067f89cc14862c | 8d269939a7b0ae8c321d5f25d3be8619 | 2 | Eg2b DD+Hs | 2022-09-01 170214.841419 | 2022-09-01 170214.841419 | | 1 | 89377204-7801-49c8-a779-91da45a86cc3 | 2022-