<@U04SE163WC8> I using ms sql connector for the so...
# ask-ai
h
@kapa.ai I using ms sql connector for the source and snowflake as my target. How could I differentiate between initial load and cdc change?
u
When using Airbyte's Change Data Capture (CDC) feature with Microsoft SQL Server (MSSQL) as the source and Snowflake as the destination, the initial load and CDC changes are handled differently. The initial load is a snapshot of your entire database at a specific point in time. This is done to ensure that the destination (Snowflake in your case) has a complete copy of the data from the source (MSSQL). This initial load is performed when you run the sync for the first time using CDC. Airbyte uses snapshot isolation to avoid acquiring table locks, allowing simultaneous writes by other database clients. This must be enabled on the database like so:
Copy code
sql ALTER DATABASE {database name}     SET ALLOW_SNAPSHOT_ISOLATION ON;
(Source: [Airbyte Documentation](https://docs.airbyte.io/integrations/sources/mssql#2-enable-snapshot-isolation)) After the initial load, CDC incremental replication comes into play. CDC incremental replication reads a log of the changes that have been made to the source database and transmits these changes to the destination. Because changes are read from a transaction log when using CDC, it is not necessary for the source data to have a suitable cursor field. This means that any new changes in your MSSQL database will be captured and replicated to Snowflake. However, please note that there are certain limitations with CDC. For example, CDC incremental is only supported for tables with primary keys for most sources. Also, the modifications you are trying to capture must be made using `DELETE`/`INSERT`/`UPDATE`. Changes made from `TRUNCATE`/`ALTER` won't appear in logs and therefore in your destination. (Source: [Airbyte Documentation](https://docs.airbyte.com/understanding-airbyte/cdc#limitations))
h
@kapa.ai How could I identify if the record has been loaded by initial load or cdc in snowflake?
u
A message from kapa.ai
u
In Airbyte's Change Data Capture (CDC) synchronization, there isn't a direct way to differentiate between records loaded by the initial load or CDC in the destination (Snowflake in your case). However, you can infer this information based on the timestamps in the system columns added by Airbyte during the replication process. Here are some of the system columns that Airbyte adds during the replication process: - _airbyte_emitted_at: This column tells you when the record was emitted from the source. This can be used to understand when the record was read from the source database. - _airbyte_normalized_at: This column tells you when the record was normalized (i.e., when was the record created from the raw data). - _ab_cdc_deleted_at: This column indicates the time of deletion for a record. If a record is deleted in the source, this column will contain the timestamp of deletion. - _ab_cdc_lsn: This column contains the Log Sequence Number (LSN) for the record. This is used in CDC to keep track of the changes in the source database. - _airbyte_active_row: This column is used for specifying which rows should be included in the deduplicated table. - _airbyte_cdc_tut_hashid: This is an md5 hash of the fields that appear in the _airbyte_data JSON. This is calculated and added during normalization. You can use these system columns to infer whether a record was loaded during the initial load or during CDC. For example, if a record has an _ab_cdc_lsn value, it means that the record was captured during CDC. If a record does not have an _ab_cdc_lsn value but has an _airbyte_emitted_at value, it could mean that the record was loaded during the initial load. Please note that this is an inference and might not always be accurate. Airbyte does not provide a direct way to differentiate between records loaded by the initial load or CDC. (Source: [Airbyte Tutorial](https://airbyte.com/tutorials/incremental-change-data-capture-cdc-replication))