When using Full Refresh mode in a connection from ...
# ask-ai
h
When using Full Refresh mode in a connection from postgres to Redshift. Does it still only fetch the new values from CDC log? or all records in the database
u
When using Full Refresh mode in a connection from Postgres to Redshift, Airbyte will fetch all records in the database, not just new values from the CDC log. Full Refresh mode is designed to retrieve all available data requested from the source, regardless of whether it has been synced before. This is in contrast to incremental sync modes that only replicate new or updated data since the last sync. According to the Airbyte documentation: "The Full Refresh modes are the simplest methods that Airbyte uses to sync data, as they always retrieve all available information requested from the source, regardless of whether it has been synced before." [Full Refresh - Overwrite](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/full-refresh-overwrite) In the case of Full Refresh - Overwrite mode: "New syncs will destroy all data in the existing destination table and then pull the new data in. Therefore, data that has been removed from the source after an old sync will be deleted in the destination table." [Full Refresh - Overwrite](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/full-refresh-overwrite) It's important to note that CDC (Change Data Capture) is typically used with incremental sync modes to capture only changes since the last sync. If you want to take advantage of CDC, you should use an incremental sync mode instead of Full Refresh. As stated in the Airbyte tutorial: "If you want to take full advantage of using Change Data Capture, you should use Incremental | Append mode to only look at the rows that have changed in the source and sync them to the destination. Selecting a Full Refresh mode would sync the whole source table, which is most likely not what you want when using CDC." [Build an EL(T) from Postgres CDC (Change Data Capture)](https://airbyte.com/tutorials/postgresql-change-data-capture-elt#step-5)
h
@kapa.ai why cant i select incremental sync mode on a stream?
u
There could be a few reasons why you can't select incremental sync mode for a particular stream in Airbyte: 1. The stream may not support incremental sync. Some streams only support full refresh sync modes due to limitations in the source API or connector implementation [Source](https://docs.airbyte.com/snowflake-native-apps/linkedin-ads#supported-streams). 2. The stream may lack a primary key or cursor field necessary for incremental syncing. For example, in MySQL CDC connections, streams without a primary key cannot use incremental sync [Source](https://github.com/airbytehq/airbyte/issues/38659). 3. There might be a bug or issue with the specific connector you're using. For instance, there was a reported issue with the Stripe connector where the invoice_line_items stream didn't support incremental sync even though its parent stream did [Source](https://github.com/airbytehq/airbyte/issues/30073). 4. The destination you're using might not support incremental sync for that particular stream or in general [Source](https://github.com/airbytehq/airbyte/issues/32083). 5. In some cases, the UI might not be displaying the incremental sync option even if it's supported. This could be due to a bug or version-specific issue [Source](https://discuss.airbyte.io/t/2378). To troubleshoot, you could try: - Checking the documentation for your specific source connector to see if incremental sync is supported for that stream. - Ensuring your Airbyte version is up-to-date. - Verifying that the stream has a suitable cursor field or primary key for incremental syncing. If the issue persists, it might be worth reporting it on the Airbyte GitHub repository or community forums for further assistance.
h
@kapa.ai so if i dont have a primary key on my source database. Redshift cant use CDC changes ?
u
A message from kapa.ai
u
Yes, that's correct. For Change Data Capture (CDC) to work, the source tables must have a primary key. This is because CDC relies on these keys to track changes in the data. If a table doesn't have a primary key, it can't support CDC incremental sync. Instead, you would have to use Full Refresh sync for those tables [Source](https://docs.airbyte.com/understanding-airbyte/cdc#limitations). However, it's important to note that Redshift does not support CDC at all, regardless of whether the tables have primary keys or not. Redshift does support incremental sync, but it's cursor-based, using
ORDER BY
on a user-defined cursor column, not CDC-based [Source](https://docs.airbyte.com/integrations/sources/redshift#features). So, if you're using Redshift as your source, you won't be able to use CDC. You'll need to use the cursor-based incremental sync and define a cursor field such as an
updated_at
column [Source](https://docs.airbyte.com/integrations/sources/redshift#features).
h
@kapa.ai “streams without a primary key cannot use incremental sync” does this apply to Postgres too?
u
A message from kapa.ai
u
Yes, this applies to Postgres as well. For incremental sync to work in Postgres, the tables must have a primary key. This is because incremental sync relies on these keys to track changes in the data. If a table doesn't have a primary key, it can't support incremental sync. Instead, you would have to use Full Refresh sync for those tables [Source](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#cdc-requirements). However, it's important to note that there are two types of incremental sync in Airbyte: Append and Deduped. The Deduped mode requires a primary key to deduplicate the data, but the Append mode does not require a primary key [Source](https://airbyte.com/tutorials/incremental-data-synchronization#step-2). So, if your Postgres table doesn't have a primary key, you can still use the Append mode for incremental sync. But if you want to use the Deduped mode or CDC, you'll need a primary key.
h
@kapa.ai i cant select Incremental Append mode, why?
u
A message from kapa.ai
u
There could be several reasons why you can't select the Incremental Append mode for a particular stream: 1. No Suitable Cursor Field: The stream may not have a suitable cursor field. A cursor field is a column in your table that is monotonically increasing or has a timestamp. It's used to identify new and updated records for incremental sync. If no such field exists, Incremental Append mode may not be available [Source](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append#user-defined-cursor). 2. Connector Limitations: The specific connector you're using may not support Incremental Append mode. This could be due to limitations in the source API or the connector's implementation. 3. Nullable Columns: There was an issue in certain versions of Airbyte where nullable columns couldn't be used as a cursor, which prevented Incremental Append mode from being available. This issue has been resolved in later versions [Source](https://discuss.airbyte.io/t/2694). 4. Schema Changes: If the schema of the source has changed, it might affect the availability of Incremental Append mode. Airbyte currently has limited support for handling schema changes [Source](https://discuss.airbyte.io/t/2835). To troubleshoot, you could try: - Checking the documentation for your specific source connector to see if Incremental Append mode is supported. - Ensuring your Airbyte version is up-to-date. - Verifying that the stream has a suitable cursor field for Incremental Append syncing. If the issue persists, it might be worth reporting it on the Airbyte GitHub repository or community forums for further assistance.