# ask-ai
i
@kapa.ai Is there a way to collect only the delta (the diff) after the first collection completed?
k
Yes, Airbyte supports collecting only the delta (the diff) after the first collection is completed. This is achieved through incremental synchronization. Incremental synchronization is a replication method that efficiently keeps a source and a destination in sync by only replicating records that have been inserted or updated in the source system since the previous sync. This is more efficient than copying an entire dataset on each iteration, which is the behavior of full refresh synchronization.

Here are the key points about incremental synchronization in Airbyte:

1. It uses a cursor to keep track of which records have been sent in previous iterations. The cursor acts as a pointer to the most recent record that has been replicated.
2. During subsequent syncs, Airbyte includes the cursor value as part of the query on the source system, ensuring that only new or updated records are selected and sent to the destination.
3. Airbyte supports two main incremental sync modes:
   - [Incremental Sync - Append](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append)
   - [Incremental Sync - Append + Deduped](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/incremental-append-deduped)
4. To use incremental syncs, the API endpoint needs to fulfill certain requirements, such as having a cursor field (e.g., an `updated_at` timestamp) and the ability to filter/request records by this cursor field.

It's important to note that the first sync will still be a full refresh, but subsequent syncs will only fetch the delta. As stated in the knowledge source: "If the sync is running for the first time, it is equivalent to a [Full Refresh](https://docs.airbyte.com/using-airbyte/core-concepts/sync-modes/full-refresh-append) since all data will be considered as new."

For more detailed information on how incremental synchronization works in Airbyte, you can refer to the [Incremental data synchronization tutorial](https://airbyte.com/tutorials/incremental-data-synchronization#step-2).
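The cursor mechanism and the two incremental modes described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Airbyte's implementation or its CDK API. The `Record` type, the in-memory `source` list, and the helpers `read_incremental`, `write_append`, and `write_append_dedup` are all hypothetical. The sketch assumes an `updated_at` cursor stored as ISO-8601 UTC strings, so that string comparison matches time order, and it treats `id` as the primary key.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    id: int          # assumed primary key (hypothetical schema)
    name: str
    updated_at: str  # assumed cursor field: ISO-8601 UTC string, sorts chronologically


def read_incremental(source: List[Record], cursor: Optional[str]) -> Tuple[List[Record], Optional[str]]:
    """Select records newer than `cursor` and return them with the next cursor value.

    With cursor=None (first sync) every record is selected, i.e. a full refresh.
    """
    selected = [r for r in source if cursor is None or r.updated_at > cursor]
    # Advance the cursor to the newest value seen; keep the old one if nothing changed.
    next_cursor = max((r.updated_at for r in selected), default=cursor)
    return selected, next_cursor


def write_append(dest: List[Record], batch: List[Record]) -> None:
    # Append: every synced version of a row is kept.
    dest.extend(batch)


def write_append_dedup(dest: Dict[int, Record], batch: List[Record]) -> None:
    # Append + Deduped: keep only the latest version per primary key.
    for r in batch:
        current = dest.get(r.id)
        if current is None or r.updated_at >= current.updated_at:
            dest[r.id] = r


source = [
    Record(1, "alice", "2024-01-01T00:00:00Z"),
    Record(2, "bob", "2024-01-02T00:00:00Z"),
]
append_dest: List[Record] = []
dedup_dest: Dict[int, Record] = {}

# First sync: there is no cursor yet, so both records are read.
batch, cursor = read_incremental(source, None)
write_append(append_dest, batch)
write_append_dedup(dedup_dest, batch)

# A row is updated in the source; the next sync reads only that delta.
source[0] = Record(1, "alice-renamed", "2024-01-03T00:00:00Z")
batch, cursor = read_incremental(source, cursor)
write_append(append_dest, batch)
write_append_dedup(dedup_dest, batch)

print(len(append_dest))  # 3: both versions of id=1 are kept
print(len(dedup_dest))   # 2: one row per primary key
print(cursor)            # 2024-01-03T00:00:00Z
```

The strict `>` comparison is a simplification. A real connector may compare inclusively so that records sharing the last cursor value are not missed. The cost is that some rows get re-sent.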
i
@kapa.ai I don't have an increment option. Is it dependent on the api?
k
Yes, the availability of the incremental sync option is indeed dependent on the API and the specific source connector you're using. Here are a few key points to consider:

1. API Capabilities: The source API should ideally support filtering or querying data based on a timestamp or other cursor field. APIs without this capability need special handling. As mentioned in the [Incremental Sync documentation](https://docs.airbyte.com/connector-development/connector-builder-ui/incremental-sync#incremental-sync-without-time-filtering), "Some APIs do not allow filtering records by a date field, but instead only provide a paginated 'feed' of data that is ordered from newest to oldest."
2. Connector Implementation: The Airbyte connector for your specific source must be implemented to support incremental sync. Not all connectors support this feature.
3. Table/Stream Structure: For database sources, the table must have a suitable column to use as a cursor field, typically a timestamp or auto-incrementing ID.
4. Connector Version: In some cases, the incremental sync option might be available in newer versions of a connector but not in older ones. For example, there was an issue with the [Redshift source connector](https://github.com/airbytehq/airbyte/issues/16742) where incremental updates were not available in version 0.3.14, but were available in other versions.
5. Configuration: Some connectors require specific configuration to enable incremental sync. For instance, CDC-based database sources need database-side setup (see this [discussion about the MySQL source with CDC](https://github.com/airbytehq/airbyte/discussions/41966)). Postgres CDC relies on replication slots, publications, and replica identity, while MySQL CDC relies on the binary log (binlog).

If you're not seeing the incremental option for a specific source, you could try the following:

1. Check the documentation for your specific source connector to see if incremental sync is supported.
2. Ensure you're using the latest version of the connector.
3. Verify that your source data has a suitable cursor field for incremental sync.
4. If it's a database source, make sure the table has a primary key or a suitable cursor column.

If you provide more details about the specific source you're trying to use, I might be able to give you more targeted advice.
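Whether the incremental option appears depends on what the connector advertises. In the Airbyte protocol, a source's `discover` command emits a `CATALOG` message. Each stream in that message lists its `supported_sync_modes`. If `incremental` is missing for a stream, that stream can only be synced with full refresh. The sketch below inspects such output and assumes one JSON-encoded message per line. The function name `incremental_report` and the sample streams (`users`, `events_feed`) are hypothetical.

```python
import json
from typing import Iterable


def incremental_report(discover_output_lines: Iterable[str]) -> dict:
    """Map stream name -> summary of incremental support from `discover` output.

    Assumes each line is one JSON-encoded Airbyte protocol message; only
    messages with type "CATALOG" are inspected, everything else is ignored.
    """
    report = {}
    for line in discover_output_lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise mixed into stdout
        if msg.get("type") != "CATALOG":
            continue
        for stream in msg["catalog"]["streams"]:
            modes = stream.get("supported_sync_modes", [])
            report[stream["name"]] = {
                "incremental": "incremental" in modes,
                "source_defined_cursor": stream.get("source_defined_cursor", False),
                "default_cursor_field": stream.get("default_cursor_field", []),
                "primary_key": stream.get("source_defined_primary_key", []),
            }
    return report


# Hypothetical discover output: one log line and one catalog with two streams.
sample = [
    '{"type": "LOG", "log": {"level": "INFO", "message": "Discovering..."}}',
    json.dumps({"type": "CATALOG", "catalog": {"streams": [
        {"name": "users", "json_schema": {},
         "supported_sync_modes": ["full_refresh", "incremental"],
         "source_defined_cursor": True, "default_cursor_field": ["updated_at"],
         "source_defined_primary_key": [["id"]]},
        {"name": "events_feed", "json_schema": {},
         "supported_sync_modes": ["full_refresh"]},
    ]}}),
]

for name, info in incremental_report(sample).items():
    print(name, info["incremental"], info["default_cursor_field"])
```

In practice you would feed it the stdout of the connector run with `discover --config <config.json>`. For database sources without CDC, `source_defined_cursor` is usually false, and you pick the cursor column yourself when configuring the connection.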