Hi <@U04SE163WC8> how to stream new data only, and...
# ask-ai
e
Hi @kapa.ai how to stream new data only, and avoid historical data
k
To stream only new data and avoid historical data, you can configure your stream for incremental syncs. This can be done by specifying a `DatetimeBasedCursor` on your `Stream`'s `incremental_sync` field. Given a start time, an end time, and a step function, it will partition the interval [start, end] into small windows of the size described by the step. This way, only data that has changed since the prior sync is replicated to a destination. Here is an example of how to update the catalog to run in incremental mode:
```json
{
  "streams": [
    {
      "stream": {
        "name": "rates",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"]
      },
      "sync_mode": "incremental",
      "destination_sync_mode": "overwrite"
    }
  ]
}
```
In addition to records, the `read` operation now also outputs state messages:
```json
{"type": "STATE", "state": {"data": {"rates": {"date": "2022-07-15"}}}}
```
Where the date ("2022-07-15") should be replaced by today's date. You can find more information on `incremental_sync` configurations and the `DatetimeBasedCursor` component in the [incremental syncs](https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/incremental-syncs) section of the Airbyte documentation.

Please note that not all streams support incremental syncs. For example, in the GitHub source, only 4 streams (`comments`, `commits`, `issues` and `review comments`) are pure incremental, meaning that they read and output only new records. Other streams may read all records but output only new records, or they may not support incremental syncs at all.

Sources:
- [Configuring a stream for incremental syncs](https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/yaml-overview#configuring-a-stream-for-incremental-syncs)
- [Supporting incremental syncs](https://docs.airbyte.com/connector-development/config-based/tutorial/incremental-reads#supporting-incremental-syncs)
- [GitHub source notes](https://docs.airbyte.com/integrations/sources/github#notes)
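To make the windowing described above concrete, here is a minimal Python sketch of splitting an interval [start, end] into step-sized windows. It is only an illustration: `partition_windows` is a hypothetical helper, not part of Airbyte, and the real `DatetimeBasedCursor` may treat window boundaries differently.

```python
from datetime import datetime, timedelta


def partition_windows(start, end, step):
    """Split [start, end] into consecutive windows no longer than `step`.

    Hypothetical helper for illustration only; not Airbyte code.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


windows = partition_windows(
    datetime(2022, 7, 1), datetime(2022, 7, 15), timedelta(days=7)
)
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```

Each window would correspond to one request for records whose cursor value falls in that range, so a start time close to "now" keeps the sync from reaching back into historical data.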
e
@kapa.ai I deployed Airbyte on GCP Compute Engine. When creating a new connection, how do I pull only the new data and skip the historical data?
k
A message from kapa.ai
To pull only new data and skip historical data in Airbyte, you need to set up your connection to use the Incremental Sync mode. Here are the steps to do this:

1. When creating a new connection in Airbyte, you will be asked to select a sync mode for each stream. Choose "Incremental" for the streams you want to sync incrementally.
2. If you're upgrading an existing connection to use incremental sync, you'll need to dual-write for connections incrementally syncing data without re-syncing historical data you've already replicated. This involves copying the raw data you've already replicated to the new schema being used by your newly created connection. You need to do this for every stream in the connection with an incremental sync mode.
3. Navigate to the existing connection you are duplicating, and navigate to the `Settings` tab. Open the `Advanced` settings to see the connection state (which manages incremental syncs). Copy the state to your clipboard.
4. Go to your newly created connection, replace the state with the copied contents in the previous step, then click `Update State`. This will ensure historical data is not replicated again.
5. Enabling the connection will now provide a parallel copy of all streams in the updated format.
6. You can move your dashboards to rely on the new tables, then pause the out-of-date connection.

Please note that not all sources in Airbyte support incremental sync. You can check the documentation for each source to see if it supports incremental sync.

Sources:
- [Upgrading to Destinations V2](https://docs.airbyte.com/release_notes/upgrading_to_destinations_v2#additional-steps-for-incremental-sync-modes)
- [Incremental Sync](https://docs.airbyte.com/connector-development/connector-builder-ui/incremental-sync)
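The connection state mentioned in the steps above records how far each stream has already been read. As a rough sketch, the following Python builds a state message in the same shape as the `STATE` example quoted earlier in this thread. `build_state_message` is a hypothetical helper, the stream name `rates` and cursor field `date` are taken from that example, and the state shown in the Airbyte UI for a real connection may be structured differently, so treat this as an illustration rather than something to paste into `Update State`.

```python
import json
from datetime import date


def build_state_message(stream_name, cursor_field, cursor_value):
    """Build a STATE message shaped like the example quoted in this thread.

    Hypothetical helper for illustration only; not an Airbyte API.
    """
    return {
        "type": "STATE",
        "state": {"data": {stream_name: {cursor_field: cursor_value.isoformat()}}},
    }


# A fixed date keeps the example reproducible; date.today() would be used
# to mark "everything before today" as already synced.
message = build_state_message("rates", "date", date(2022, 7, 15))
print(json.dumps(message))
```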
e
@kapa.ai does this work with a MySQL data source?
k
A message from kapa.ai
Yes, Airbyte's MySQL connector does support incremental sync. This allows you to replicate only new and updated data from your MySQL source, skipping historical data.

However, there are some cases where incremental sync might not be the best choice. For example, if your MySQL server does not expose the binlog, or if your data set is small and you just want a snapshot of your table in the destination, Airbyte recommends against using this replication method.

Please note that the minimum required platform version is v0.58.0 to run source-mysql 3.4.0. You can find more information about the MySQL connector and its features in the [Airbyte documentation](https://docs.airbyte.com/integrations/sources/mysql#mysql).

Sources:
- [MySQL Source Documentation](https://docs.airbyte.com/integrations/sources/mysql#mysql)
- [MySQL Replication Modes](https://docs.airbyte.com/integrations/sources/mysql#standard)
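To illustrate the general idea of cursor-based incremental reads (only rows newer than the last saved cursor are returned), here is a small self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for MySQL so it can run anywhere. The `orders` table, the `updated_at` cursor column, and `read_new_rows` are all assumptions made for this example, and this is not how the Airbyte MySQL connector is implemented.

```python
import sqlite3

# In-memory SQLite stands in for a MySQL source (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2022-07-01"), (2, "2022-07-14"), (3, "2022-07-16")],
)


def read_new_rows(connection, last_cursor):
    """Return rows whose cursor value is after `last_cursor`, plus the new cursor."""
    rows = connection.execute(
        "SELECT id, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_cursor,),
    ).fetchall()
    # If nothing new was found, keep the previous cursor unchanged.
    new_cursor = rows[-1][1] if rows else last_cursor
    return rows, new_cursor


# Starting from a cursor of 2022-07-15 skips the two older (historical) rows.
rows, cursor = read_new_rows(conn, "2022-07-15")
print(rows, cursor)
```

Running the read again with the returned cursor yields no rows until new data arrives, which is the behavior an incremental sync relies on.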