Can you explain how the GCS source connector works? How does it fetch only the modified data?
kapa.ai
02/07/2025, 8:04 AM
Based on the knowledge sources provided, here is how the Google Cloud Storage (GCS) source connector works and how it fetches only modified data.

The GCS source connector supports both full refresh and incremental sync modes [Google Cloud Storage (GCS)](https://docs.airbyte.com/integrations/sources/gcs#supported-sync-modes).

For incremental syncs, the connector uses a "Start Date" parameter to determine which files to replicate. This parameter marks a starting date and time in UTC for data replication. Files that have not been modified since that date/time are not replicated [Google Cloud Storage (GCS)](https://docs.airbyte.com/integrations/sources/gcs#for-airbyte-open-source).

The connector uses file URLs to track which files have been synced. On each incremental sync, it compares each file's modification date with the last sync date to decide which files need to be synced again [Google Cloud Storage (GCS)](https://docs.airbyte.com/integrations/sources/gcs#file-urls).

Note that if you change the authorization type (e.g., from Service Account Information to Google OAuth) while using Incremental sync, the next sync will not use the old state and will reread all provided files in Full Refresh mode. Subsequent syncs will then be Incremental as expected [Google Cloud Storage (GCS)](https://docs.airbyte.com/integrations/sources/gcs#file-urls).

The connector also lets you specify glob patterns to match the specific files to replicate, which gives you control over which files are included in the sync [Google Cloud Storage (GCS)](https://docs.airbyte.com/integrations/sources/gcs#for-airbyte-open-source).

Keep in mind that using cloud storage may incur egress costs, that is, charges for data transferred out of the cloud storage system [Google Cloud Storage (GCS)](https://docs.airbyte.com/integrations/sources/gcs#google-cloud-storage-gcs).
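
To make the selection logic described above more concrete, here is a minimal Python sketch of how a file-based incremental sync could decide which files to read. It is not the connector's actual code: `RemoteFile`, `select_files_to_sync`, and the per-URL `history` dictionary are hypothetical names, and the real state format may differ. Glob matching uses Python's `fnmatchcase`, which is a simplification of whatever glob syntax the connector supports.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical stand-in for one entry in a bucket listing."""
    url: str
    last_modified: datetime  # timezone-aware, UTC


def select_files_to_sync(
    files: Iterable[RemoteFile],
    globs: List[str],
    start_date: Optional[datetime],
    history: Dict[str, datetime],
) -> List[RemoteFile]:
    """Return the files an incremental sync would read (illustrative only).

    history maps a file URL to the modification time seen when that file
    was last synced. An empty history behaves like a full refresh.
    """
    selected = []
    for f in files:
        # 1. Keep only files matching at least one glob pattern.
        if not any(fnmatchcase(f.url, pattern) for pattern in globs):
            continue
        # 2. Skip files not modified since the configured Start Date.
        if start_date is not None and f.last_modified < start_date:
            continue
        # 3. Skip files already synced and unchanged since then.
        synced_at = history.get(f.url)
        if synced_at is not None and f.last_modified <= synced_at:
            continue
        selected.append(f)
    return selected


def utc(year: int, month: int, day: int) -> datetime:
    """Small helper to build UTC timestamps for the example."""
    return datetime(year, month, day, tzinfo=timezone.utc)


files = [
    RemoteFile("gs://my-bucket/data/a.csv", utc(2025, 1, 1)),
    RemoteFile("gs://my-bucket/data/b.csv", utc(2025, 2, 1)),
    RemoteFile("gs://my-bucket/data/c.json", utc(2025, 2, 1)),
    RemoteFile("gs://my-bucket/data/d.csv", utc(2024, 6, 1)),
]
history = {"gs://my-bucket/data/a.csv": utc(2025, 1, 1)}

to_sync = select_files_to_sync(files, ["*.csv"], utc(2024, 12, 31), history)
print([f.url for f in to_sync])
```

In this sketch, passing an empty `history` makes every matching file after the Start Date eligible again, which mirrors the behavior described above when the old state is not used after an authorization type change.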