Hey everyone, Would like to use S3 as source for ...
# feedback-and-requests
k
Hey everyone, I would like to use S3 as the source for a Sync. We have a bunch of S3 files under a prefix, and I am assuming the Sync would iterate over them and write the data to the destination. How would Airbyte handle a failure while syncing one (or more) S3 files?
u
"We have a bunch of s3 files at a prefix - I am assuming the Sync would iterate over them and write the data to destination."
Yes. You can specify a path pattern: https://docs.airbyte.io/integrations/sources/s3#path-pattern
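To give a feel for what a path pattern selects, here is a rough sketch. It assumes glob-style matching and approximates it with Python's standard-library `fnmatch`, which may not behave exactly like the connector's own matcher (for instance, `*` here also crosses `/`). The keys and the `select_keys` helper are made up for illustration.

```python
from fnmatch import fnmatchcase


def select_keys(keys, pattern):
    """Return the object keys that match a glob-style pattern (hypothetical helper)."""
    # fnmatchcase is case-sensitive; its '*' matches any characters, including '/'.
    return [key for key in keys if fnmatchcase(key, pattern)]


# Made-up object keys under a bucket, for illustration only.
keys = [
    "exports/2021/users.csv",
    "exports/2021/orders.csv",
    "exports/2021/readme.txt",
    "logs/app.log",
]

selected = select_keys(keys, "exports/*.csv")
for key in selected:
    print(key)
```

Only the two `.csv` keys under `exports/` are selected here; the linked docs are the reference for the pattern syntax the connector actually accepts.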
u
Thanks for this. I was more curious about failure management: if, say, 2 out of 10 files fail (for any reason), how does Airbyte manage this?
u
"How would Airbyte handle failure in sync of 1 (or more) s3 file?"
It depends on the type of failure. For errors that show up in the schema detection stage: if there are files with inconsistent schemas, the S3 source will fail immediately. For errors that show up in the data syncing stage, see: https://docs.airbyte.io/faq/data-loading#what-happens-to-data-in-the-pipeline-if-the-destinat[…]with-duplicate-data-when-the-pipeline-is-reconnected
u
So essentially Airbyte would store a cursor specifying the S3 files that failed, which it would try to sync in the next run, along with any new files?
u
Yes.
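Purely as an illustration of the behavior described above (retry whatever failed, plus anything new), here is a minimal sketch. It is not the connector's implementation: `plan_next_run`, `run_sync`, and the `sync_one` callback are hypothetical names, and the state is just an in-memory set of keys that synced successfully.

```python
def plan_next_run(listed_keys, synced_keys):
    """Files to attempt: every listed key that has not synced successfully yet."""
    return [key for key in listed_keys if key not in synced_keys]


def run_sync(listed_keys, synced_keys, sync_one):
    """Attempt each pending file; record successes in synced_keys, return failures."""
    failed = {}
    for key in plan_next_run(listed_keys, synced_keys):
        try:
            sync_one(key)  # hypothetical per-file sync callback
            synced_keys.add(key)
        except Exception as exc:
            # Keep going so one bad file does not block the others.
            failed[key] = str(exc)
    return failed


# Made-up example: one file fails in the first run, a new file arrives later.
synced = set()


def flaky_sync(key):
    if key == "b.csv":
        raise ValueError("bad row")


failed = run_sync(["a.csv", "b.csv"], synced, flaky_sync)
pending = plan_next_run(["a.csv", "b.csv", "c.csv"], synced)
print(failed, pending)
```

In this sketch the next run picks up the previously failed `b.csv` together with the new `c.csv`, and the returned `failed` mapping is the kind of per-file summary one could surface to users.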
u
Does Airbyte expose information about the S3 files that failed in a run? For example, a message in the UI saying: 9 out of 10 files synced, 1 failed with error "some error".
u
I am not sure about this one. Tagging @George Claireaux (Airbyte) here since he is the author of the S3 source connector.