Hi!
First of all, thanks for creating this great tool: I'm loving the documentation, and the learning curve seems perfect for me.
I’m reworking our analytics setup a bit, trying to make operational data (from a Django app) together with lots of tracking data (we’re using Adjust for it) available in an analytics RDS Aurora (Postgres) db.
The trouble is that when I try to import all those xx,xxx CSVs at once, the overall process takes a very long time, even when I provide the schema. When the schema is not provided, the S3 source connector seems to iterate over the whole bucket to derive and validate it. Based on these observations I have a few questions (my current schema setup is sketched below the list for context):
• Would you recommend using Airbyte for importing lots and lots of CSV files (event logs) into Postgres?
• Would you maybe suggest switching to a proper data warehouse (e.g. Snowflake) instead and doing the import there?
• Are there any settings in the S3 source connector that would help me speed up the process?
• Would having more workers help? (I’m just experimenting, so my setup is based on docker-compose up, but that can be changed.)
• Are there any plans on the S3 source connector roadmap that would address this or similar issues?
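For context, this is roughly how I generate the schema I paste into the connector right now. It's just a sketch: the bucket name and sample key are placeholders, and the pandas-dtype-to-type mapping is my own rough guess at what the schema field expects.

```python
import json

import boto3
import pandas as pd

BUCKET = "my-analytics-bucket"            # placeholder for our real bucket
SAMPLE_KEY = "adjust/2024/01/sample.csv"  # placeholder: one representative CSV

# Read a single representative file instead of letting the connector scan everything.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket=BUCKET, Key=SAMPLE_KEY)
df = pd.read_csv(obj["Body"], nrows=1000)

# Map pandas dtypes to simple type names; anything unknown falls back to "string".
DTYPE_MAP = {"int64": "integer", "float64": "number", "bool": "boolean"}
schema = {col: DTYPE_MAP.get(str(dtype), "string") for col, dtype in df.dtypes.items()}

print(json.dumps(schema))  # paste this into the source's schema setting
```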
Cheers ^_^