Hi there, a quick question regarding the S3 destination:
I am using an EC2 instance with 16 or 32 GB of RAM to pull in a data source that, in total, provides several hundred GB uncompressed.
The logs show the total amount of data read and when a buffer is flushed. I am using the default S3 destination settings with Parquet and SNAPPY compression.
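For reference, this is roughly the format block I have configured (key names paraphrased from memory, so they may not match the spec exactly; values are the defaults as far as I can tell):

```python
# Rough sketch of my S3 destination format settings (key names are my
# approximation, not copied from the actual spec).
s3_format_settings = {
    "format_type": "Parquet",
    "compression_codec": "SNAPPY",
    "block_size_mb": 128,  # the per-buffer flush size I mention below
}
```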
Yet every single run, the sync fails with OOM errors.
Why would that happen if the buffer flushes every 200 MB or so (the default setting is even 128 MB)?
Am I missing something obvious about why memory keeps growing despite the flushes?
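For context, here is the back-of-envelope arithmetic I had in mind; the stream count and overhead factor are made-up numbers on my side, not anything taken from the connector internals:

```python
# Rough worst-case memory estimate under my (possibly wrong) assumption that
# each stream keeps its own in-memory buffer until it hits the flush threshold.
flush_threshold_mb = 200   # what I see in the logs (default would be 128)
concurrent_streams = 10    # hypothetical number of streams in the sync
overhead_factor = 2        # guess: uncompressed rows plus Parquet writer state

worst_case_mb = flush_threshold_mb * concurrent_streams * overhead_factor
print(f"Worst-case buffered data: ~{worst_case_mb / 1024:.1f} GB")  # ~3.9 GB here
```

Even with these generous assumptions I land well under 16 GB, which is why the OOMs surprise me.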