Saman Arefi
03/15/2022, 1:58 PM
[...] t2.large instance and describe, in detail, how Airbyte is mainly memory and disk bound. I've been testing things out on a t3.xlarge and noticed the following:
Loading one large-ish Oracle table (~9 GB, 7M rows) takes me about 30 min, which I think is pretty good. Now, loading two at the same time via the same connector (9 GB, 7M rows and 13 GB, 7M rows) takes an hour in total, with each taking roughly an hour.
What gives?
Looking at htop, I seem to be running into more of a CPU limit as well, so I'm not sure what's causing this. These are my two largest tables, but in production I'd use Airbyte for another 30 or so tables, each between 10k and 1M rows, so this doesn't seem to scale well. Or am I doing something wrong?
Augustin Lafanechere (Airbyte)
03/15/2022, 3:06 PM
[...] MAX_SYNC_WORKERS env var to increase sync parallelism.
Saman Arefi
03/15/2022, 3:15 PM
Augustin Lafanechere (Airbyte)
03/18/2022, 3:28 PM
[...] MAX_SYNC_WORKERS value to >5 might help. I'd also suggest you try upsizing your instance to a t3.2xlarge to check whether you get a performance boost.
Saman Arefi
03/18/2022, 4:24 PM
Augustin Lafanechere (Airbyte)
03/18/2022, 5:20 PM
[...] JOB_MAIN_CONTAINER_MEMORY_REQUEST env var.
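
As a rough check on the timings quoted in the question, the sketch below compares the average throughput of the single-table load with that of the two concurrent loads. It uses only the approximate figures from the thread (9 GB / 7M rows in about 30 minutes, and 9 GB + 13 GB / 14M rows in about an hour) and assumes 1 GB = 1024 MB. The variable and function names are illustrative and are not part of Airbyte.

```python
# Approximate figures quoted in the thread.
single_gb, single_rows, single_min = 9, 7_000_000, 30      # one table loaded alone
pair_gb, pair_rows, pair_min = 9 + 13, 14_000_000, 60      # both tables loaded concurrently


def per_second(amount: float, minutes: float) -> float:
    """Average rate per second over a run lasting `minutes`."""
    return amount / (minutes * 60)


# Byte throughput in MB/s (assumption: 1 GB = 1024 MB).
single_mb_s = per_second(single_gb * 1024, single_min)
pair_mb_s = per_second(pair_gb * 1024, pair_min)

# Row throughput in rows/s.
single_rows_s = per_second(single_rows, single_min)
pair_rows_s = per_second(pair_rows, pair_min)

byte_speedup = pair_mb_s / single_mb_s
row_speedup = pair_rows_s / single_rows_s

print(f"single table: {single_mb_s:.2f} MB/s, {single_rows_s:.0f} rows/s")
print(f"two tables:   {pair_mb_s:.2f} MB/s, {pair_rows_s:.0f} rows/s")
print(f"speedup from running both at once: {byte_speedup:.2f}x by bytes, {row_speedup:.2f}x by rows")
```

On these figures, running both syncs at once moves only about 1.2x as many bytes per second as the single sync, and the same number of rows per second. That would fit the instance already being saturated, though the timings are too rough to say which resource is the limit.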