
Antonio Grass

02/14/2022, 2:25 PM
Is this your first time deploying Airbyte: No
OS Version / Instance: Ubuntu 20.04
Memory / Disk: 32 GB / 120 GB
Deployment: Docker Compose
Version: 0.35.15-alpha
Source: MSSQL 2012
Destination: MSSQL 2019

I need help configuring Airbyte to move tables from one SQL Server to another faster. The table I am moving has 15M rows and takes up about 2 GB of space. Using SSIS, the runtime was 1.5 minutes. Using Airbyte afterwards, it took roughly 1.5 hours.

I looked at the scaling Airbyte page and saw it mostly had to do with memory and storage, so I upgraded the machine to 64 GB of memory and 16 cores. After a few more tests I kept seeing pretty low CPU and memory usage, so I then set JOB_MAIN_CONTAINER_MEMORY_REQUEST and JOB_MAIN_CONTAINER_MEMORY_LIMIT to double the memory I actually have in the system, in the hope that it would use more, but I never saw it use more than 10 GB. The source and destination DB servers have 32 GB of memory each and plenty of storage. I also raised the number of workers from 5 to 15 and set new CPU request limits near the maximum, and I restarted the server after making these changes.

From watching the log file as the job runs, it seems the bottleneck is when it flushes the buffer. The log reports that the flush only takes seconds, but in reality it appears to take 2-3 minutes between the flush and the next group of record reads. I have attached a log from my most recent run, which I stopped at 6 minutes because none of the adjustments I made seemed to yield any change.

Really, my question is how I can configure Airbyte to move this type of table between SQL Servers more efficiently. SSIS completed this in a minute and a half. I didn't expect Airbyte to match that, but taking over an hour was far from the expected result.
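To put the gap in numbers, here is a rough back-of-the-envelope calculation using only the figures quoted above (15M rows, about 2 GB, 1.5 minutes versus 1.5 hours). The function and variable names are just for illustration, and the byte count assumes "2 GB" means 2 GiB, which is an approximation.

```python
# Rough throughput comparison from the figures in the message above.
ROWS = 15_000_000
SIZE_BYTES = 2 * 1024**3  # assumption: "about 2 GB" treated as 2 GiB


def throughput(seconds):
    """Return (rows per second, MiB per second) for a full-table copy."""
    return ROWS / seconds, SIZE_BYTES / seconds / 1024**2


ssis_rows, ssis_mib = throughput(1.5 * 60)           # 1.5 minutes
airbyte_rows, airbyte_mib = throughput(1.5 * 3600)   # 1.5 hours

print(f"SSIS:    {ssis_rows:,.0f} rows/s, {ssis_mib:.2f} MiB/s")
print(f"Airbyte: {airbyte_rows:,.0f} rows/s, {airbyte_mib:.2f} MiB/s")
print(f"Slowdown: {ssis_rows / airbyte_rows:.0f}x")
```

At well under 1 MiB/s, the Airbyte run is far from saturating memory or CPU on a 64 GB, 16-core machine, which fits the low utilization described above.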

Krzysztof Karski

02/17/2022, 2:15 AM
Hello @Matt Klinck, something that can probably help you is https://github.com/airbytehq/airbyte/issues/4314. Today the batch size is 1000 records; increasing it (at the cost of more memory) would reduce the time to complete the job. However, this is not configurable today.
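As a rough illustration of why the batch size matters, the sketch below counts how many flushes a 15M-row table needs at different batch sizes. This is not Airbyte's code: `batched`, `count_flushes`, and the 50,000 figure are hypothetical and only show the general idea that fewer, larger batches mean fewer flush round trips.

```python
from itertools import islice
from math import ceil


def batched(records, batch_size):
    """Yield lists of up to batch_size records from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_flushes(total_rows, batch_size):
    """Number of flushes needed if every full batch triggers one flush."""
    return ceil(total_rows / batch_size)


TOTAL_ROWS = 15_000_000

# 1000 is the batch size mentioned above; 50_000 is a hypothetical larger value.
for size in (1_000, 50_000):
    print(f"batch size {size:>6}: {count_flushes(TOTAL_ROWS, size):>6} flushes")

# Small demo of the batching helper itself.
print(list(batched(range(5), 2)))
```

Each flush carries some fixed overhead, so cutting the number of flushes is where the time saving would come from, with larger batches held in memory in exchange.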