Is there documentation or recommendations from Airbyte on how to parallelize a single data source?
I have an MS SQL source to Snowflake connection with about 2 billion rows that is unable to complete in under 18 hours, which is about the maximum time I'm able to stay connected to the MS SQL server to perform the backfill.
The whole system is running on AWS EC2 at the moment with a 2-core instance, 16 GB of memory, and 100 GB of EBS storage.
11/01/2021, 2:13 PM
if it’s historical data/a one-time sync, just split it into multiple jobs by table. Or dump the tables to an S3 bucket and load them from there instead.
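One way to think about the "split by table" suggestion: assign tables to connections so each connection carries a similar row volume. Here is a minimal sketch of that idea; the table names and row counts are made up for illustration, and in Airbyte itself you would reflect the groups by enabling different tables in each connection's catalog.

```python
# Hypothetical sketch: split one large sync into several Airbyte
# connections by assigning tables to groups of roughly equal row volume.

def partition_tables(row_counts: dict[str, int], n_groups: int) -> list[list[str]]:
    """Greedily assign tables (largest first) to the currently
    lightest group, so each connection syncs a similar row count."""
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for table, rows in sorted(row_counts.items(), key=lambda kv: -kv[1]):
        i = totals.index(min(totals))  # lightest group so far
        groups[i].append(table)
        totals[i] += rows
    return groups

# Example: four hypothetical tables spread over two connections.
counts = {"orders": 1_200_000_000, "events": 600_000_000,
          "customers": 150_000_000, "products": 50_000_000}
for i, grp in enumerate(partition_tables(counts, 2), start=1):
    print(f"connection {i}: {grp}")
```

Greedy largest-first assignment isn't optimal bin packing, but for a handful of tables it gets the connections close enough in size that they finish in roughly the same wall-clock window.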
11/01/2021, 2:37 PM
So just create multiple connections with the same source and destination and partition which tables sync from each connection?
Also, in terms of the S3 staging, do you know if it uses Snowpipe under the hood, and whether that improves the overall throughput?
11/01/2021, 2:47 PM
“So just create multiple connections with the same source and destination and partition which tables sync from each connection?” yes, that is what I meant.
My other option was to dump the tables from within the MS SQL server to an external, more open, intermediary destination, like an S3 bucket. From that bucket you will be able to use Snowpipe yourself to load it into ❄️
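The S3-then-Snowpipe path above boils down to two Snowflake statements: an external stage pointing at the bucket the MS SQL dump landed in, and a pipe that copies new files into the target table. A minimal sketch, assuming hypothetical bucket, stage, pipe, and table names; in practice you would execute these statements against Snowflake (e.g. with snowflake-connector-python):

```python
# Hypothetical sketch of the S3 -> Snowflake load: build the DDL for an
# external stage and a Snowpipe pipe. All object names here are made up.

def stage_ddl(stage: str, bucket: str, prefix: str) -> str:
    # External stage pointing at the S3 location holding the dumped files.
    return (f"CREATE STAGE {stage} "
            f"URL='s3://{bucket}/{prefix}' "
            f"FILE_FORMAT=(TYPE=CSV FIELD_OPTIONALLY_ENCLOSED_BY='\"')")

def pipe_ddl(pipe: str, table: str, stage: str) -> str:
    # AUTO_INGEST=TRUE lets Snowpipe pick up new files automatically
    # via S3 event notifications instead of manual REFRESH calls.
    return (f"CREATE PIPE {pipe} AUTO_INGEST=TRUE AS "
            f"COPY INTO {table} FROM @{stage}")

print(stage_ddl("mssql_dump_stage", "my-backfill-bucket", "dbo/orders/"))
print(pipe_ddl("orders_pipe", "analytics.orders", "mssql_dump_stage"))
```

For a one-time backfill of a fixed set of files, a plain `COPY INTO` run once can be simpler than setting up a pipe; Snowpipe earns its keep when files keep arriving.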