# ask-community-for-troubleshooting

Amit Gelber

10/25/2021, 10:02 AM
Hi everyone! A few questions :) Is it possible to manage a 3TB full sync with Airbyte (MSSQL -> BigQuery)? Will it handle the load? How can I determine how many workers I need for my setup? How does it split the load? By table? Or does each worker grab a part of a table? What is the retry policy? Can we retry failed parts of a table? Was the k8s deployment tested in production?

Chris (deprecated profile)

10/25/2021, 11:54 AM
I'm guessing it should be able to, yes.
We'll be running more concrete tests this quarter to get definitive answers. You can follow this issue: https://github.com/airbytehq/airbyte/issues/7035
You can read more about workers here: https://docs.airbyte.io/understanding-airbyte/jobs
> How does it split the load? By table? Or does each worker grab a part of a table?

In general, one worker takes care of all streams from the same source as part of a sync. Unfortunately, the load from a large 3TB source is not automatically split across multiple workers at the moment.
> What is the retry policy?
> Can we retry failed parts of a table?

There will also be work on better handling of partial checkpointing in the short-to-medium term.
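Partial checkpointing is what would let a retry pick up only the failed part of a table instead of starting the whole table over. As a generic illustration of that idea (not Airbyte's implementation or its state format), the sketch below copies a table in primary-key order and records a cursor after each batch is written; a retry skips everything at or below the saved cursor. Everything here is hypothetical: `sync_table`, `flaky_write`, the `(key, payload)` row shape, the in-memory `checkpoints` dict, and the batch size. It also assumes each batch write is atomic, since a batch that half-succeeds before raising would be written again on retry.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical row shape: (monotonically increasing primary key, payload).
Row = Tuple[int, str]


def sync_table(
    source_rows: List[Row],
    checkpoints: Dict[str, int],
    table: str,
    write_batch: Callable[[List[Row]], None],
    batch_size: int = 2,
) -> None:
    """Copy rows in primary-key order, recording a checkpoint after each written batch.

    On a retry, rows at or below the saved checkpoint are skipped, so only the
    unfinished part of the table is copied again.
    """
    last_done: Optional[int] = checkpoints.get(table)
    pending = sorted(r for r in source_rows if last_done is None or r[0] > last_done)
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        write_batch(batch)                 # if this raises, the checkpoint is not advanced
        checkpoints[table] = batch[-1][0]  # advance only after the batch was written


# Usage: a writer that fails on its second call, followed by a retry.
rows: List[Row] = [(i, f"row-{i}") for i in range(1, 6)]
checkpoints: Dict[str, int] = {}
destination: List[Row] = []
calls = {"n": 0}


def flaky_write(batch: List[Row]) -> None:
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("simulated failure")
    destination.extend(batch)


try:
    sync_table(rows, checkpoints, "orders", flaky_write)
except RuntimeError:
    pass
print(checkpoints)  # {'orders': 2}
sync_table(rows, checkpoints, "orders", flaky_write)  # retry resumes after key 2
print(destination == rows)  # True
```

The first attempt writes keys 1 and 2, fails on the batch containing keys 3 and 4, and leaves the cursor at 2; the retry copies only keys 3 to 5, so every row lands in the destination exactly once.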
> Was the k8s deployment tested in production?

Yes. Airbyte Cloud is also managed on k8s.

Amit Gelber

10/25/2021, 2:21 PM
The 3TB source contains a lot of tables; it's not one single 3TB file.

Davin Chia (Airbyte)

10/26/2021, 9:51 AM
One way is to split the tables across different connections. Each connection is mapped to a worker, so this amounts to sharding the work "manually".
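Splitting tables across connections is a partitioning problem: each group of tables becomes one connection, and the largest group roughly bounds how long the slowest worker runs. If approximate table sizes are known from the source database's own statistics, a simple greedy heuristic (largest table first, always into the currently lightest group) keeps the groups balanced. This is a minimal sketch, not an Airbyte feature: `shard_tables`, the table names, and the sizes are hypothetical, and each resulting group would still have to be set up by hand as its own connection selecting only those tables.

```python
import heapq
from typing import Dict, List


def shard_tables(table_sizes_gb: Dict[str, float], num_connections: int) -> List[List[str]]:
    """Assign tables to connections so the total size per connection is roughly balanced.

    Longest-processing-time-first heuristic: place each table, largest first,
    into the connection whose current total is smallest.
    """
    if num_connections < 1:
        raise ValueError("num_connections must be at least 1")
    # Min-heap of (current total size, connection index).
    heap = [(0.0, i) for i in range(num_connections)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(num_connections)]
    # Largest tables first; ties broken by name so the output is deterministic.
    for name, size in sorted(table_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return groups


# Hypothetical 3TB source described in GB per table.
tables = {"orders": 1200, "events": 900, "customers": 400, "products": 300, "logs": 200}
groups = shard_tables(tables, 3)
print(groups)  # [['orders'], ['events'], ['customers', 'products', 'logs']]
print([sum(tables[t] for t in g) for g in groups])  # [1200, 900, 900]
```

The greedy approach is not guaranteed optimal, but it is simple and usually close; note that a single table larger than the per-connection target (like `orders` here) still ends up on one worker, since the load within one table is not split.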