# ask-community-for-troubleshooting
a
Hi folks. How would I speed up my sync jobs? I'm using the latest version of Airbyte (in a k8s cluster). I tried to sync PG to Snowflake and got this result:
49.25 GB | 32,773,497 emitted records | 32,773,497 committed records | 2h 32m 18s | Sync
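The throughput implied by those figures (49.25 GB and 32,773,497 records in 2h 32m 18s) can be checked with a short script:

```python
# Throughput implied by the sync summary above.
gigabytes = 49.25
records = 32_773_497
seconds = 2 * 3600 + 32 * 60 + 18  # 2h 32m 18s = 9138 s

mb_per_s = gigabytes * 1000 / seconds  # decimal GB -> MB
rows_per_s = records / seconds

print(f"{mb_per_s:.1f} MB/s")      # ~5.4 MB/s, i.e. "around 5 MB/s"
print(f"{rows_per_s:.0f} rows/s")  # ~3587 rows per second
```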
So the speed is around 5 MB/s. How would I make it faster? I increased
SUBMITTER_NUM_THREADS=40
MAX_SYNC_WORKERS=20
as described here: https://discuss.airbyte.io/t/scaling-airbyte-on-k8s-increased-job-parallelism/826 But I wasn't sure how to increase the number of workers. Also, what else can I tune to make jobs run faster?
Do I parallelize by specifying a higher number of replicas for airbyte-worker?
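For reference, both knobs live on the worker: the env vars go into the airbyte-worker container spec, and the Deployment's replica count controls how many workers run in parallel. A hypothetical excerpt (names follow a typical airbyte-worker Deployment; adjust to your own manifests or Helm values):

```yaml
# Hypothetical excerpt of an airbyte-worker Deployment.
# More replicas means more concurrent jobs, not faster individual syncs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-worker
spec:
  replicas: 2                  # scale out for more parallel jobs
  template:
    spec:
      containers:
        - name: airbyte-worker
          env:
            - name: SUBMITTER_NUM_THREADS
              value: "40"
            - name: MAX_SYNC_WORKERS
              value: "20"
```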
m
What version of the connector are you using? The latest version uses dynamic row fetching and should fetch the maximum possible number of rows. Increasing the number of workers won't impact row ingestion.
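To illustrate the general idea behind dynamic row fetching (this is a sketch of the technique, not Airbyte's actual implementation): the connector sizes each fetch batch from the observed average row width so that a batch fits a fixed memory budget.

```python
# Sketch of dynamic fetch sizing: choose how many rows to fetch per batch
# from the observed average row size, so a batch fits a memory budget.
# Illustration only; the budget and bounds here are made-up values.

def dynamic_fetch_size(avg_row_bytes: int,
                       buffer_bytes: int = 200 * 1024 * 1024,
                       min_rows: int = 10,
                       max_rows: int = 1_000_000) -> int:
    """Return a per-batch row count bounded to [min_rows, max_rows]."""
    if avg_row_bytes <= 0:
        return max_rows
    rows = buffer_bytes // avg_row_bytes
    return max(min_rows, min(max_rows, rows))

print(dynamic_fetch_size(100))         # narrow rows: hits the 1,000,000-row cap
print(dynamic_fetch_size(2_000))       # 104,857 rows per batch
print(dynamic_fetch_size(50_000_000))  # very wide rows: clamped to the 10-row floor
```

The point is that a fixed fetch size wastes memory on wide rows and round-trips on narrow ones; sizing batches dynamically keeps throughput high without blowing the heap.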
a
@[DEPRECATED] Marcos Marx I use 0.38.4-alpha. So this one should have dynamic row fetching?
d
@Anton Podviaznikov those env vars increase the number of concurrently running jobs, not the speed of an individual sync. Marcos is talking about the Postgres connector version, which is separate from the Airbyte version. Can you give the Postgres source 0.4.16 a shot?
l
Postgres source 0.4.12 may be helpful for performance, but it is unlikely to have more than a 2x impact. 32,773,497 committed records in 2h 32m 18s means roughly 3.6K rows per second. Based on our internal benchmark for Postgres, this velocity falls within the normal range, so right now we don't have a magic wand to make it much faster than that. We are working on improving the performance of our Postgres connector, but this is not a trivial task, so it may take a while. The issue is tracked here: https://github.com/airbytehq/airbyte/issues/12532