#contributing-to-airbyte

Pras

02/24/2022, 6:45 AM
Hello, just to be clear on scaling Airbyte in Kubernetes/GKE: is this a fair understanding?
• Bump MAX_X_WORKERS to a higher number to increase the load per worker, X being SPEC, CHECK, SYNC and DISCOVER.
• Keep SUBMITTER_NUMBER_THREADS equal to or higher than the sum of all MAX_X_WORKERS times the worker replica count. For example, if each of those is configured to 20 and there are three workers running, keep this value at 240+?
• Expose a number of container ports/Temporal ports at least equal to the sum of all MAX_X_WORKERS. For example, if each of those is configured to 20, expose 80 ports, 9001 to 9080? The default just lists 30/40. What is the difference/relation between the containerPort lines in worker.yaml and the TEMPORAL_WORKER_PORTS env variable? One lists 30 and the other 40, so they do not seem to match.
• Assuming I fix/reach the vertical limit per worker pod: after reaching full utilization of the running workers, just keep bumping the replica count to higher values (as we add more connections)?
Bump MAX_X_WORKERS to a higher number to increase the load per worker, X being SPEC, CHECK, SYNC and DISCOVER.
yes
Keep SUBMITTER_NUMBER_THREADS equal to or higher than the sum of all MAX_X_WORKERS times the worker replica count. For example, if each of those is configured to 20 and there are three workers running, keep this value at 240+?
that’s a good general rule of thumb. more accurately, you need to keep it higher than the number of sync workers your deployment supports
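To make the arithmetic in this exchange concrete, here is a minimal Python sketch. The function names are hypothetical and not part of Airbyte; only the env var names come from the thread. It assumes "the number of sync workers your deployment supports" means MAX_SYNC_WORKERS times the worker replica count, and reads "higher than" as strictly greater.

```python
# Hypothetical helpers for sizing SUBMITTER_NUMBER_THREADS; not part of Airbyte.

def min_submitter_threads(max_sync_workers: int, worker_replicas: int) -> int:
    """Smallest value strictly higher than the deployment's sync worker capacity.

    Assumption: capacity = MAX_SYNC_WORKERS * number of worker replicas.
    """
    return max_sync_workers * worker_replicas + 1


def rule_of_thumb_submitter_threads(max_workers: dict[str, int], worker_replicas: int) -> int:
    """The question's rule of thumb: sum of all MAX_X_WORKERS times replicas."""
    return sum(max_workers.values()) * worker_replicas


if __name__ == "__main__":
    # Values from the question: each MAX_X_WORKERS set to 20, three worker replicas.
    limits = {"SPEC": 20, "CHECK": 20, "SYNC": 20, "DISCOVER": 20}
    print(min_submitter_threads(limits["SYNC"], 3))    # 61
    print(rule_of_thumb_submitter_threads(limits, 3))  # 240
```

With the numbers from the question, the rule of thumb (240) sits comfortably above the stricter lower bound (61), so either satisfies the requirement described in the reply.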
Expose a number of container ports/Temporal ports at least equal to the sum of all MAX_X_WORKERS. For example, if each of those is configured to 20, expose 80 ports, 9001 to 9080? The default just lists 30/40. What is the difference/relation between the containerPort lines in worker.yaml and the TEMPORAL_WORKER_PORTS env variable? One lists 30 and the other 40, so they do not seem to match.
the containerPort should only be a kube configuration and shouldn’t affect this. the temporalWorkerPorts is the right env var to modify. this represents the ports open per worker. each job uses 4 ports, so you’ll need 4 times the total number of jobs you want to run simultaneously
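The "4 ports per job" rule can be sketched as a small calculation. This is an illustration only: the helper names are hypothetical, the starting port 9001 is taken from the question, and the comma-separated format of the TEMPORAL_WORKER_PORTS value is an assumption that should be checked against your own deployment's env file.

```python
# Hypothetical helpers for sizing the per-worker port list; not part of Airbyte.

PORTS_PER_JOB = 4  # stated in the reply above


def ports_needed(concurrent_jobs: int) -> int:
    """Number of ports one worker needs for the given number of simultaneous jobs."""
    return PORTS_PER_JOB * concurrent_jobs


def temporal_worker_ports(concurrent_jobs: int, first_port: int = 9001) -> str:
    """Build a port list for TEMPORAL_WORKER_PORTS.

    Assumptions: the value is a comma-separated list of consecutive ports,
    and 9001 is the first port (as in the question).
    """
    count = ports_needed(concurrent_jobs)
    return ",".join(str(port) for port in range(first_port, first_port + count))


if __name__ == "__main__":
    print(ports_needed(20))                # 80
    print(temporal_worker_ports(2))        # 9001,9002,9003,9004,9005,9006,9007,9008
```

Under this rule, the 80 ports proposed in the question (9001 to 9080) would cover 20 simultaneous jobs on a worker, not 80.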
Assuming I fix/reach the vertical limit per worker pod: after reaching full utilization of the running workers, just keep bumping the replica count to higher values (as we add more connections)?
yes, although it’s related to the number of simultaneous jobs you wish to run rather than to adding more connections. connections are just database state. all the scaling we are discussing is related to supporting more concurrent jobs
Ok, thanks for the details @Davin Chia (Airbyte), appreciate it. Yeah, most of our connections are on a 15min frequency, so concurrent jobs will increase as we add more connections, albeit a little slower. I am still not clear on that containerPort part: assuming it's kube, if it's not reflecting the values mentioned in that env, what is it controlling/configuring, and can I drop the whole list completely from the yaml as well?
you should be able to drop the list. I haven’t tried it myself so I’d be curious what you find out