horizontal scaling question I understand there exists a sett Airbyte #ask-community-for-troubleshooting

Hrvoje Piasevoli

05/01/2022, 9:30 PM

Horizontal scaling question: I understand there is a setting to control the maximum number of workers (syncs). But what about parallel / scale-out execution for a single connection with a huge number of tables? E.g. a k8s deployment with a single connection should ideally be split across the many available nodes in the pool. Currently you have to split the load manually across multiple connections, being careful to keep the streams distinct. That's not nice, and there should be a setting to reduce the maintenance... Ideally there would be a queue of streams, auto-assigned to available resources.

Hrvoje Piasevoli

05/01/2022, 9:35 PM

Probably a connection-level setting. Connectors (sources and destinations) shouldn't care about it. It is strictly the responsibility of the orchestration/scheduler implementation. E.g. split the streams into queues and assign them to workers.
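The queue-and-workers idea described above can be sketched in a few lines of Python. This is only an illustration of the suggestion, not how Airbyte schedules anything: `sync_streams_in_parallel`, `sync_stream`, and the `worker-N` names are hypothetical, and threads stand in for what would be separate pods in a k8s pool.

```python
import queue
import threading


def sync_streams_in_parallel(streams, sync_stream, num_workers=4):
    """Drain a shared queue of stream names with a fixed pool of workers.

    `sync_stream` is a hypothetical callable that replicates one stream.
    Returns a dict mapping each stream name to the worker that handled it.
    """
    pending = queue.Queue()
    for name in streams:
        pending.put(name)

    assignments = {}
    lock = threading.Lock()

    def worker(worker_id):
        # Each worker pulls the next pending stream until the queue is empty.
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return
            sync_stream(name)
            with lock:
                assignments[name] = worker_id

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignments
```

Because every worker pulls from the same queue, a slow stream only occupies one worker while the others keep draining the rest, which is the behaviour the manual "one connection per group of streams" workaround tries to approximate.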

Hrvoje Piasevoli

05/01/2022, 9:39 PM

The scheduler could be further optimized, either with stats or by changing the source spec to include estimated row counts or similar useful metrics.
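One way such estimates could be used, as a rough sketch: a greedy "largest stream first" assignment that always hands the next-biggest stream to the least-loaded worker. The function name `plan_by_estimated_rows` and the row counts are made up for illustration; nothing here reflects an existing Airbyte API or source spec field.

```python
import heapq


def plan_by_estimated_rows(estimated_rows, num_workers):
    """Assign streams to workers, balancing total estimated rows.

    `estimated_rows` maps stream name -> estimated row count (hypothetical
    metric). Returns one list of stream names per worker.
    """
    # Min-heap of (current load, worker index); the least-loaded worker is on top.
    heap = [(0, i) for i in range(num_workers)]
    heapq.heapify(heap)
    plan = [[] for _ in range(num_workers)]

    # Place the largest streams first; ties are broken by stream name.
    ordered = sorted(estimated_rows.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, rows in ordered:
        load, idx = heapq.heappop(heap)
        plan[idx].append(name)
        heapq.heappush(heap, (load + rows, idx))
    return plan


estimates = {"orders": 900, "events": 700, "users": 300, "tags": 100}
print(plan_by_estimated_rows(estimates, 2))
# [['orders', 'tags'], ['events', 'users']]
```

This is a static plan computed up front; a shared queue of streams reaches a similar balance dynamically, but estimates would let the scheduler start the largest streams first.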

Hrvoje Piasevoli

05/01/2022, 9:44 PM

Normalization should optionally be postponed for successful EL streams if it ties up workers (not sure how it is implemented). The idea is to be able to scale in the k8s pool, since dbt is a destination workload.

Hrvoje Piasevoli

05/01/2022, 9:50 PM

One important consideration, and the reasoning behind the original suggestion: I should only ever split the source load across multiple connections if I want different sync schedules, never because of parallelism.

Augustin Lafanechere (Airbyte)

05/02/2022, 3:49 PM

Hi @Hrvoje Piasevoli, thank you for this feedback. As you observed, streams are currently replicated sequentially. You can only tune the parallelization of jobs, not of streams inside a single job, as you mentioned. This is something we plan to work on; you can follow this epic issue on GitHub. Feel free to share your suggestions there too 👍🏻

Hrvoje Piasevoli

05/02/2022, 3:57 PM

Thanks very much for this, @Augustin Lafanechere (Airbyte). Exactly what I needed and hoped already existed but couldn't find. I'll add my comments there.
