What’s the use case for running with multiple controllers? These are stateless, and don’t have a lot of load (if using something like HDFS for deep storage), right? So is it just zero downtime (assuming you have a LB in front of them) in case one goes down?
Mayank
05/21/2021, 7:01 PM
Fault tolerance for one.
Mayank
05/21/2021, 7:02 PM
Also, once you get into the range of thousands of tables, a single controller might not cut it.
Ken Krugler
05/21/2021, 7:04 PM
Hmm, I thought the controller load with lots of tables was due to synchronization load, but having two controllers doesn’t distribute that load, right? So what’s the bottleneck with one controller and 1000s of tables?
Mayank
05/21/2021, 7:04 PM
What do you mean by synchronization load?
Ken Krugler
05/21/2021, 7:05 PM
When viewing logs, I see a lot of Helix-related activity (what’s the ideal vs. actual state). But maybe that’s just when we’re redeploying…
Mayank
05/21/2021, 7:06 PM
The controller does run a lot of background jobs (e.g., retention).
Mayank
05/21/2021, 7:07 PM
But you are right, with deepstore + metadata push, a lot of the network traffic is avoided. Some local IO + CPU is avoided as well, since the controller won't need to untar/unzip the segments.
Kishore G
05/21/2021, 7:29 PM
If using metadata-based push, then it's really for fault tolerance. Even with thousands of tables, one controller is probably enough.
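For anyone skimming the thread: the fault-tolerance point boils down to having something in front of the controllers that skips a dead one. Here is a rough sketch of that idea in Python, not anything from Pinot itself. The hostnames, the `pick_controller` helper, and the fake probe are all hypothetical; a real setup would more likely use a load balancer with a health check against each controller.

```python
from typing import Callable, Iterable, Optional


def pick_controller(
    controllers: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first controller URL whose health probe succeeds, else None."""
    for url in controllers:
        try:
            if is_healthy(url):
                return url
        except OSError:
            # Treat connection errors as "this controller is down" and move on.
            continue
    return None


# Hypothetical hostnames, for illustration only.
CONTROLLERS = ["http://controller-1:9000", "http://controller-2:9000"]

# Simulate controller-1 being down; a real probe would issue an HTTP request.
DOWN = {"http://controller-1:9000"}


def fake_probe(url: str) -> bool:
    """Stand-in health probe: raises for controllers marked as down."""
    if url in DOWN:
        raise ConnectionError("connection refused")
    return True


print(pick_controller(CONTROLLERS, fake_probe))
```

Because the controllers are stateless, as noted at the top of the thread, any healthy one can serve the request, which is why a simple first-healthy choice is enough in this sketch.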