What’s the use case for running with multiple cont...
# general
k
What’s the use case for running with multiple controllers? These are stateless, and don’t have a lot of load (if using something like HDFS for deep storage), right? So is it just zero downtime (assuming you have a LB in front of them) in case one goes down?
m
Fault tolerance for one.
Also, once you get into the thousands-of-tables range, a single controller might not cut it.
k
Hmm, I thought the controller load with lots of tables was due to synchronization load, but having two controllers doesn’t distribute that load, right? So what’s the bottleneck with one controller and 1000s of tables?
m
What do you mean by synchronization load?
k
When viewing logs, I see a lot of Helix-related activity (what’s the ideal vs. actual state). But maybe that’s just when we’re redeploying…
m
The controller does run a lot of background jobs (e.g. retention).
But you are right: with deep store + metadata push, a lot of the network traffic is avoided. Some local IO + CPU is avoided too, since the controller won't need to untar/unzip the segments.
k
If using metadata-based push, then it's really for fault tolerance. Even with thousands of tables, one controller is probably enough.
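To make the fault-tolerance point concrete, here's a minimal sketch of what a load balancer (or a client) does in front of several controllers: try each one and use the first that responds as healthy. The controller URLs are made up, `first_healthy` and `http_health_check` are hypothetical helper names, and it assumes each controller answers `GET /health` with HTTP 200 when it's up.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_health_check(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: the controller exposes a /health endpoint.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, non-2xx status, etc.
        return False


def first_healthy(
    controllers: Iterable[str],
    is_healthy: Callable[[str], bool] = http_health_check,
) -> Optional[str]:
    """Return the first controller URL that passes the health check, or None."""
    for url in controllers:
        if is_healthy(url):
            return url
    return None


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own controller addresses.
    candidates = ["http://controller-1:9000", "http://controller-2:9000"]
    print(first_healthy(candidates))
```

Since the controllers are stateless, it doesn't matter which one ends up serving the request, so a simple "first healthy" pick is enough for a sketch like this.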