What’s the use case for running with multiple controllers? | Apache Pinot #general


Ken Krugler

05/21/2021, 7:01 PM

What’s the use case for running with multiple controllers? These are stateless and don’t have a lot of load (if using something like HDFS for deep storage), right? So is it just zero downtime (assuming you have an LB in front of them) in case one goes down?

Mayank

05/21/2021, 7:01 PM

Fault tolerance for one.
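The fault-tolerance argument (stateless controllers behind an LB) can be sketched as client-side failover: try each controller in turn and use the first one that answers. This is a minimal sketch under stated assumptions, not Pinot client code. The controller hostnames are hypothetical, `pick_healthy_controller` and `http_health_probe` are illustrative names, and the `/health` path is an assumed endpoint; the probe is injectable so the logic can be exercised without a live cluster.

```python
import urllib.request
from typing import Callable, Sequence


def http_health_probe(url: str, timeout: float = 2.0) -> bool:
    """Probe a controller over HTTP.

    Assumption: the controller answers GET <url>/health with HTTP 200 when up.
    Connection failures, timeouts and HTTP errors raise OSError subclasses.
    """
    with urllib.request.urlopen(f"{url}/health", timeout=timeout) as resp:
        return resp.status == 200


def pick_healthy_controller(
    controller_urls: Sequence[str],
    is_healthy: Callable[[str], bool] = http_health_probe,
) -> str:
    """Return the first controller whose health probe succeeds.

    Roughly what a load balancer in front of several stateless controllers
    does: any live controller can serve the request, so losing one does not
    cause downtime.
    """
    for url in controller_urls:
        try:
            if is_healthy(url):
                return url
        except OSError:
            # Unreachable, timed out, or HTTP error: try the next controller.
            continue
    raise RuntimeError("no healthy controller available")


# Hypothetical two-controller setup with controller-0 simulated as down.
controllers = ["http://controller-0:9000", "http://controller-1:9000"]
down = {"http://controller-0:9000"}
print(pick_healthy_controller(controllers, lambda url: url not in down))
# prints http://controller-1:9000
```

With a single controller the same loop has nothing to fall back to, which is the downtime the extra controllers remove.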

Mayank

05/21/2021, 7:02 PM

Also, when you get into the thousands-of-tables range, a single controller might not cut it.
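One way several controllers could share per-table work is to give each controller ownership of a subset of tables, chosen by a stable hash of the table name. This is a hypothetical scheme for illustration only; the thread does not say how Pinot actually divides work between controllers, and `owner_of`, `assign_tables` and the table/controller names are made up.

```python
import hashlib
from collections import Counter
from typing import Dict, Sequence


def owner_of(table: str, controllers: Sequence[str]) -> str:
    """Pick the controller responsible for a table's background work.

    Uses SHA-256 instead of Python's built-in hash(), which is salted per
    process, so every controller computes the same owner for a table.
    """
    if not controllers:
        raise ValueError("need at least one controller")
    digest = hashlib.sha256(table.encode("utf-8")).digest()
    return controllers[int.from_bytes(digest[:8], "big") % len(controllers)]


def assign_tables(tables: Sequence[str], controllers: Sequence[str]) -> Dict[str, str]:
    """Map every table name to its owning controller."""
    return {t: owner_of(t, controllers) for t in tables}


# Hypothetical cluster: 3,000 tables split across 3 controllers.
tables = [f"table_{i}" for i in range(3000)]
controllers = ["controller-0", "controller-1", "controller-2"]
print(Counter(assign_tables(tables, controllers).values()))
```

Each controller then handles roughly a third of the tables. Plain modulo hashing reassigns most tables whenever the controller count changes; consistent hashing, or a fixed number of partitions assigned to controllers, would limit that churn.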

Ken Krugler

05/21/2021, 7:04 PM

Hmm, I thought the controller load with lots of tables was due to synchronization load, but having two controllers doesn’t distribute that load, right? So what’s the bottleneck with one controller and 1000s of tables?

Mayank

05/21/2021, 7:04 PM

What do you mean by synchronization load?

Ken Krugler

05/21/2021, 7:05 PM

When viewing logs, I see a lot of Helix-related activity (what’s the ideal vs. actual state). But maybe that’s just when we’re redeploying…

Mayank

05/21/2021, 7:06 PM

The controller does run a lot of background jobs (e.g., retention).
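To show why a background job like retention scales with cluster size, here is a simplified retention pass: for every table, find segments whose newest data is older than that table's retention window. This is a sketch under stated assumptions, not Pinot's retention implementation; `Segment`, `expired_segments`, `retention_pass` and the segment names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    name: str
    end_time: datetime  # newest data timestamp contained in the segment


def expired_segments(segments: List[Segment], retention: timedelta, now: datetime) -> List[str]:
    """Names of segments whose newest data is older than the retention window."""
    cutoff = now - retention
    return [s.name for s in segments if s.end_time < cutoff]


def retention_pass(
    tables: Dict[str, List[Segment]],
    retention_by_table: Dict[str, timedelta],
    now: datetime,
) -> Dict[str, List[str]]:
    """One run of a periodic retention job: a scan over every table's segments.

    The work grows with (tables x segments), which is why jobs like this get
    heavier as a cluster reaches thousands of tables.
    """
    return {
        table: expired_segments(segs, retention_by_table[table], now)
        for table, segs in tables.items()
    }


now = datetime(2021, 5, 21, tzinfo=timezone.utc)
tables = {
    "events": [
        Segment("events_0", now - timedelta(days=10)),
        Segment("events_1", now - timedelta(days=2)),
    ],
}
print(retention_pass(tables, {"events": timedelta(days=7)}, now))
# prints {'events': ['events_0']}
```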

Mayank

05/21/2021, 7:07 PM

But you are right: with deep store + metadata push, a lot of the network traffic is avoided. Some local IO + CPU is also avoided, since the controller won't need to untar/unzip the segments, etc.
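The difference between the two push styles can be made concrete with a toy model that only counts what the controller itself handles. Everything here (`ControllerStats`, `tar_push`, `metadata_push`, and the 1 MB / 2 KB sizes) is a hypothetical simplification, not Pinot's push code. It assumes that in a tarball push the segment passes through the controller, which unpacks it, while in a metadata push the job writes the segment to deep store and sends the controller only metadata.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ControllerStats:
    bytes_received: int = 0     # network bytes uploaded to the controller
    segments_untarred: int = 0  # local IO + CPU spent unpacking tarballs


def tar_push(name: str, tarball: bytes, stats: ControllerStats,
             deep_store: Dict[str, bytes]) -> None:
    """Client uploads the whole segment tarball through the controller."""
    stats.bytes_received += len(tarball)
    stats.segments_untarred += 1  # controller unpacks it to read metadata
    deep_store[name] = tarball    # then stores it in deep store


def metadata_push(name: str, tarball: bytes, metadata: bytes,
                  stats: ControllerStats, deep_store: Dict[str, bytes]) -> None:
    """Push job writes the segment to deep store; the controller gets metadata only."""
    deep_store[name] = tarball    # bypasses the controller entirely
    stats.bytes_received += len(metadata)


segment = b"x" * 1_000_000  # hypothetical 1 MB segment tarball
metadata = b"m" * 2_000     # hypothetical 2 KB of segment metadata

tar_stats, meta_stats = ControllerStats(), ControllerStats()
store: Dict[str, bytes] = {}
tar_push("seg_0", segment, tar_stats, store)
metadata_push("seg_1", segment, metadata, meta_stats, store)
print(tar_stats)   # ControllerStats(bytes_received=1000000, segments_untarred=1)
print(meta_stats)  # ControllerStats(bytes_received=2000, segments_untarred=0)
```

Both paths end with the segment in deep store; only the controller's share of the work changes, which is why metadata push leaves fault tolerance as the main reason for extra controllers.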

Kishore G

05/21/2021, 7:29 PM

If using metadata-based push, then it’s really for fault tolerance. Even with thousands of tables, one controller is probably enough.

