# general
a
Hi, I would like to set up a Pinot cluster with multiple controllers, servers, and brokers on different hosts. I can see in the documentation that the controllers should have a shared volume. Do servers, brokers, and controllers running on different hosts need to be reachable from each other?
m
Brokers and servers need network connectivity to each other for query scatter/gather.
One broker does not need to know about other brokers
Same for servers
a
So all brokers should be able to reach all servers? What about connectivity with the controller?
s
Brokers need to reach the servers that host the tables served by that broker. In general, Pinot depends on Helix, so all nodes need to reach ZooKeeper and vice versa. I suppose by "reach" you mean setting up iptables rules to block each other? It is good if the controllers can reach the brokers; if not, your query console will not work. It is also good if the controllers can reach the servers; otherwise some debug commands and features won't work. It is required that servers can reach the controllers. What exactly is your reachability constraint?
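A minimal sketch that encodes the connectivity rules from the replies so far as a lookup table, which can be handy when writing firewall rules. The role names and the `required`/`recommended` labels are my own summary of this thread, not a Pinot configuration format. For ZooKeeper only the node-to-ZooKeeper direction is modeled, since ZooKeeper clients open the TCP connection and notifications flow back over it.

```python
# Summary of the reachability rules discussed in this thread (not an official Pinot spec).
# Each key is (source_role, target_role); the value says how important that link is.
REQUIRED = "required"
RECOMMENDED = "recommended"
NOT_NEEDED = "not needed"

LINKS = {
    ("broker", "server"): REQUIRED,          # query scatter/gather
    ("server", "controller"): REQUIRED,
    ("controller", "broker"): RECOMMENDED,   # otherwise the query console will not work
    ("controller", "server"): RECOMMENDED,   # otherwise some debug commands/features will not work
}
# Every Pinot node is a Helix participant/spectator and must reach ZooKeeper.
for role in ("controller", "broker", "server"):
    LINKS[(role, "zookeeper")] = REQUIRED


def requirement(src: str, dst: str) -> str:
    """Return how important a src -> dst link is; unlisted links are not needed."""
    return LINKS.get((src, dst), NOT_NEEDED)


def required_targets(src: str) -> list[str]:
    """List the roles that `src` must be able to reach."""
    return sorted(dst for (s, dst), level in LINKS.items() if s == src and level == REQUIRED)
```

For example, `requirement("broker", "broker")` returns `"not needed"`, matching the point that one broker does not need to know about other brokers.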
a
I meant network connectivity, and Mayank clarified that for me, as you also mentioned. Between controllers only file system sharing is needed; controllers should have connectivity with both servers and brokers, and servers should be reachable from brokers.
k
Not sure about controllers needing a shared file system - that depends on how you’ve configured your segment deep store. And I think having multiple controllers using a shared fs for deep store could be problematic (which would be the case if you were pushing segments to the controller, vs pushing metadata).
s
LinkedIn runs Pinot in production with multiple controllers sharing a common NFS (and pushing data through the controllers, yes).
k
Hi @User - so you rely on each controller getting a distinct set of segments (by name), so they don’t step on each other when writing data?
k
Arpit, if you have the option, my recommendation is to avoid the LinkedIn model and use metadata/URI-based push (as suggested by Ken).
s
@User no. each controller can receive messages for any table. Not sure why you mention that they need to get a distinct set of segments by name. Are you thinking of pushing two segments with the same name (but different contents) to two different controllers and somehow expecting a consistent result?
a
@User I am very new to Pinot and I am not familiar with the metadata push vs. data push models. To start, I was trying to set up a simple multi-node cluster: 1 controller, server A, and broker A on host 1, with network connectivity to host 2 running server B and broker B. I was getting an error with this setup, but what I understood is that it should be possible. My plan is to use HDFS for deep storage once I get this setup working.
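When a two-host setup like this fails, a quick first step is to confirm plain TCP reachability between the hosts before digging into Pinot logs. A minimal sketch, assuming hypothetical hostnames `host1`/`host2`, ZooKeeper running on host 1 (the thread does not say where it runs), and commonly used default ports (ZooKeeper 2181, controller 9000, broker 8099, server query port 8098); check these against your own configs. It only tests that a TCP connection opens, not that the service behind it is healthy.

```python
import socket

# Hypothetical hostnames for the two-host layout; replace with your own.
HOST1, HOST2 = "host1", "host2"

# Assumed ports: commonly used defaults. Verify against your Pinot/ZooKeeper configs.
ZK_PORT = 2181            # assumption: ZooKeeper runs on host 1
CONTROLLER_PORT = 9000
BROKER_PORT = 8099
SERVER_QUERY_PORT = 8098  # port brokers use to send queries to a server


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, etc.
        return False


# Links to verify, grouped by the host the check should be run from.
CHECKS_FROM_HOST2 = [
    ("server B / broker B -> ZooKeeper", HOST1, ZK_PORT),
    ("server B -> controller", HOST1, CONTROLLER_PORT),
    ("broker B -> server A", HOST1, SERVER_QUERY_PORT),
]
CHECKS_FROM_HOST1 = [
    ("broker A -> server B", HOST2, SERVER_QUERY_PORT),
    ("controller -> broker B (console)", HOST2, BROKER_PORT),
]


def run_checks(checks) -> None:
    for label, host, port in checks:
        status = "ok" if can_connect(host, port) else "UNREACHABLE"
        print(f"{label:36} {host}:{port}  {status}")


if __name__ == "__main__":
    run_checks(CHECKS_FROM_HOST2)  # run on host 2; use CHECKS_FROM_HOST1 on host 1
```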
k
@User in a past life we had a painful bug due to two servers (behind an LB) that used a shared disk. The LB was configured to auto-retry against the other server if the initial request took too long, but that timeout was sometimes too short, so we'd wind up with processes on two different servers stepping on each other's data (writing/updating the same file).
The deep scars from that experience made me (probably too) afraid of having multiple servers using a shared disk, without strict partitioning of the data in the file system.
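The failure mode above is two writers silently overwriting the same file on a shared disk. One generic way to make such a collision fail loudly instead is to write to a private temp file and then claim the final name with a hard link, which does not overwrite an existing file. This is a minimal sketch for a POSIX-style filesystem, not how Pinot controllers write to deep store; the function and segment names are hypothetical, and link semantics on a given NFS setup should be verified before relying on it.

```python
import os
import tempfile


def publish_exclusive(directory: str, name: str, data: bytes) -> bool:
    """Write `data` to directory/name only if no other writer has claimed that name.

    Returns True if this call created the file, False if the name already existed.
    (Hypothetical helper for illustration.)
    """
    final_path = os.path.join(directory, name)
    # Write to a private temp file in the same directory, so readers never see partial data.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=f".{name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        try:
            # os.link raises FileExistsError instead of replacing an existing file.
            os.link(tmp_path, final_path)
            return True
        except FileExistsError:
            return False
    finally:
        os.unlink(tmp_path)  # drop the temp name; the final name (if linked) keeps the data
```

With this scheme, a retried request that reaches a second server returns `False` for an already-published name rather than clobbering the first writer's file.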