# general
g
Hey everyone ! I’m currently investigating Apache Pinot, and after reading a good chunk of the documentation, I have a couple of questions.
• On this page, it’s said that if you lose all your controllers, your cluster will still be able to answer read queries (but not write queries, obviously). Then, if a new controller is started, it says that the cluster will recover and will be then available again for write queries. That supposes that all cluster state is stored somewhere. I suppose that “somewhere” is Zookeeper ?
• Offline servers are responsible for hosting segments. Let’s say we have only one replica for a given segment, and the offline server hosting it dies. Will Helix discover that and will ask another offline server to download the same segment, in order to make it available again to the brokers ?
• Where can I find some information about the resource requirements (mainly CPU / memory) for controllers / brokers / realtime servers / offline servers ?
Thanks for your help !
t
Will Helix discover that and will ask another offline server to download the same segment, in order to make it available again to the brokers
I don’t think this happens automatically. When a server has died or left the cluster, we need to trigger a rebalance of the affected table so that its segments are hosted by other servers.
will be then available again for write queries
Rather than seeing it as a write query, I think it is better to see it as new segments not being published: the servers won’t be able to commit the segment (as the controller is involved in the segment completion protocol) even though they would be able to ingest data. You can read about the protocol here - https://cwiki.apache.org/confluence/display/PINOT/Consuming+and+Indexing+rows+in+Realtime
g
I don’t think this happens automatically. When a server has died or left the cluster, we need to trigger a rebalance of the affected table so that its segments are hosted by other servers.
OK so I guess it’ll be managed by replicas, then
Rather than seeing it as a write query, I think it is better to see it as new segments not being published: the servers won’t be able to commit the segment (as the controller is involved in the segment completion protocol) even though they would be able to ingest data.
Thanks for the clarification. Yeah, when I said “write queries”, what I meant was “the possibility to make a segment available in Pinot”. But as long as the “cold” storage remains available, you can still upload fresh data to it. It just won’t be available in the Pinot cluster.
k
If a server dies and you are running on k8s, a new container is created that will download the segments and start serving
g
yeah, but let’s say we have another server available with the correct tags, will it be picked up as the new host for the missing segment ?
k
What’s the (logical) name of the new server?
g
sorry, I’m not sure I understand you.
also, I’m only starting with Pinot, so I may be missing some important points
k
When a segment is uploaded, we assign it to one or more servers. That mapping is stored in Helix
That mapping will only change in the following scenarios:
• Add more servers and invoke rebalance
• Untag a server and invoke rebalance
Untag is a way to say that this server should not host segments for this table anymore
Note that if a server dies, its tag is still maintained in Helix/Zookeeper
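For reference, here is a minimal sketch of what “invoke rebalance” could look like against the controller’s REST API. It only builds the request and does not send it. The controller address, the table name `myTable`, and the helper name `build_rebalance_request` are made up for illustration, and the endpoint path and the `type` / `dryRun` query parameters are written from memory, so check them against the Swagger UI of your own controller before relying on them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a controller reachable at this address (not from the thread).
CONTROLLER = "http://localhost:9000"


def build_rebalance_request(table_name, table_type="OFFLINE", dry_run=True,
                            controller=CONTROLLER):
    """Build (but do not send) a POST asking the controller to rebalance a table.

    The path and query parameter names are assumptions based on the
    controller's table rebalance endpoint; verify them for your version.
    """
    query = urlencode({"type": table_type, "dryRun": str(dry_run).lower()})
    url = f"{controller}/tables/{quote(table_name, safe='')}/rebalance?{query}"
    return Request(url, method="POST")


# Hypothetical table name, used only to show the resulting request.
req = build_rebalance_request("myTable")
print(req.get_method(), req.full_url)
```

Starting with a dry run lets you inspect the proposed segment assignment first. To actually send the request you would pass it to `urllib.request.urlopen`, with `dry_run=False` once the proposed assignment looks right.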

https://youtu.be/HycNRCzkrjg

This video should help
g
I’ll have a look, thanks !