# general
v
Hello. I am trying to explore Pinot on Kubernetes and have a few questions: 1. The official Helm chart defines the broker and controller as StatefulSets. Is there any reason I need to keep them as StatefulSets as opposed to plain Deployments? As in, is there some ordering needed? 2. The controller requires a PV to be attached as a volume (ref: https://github.com/apache/incubator-pinot/blob/master/kubernetes/helm/pinot/templates/controller/statefulset.yaml#L71). What is the nature of the data being stored here? Or rather, what is the general recommendation for the disk size? What purpose does it serve? 3. The broker doesn't look like it needs any specific disk. So, extending 1 and 2 above, why does it need to be a StatefulSet? What ordering is needed here?
d
Hello Venkatesan!
m
1. We typically follow the ordering Controller -> Broker -> Server for deployments.
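The ordering above can be sketched as a small helper that builds (but does not run) `kubectl rollout` commands in Controller -> Broker -> Server order. This is only an illustration: the StatefulSet names and the `pinot` namespace are hypothetical and depend on how the Helm release was installed.

```python
# Sketch: build kubectl commands that roll Pinot components in the order
# Controller -> Broker -> Server. Nothing is executed here.
# ASSUMPTION: the StatefulSet names and namespace are hypothetical.
ROLLOUT_ORDER = ["pinot-controller", "pinot-broker", "pinot-server"]


def rollout_commands(namespace="pinot", order=ROLLOUT_ORDER):
    """Return kubectl argv lists: restart each StatefulSet, then wait for it."""
    commands = []
    for name in order:
        target = f"statefulset/{name}"
        # Trigger a rolling restart of this component.
        commands.append(["kubectl", "-n", namespace, "rollout", "restart", target])
        # Block until the rollout finishes before moving to the next component.
        commands.append(["kubectl", "-n", namespace, "rollout", "status", target])
    return commands


if __name__ == "__main__":
    for argv in rollout_commands():
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` so that a failed rollout stops the sequence before the next component is touched.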
d
Well, Pinot is a database, so some components are stateful by nature.
m
2. The controller stores a copy of all the data pushed to the Pinot cluster, so you need to size its volume accordingly.
v
@Mayank @Daniel Lavoie I understand the server being stateful, since it stores the data. I was just wondering why the controller is, and I think Mayank's response answers that. My question on the broker still remains: what changes if I make it a Deployment instead of a StatefulSet? Is there a reason? This is largely out of curiosity, since the broker doesn't seem to store any physical data. Thanks for the prompt response 🙂
m
The broker can be considered stateless to some degree. However, note that it communicates with the servers, so if there's a protocol change between server and broker, then deployment ordering matters.
v
Sure, deployment ordering is understandable. My question was more about the Kubernetes kind: Deployment vs StatefulSet. I can still play with pod disruption budgets to ensure minimum replicas, right?
d
A StatefulSet will preserve host identity. With a Deployment, every pod is unique and presents itself as a new member.
b
@Venkatesan V It's due to the way the identifiers of the brokers are used in ZK. K8s Deployments don't guarantee a well-known name for the pod. By running the brokers as StatefulSets, we preserve the hostname, or identity.
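To make the identity point concrete, here is a minimal sketch of the naming difference. StatefulSet pods are named `<name>-<ordinal>` and get a stable DNS entry through their headless Service, while a replacement Deployment pod comes back under a newly generated name. The names `pinot-broker`, `pinot-broker-headless`, and the `pinot` namespace are hypothetical, and the Deployment suffix format is only illustrative.

```python
import random
import string


def statefulset_pod_names(name, replicas):
    """StatefulSet pods are named <name>-<ordinal> and keep that name across restarts."""
    return [f"{name}-{i}" for i in range(replicas)]


def statefulset_pod_dns(pod, headless_service, namespace):
    """Stable per-pod DNS name provided through the governing headless Service."""
    return f"{pod}.{headless_service}.{namespace}.svc.cluster.local"


def deployment_pod_name(name, rng=random):
    """Deployment pods get a generated suffix, so a replacement pod has a new name.

    ASSUMPTION: the 5-character suffix here is illustrative, not the exact format.
    """
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(5))
    return f"{name}-{suffix}"


if __name__ == "__main__":
    for pod in statefulset_pod_names("pinot-broker", 2):
        print(pod, "->", statefulset_pod_dns(pod, "pinot-broker-headless", "pinot"))
    print("deployment pod:", deployment_pod_name("pinot-broker"))
```

Because the StatefulSet name is stable, a restarted broker can rejoin under the identifier already known in ZK instead of registering as a brand-new member.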
v
I see, that makes sense.
b
Ahh.. @Daniel Lavoie you beat me to it 🙂
d
Pinot controllers track who should own what. So identity is important 😛
v
Thank you gentlemen. That was quite useful. Thanks
d
@Buchi Reddy At least we are not contradicting each other
x
@Venkatesan V The broker is also stateful: in multi-tenancy mode it holds the assignment of which servers to connect to, hence we make brokers StatefulSets. It also means less random garbage gets created in ZooKeeper.
v
Brilliant. Quite informative. Thank you
b
@Xiang Fu thanks for the additional clarification. That's the first time I've learned of it.
x
For 2, in a non-cloud deployment we usually need to keep a copy of the segments somewhere the Pinot servers can download them from, so the only place to put it is the controller, or an extra NAS if one is set up (which you can define as the volume there). In public cloud mode, we can use S3, Google Cloud Storage, or Azure Data Lake as the backup, and then that volume is not required.
v
Got it. Thanks @Xiang Fu. Yes, we are on a public cloud (AWS), hence the question.
x
Cool, then you can set up S3 (or GCS) as the backup for your segments, and you don't need to mount a PVC for the controller.
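As a rough illustration of what pointing the controller at S3 might look like, the sketch below renders a handful of controller properties as text. The property names are written as recalled from the Pinot S3 deep storage documentation and should be treated as assumptions to verify against your Pinot version; the bucket path and region are hypothetical.

```python
def s3_deep_store_properties(bucket_path, region):
    """Render controller properties for S3-backed segment storage as text.

    ASSUMPTION: property names are recalled from the Pinot S3 deep storage
    docs; check them against the version you deploy.
    """
    props = {
        # Where the controller keeps the segment copies instead of a local PV.
        "controller.data.dir": f"s3://{bucket_path}",
        # File system plugin used for the s3:// scheme.
        "pinot.controller.storage.factory.class.s3":
            "org.apache.pinot.plugin.filesystem.S3PinotFS",
        "pinot.controller.storage.factory.s3.region": region,
        # Allow segments to be fetched over s3 in addition to file and http.
        "pinot.controller.segment.fetcher.protocols": "file,http,s3",
        "pinot.controller.segment.fetcher.s3.class":
            "org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical bucket path and region, for illustration only.
    print(s3_deep_store_properties("my-bucket/pinot-segments", "us-east-1"))
```

The servers would need matching storage and fetcher settings of their own so that they can download segments from the same bucket.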
v
Yes. That is the plan, as I was reading about the deep storage aspects of Pinot. Thanks once again.
x