Hello Pinot experts, I wonder if anyone here runni...
# general
g
Hello Pinot experts, I wonder if anyone here running Pinot on k8s in production has suggestions for a Pinot disaster recovery plan for k8s cluster downtime. Assume we are in an environment with multiple k8s clusters running. Which of the following would you recommend to make Pinot resilient to a k8s cluster-level outage or maintenance?
1. Set up one Pinot cluster across multiple k8s environments, with each environment holding one replica of the data. (Not sure if this is feasible or easy to do.)
2. Set up fully replicated, redundant Pinot clusters in different k8s environments, also replicating the data ingestion and anything else we did in the main cluster. (Seems costly.)
3. Run Pinot in only one k8s cluster; in the case of a k8s cluster outage, rebuild the server, controller, and broker in another healthy k8s cluster and let them pick up the old state from Kafka, ZooKeeper, S3, etc. (How hard is it for a newly built Pinot cluster to inherit and resume the old state?)
Any experience with handling this in a prod environment is much appreciated 🙏🏻. Thanks in advance!
m
Could you deploy Pinot across availability zones? What's your cloud provider?
x
The current Pinot k8s deployment model is one Pinot cluster per k8s cluster, which means you will have N Pinot clusters across your N k8s clusters. That is essentially a fully replicated, all-active setup.
I would say run 2 replicas per k8s cluster and put a load balancer in front of all the Pinot clusters.
Btw, what's the availability of your k8s? If it's high enough, you can just run one Pinot cluster on one k8s cluster.
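A minimal sketch of the "load balancer on top of all Pinot clusters" idea, written as client-side failover rather than a real load balancer. It assumes each cluster exposes a broker that accepts SQL at `POST /query/sql`; the broker URLs, the `query_first_healthy` name, and the injectable `opener` parameter are illustrative, not something from this thread.

```python
import json
import urllib.error
import urllib.request


def query_first_healthy(broker_urls, sql, timeout=5.0, opener=urllib.request.urlopen):
    """Send `sql` to each broker in order and return the first successful JSON reply.

    `broker_urls` is a hypothetical list such as ["http://pinot-a:8099", ...],
    one broker endpoint per Pinot cluster.
    """
    payload = json.dumps({"sql": sql}).encode("utf-8")
    errors = []
    for base in broker_urls:
        req = urllib.request.Request(
            base.rstrip("/") + "/query/sql",
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with opener(req, timeout=timeout) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except (urllib.error.URLError, OSError, ValueError) as exc:
            # Connection failure, timeout, or unparsable body: try the next cluster.
            errors.append((base, repr(exc)))
    raise RuntimeError(f"all brokers failed: {errors}")
```

This only fails over on transport-level errors. It does not check that the clusters hold the same data, so it is only useful if ingestion is duplicated consistently across clusters.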
g
Thank you both for the input! Yeah, we have a couple of k8s clusters across different AZs, but most of our existing applications on k8s are stateless and have an active-active setup for switching between k8s environments during failover or maintenance events. So our infra team currently does not actively support use cases that need to retain state in a particular k8s cluster, and Pinot might be one of the first major stateful services attempting to run on our k8s environment, so we are trying to figure out a reliable way here.
Also, on the question of rebuilding a Pinot cluster, I am still curious how hard it is. Specifically, if I need to migrate a Pinot cluster from one environment to another, what will the process look like for getting the new cluster to pick up the old state?
x
Then I would suggest having one Pinot cluster in one multi-AZ k8s cluster and keeping it running at all times.
For Pinot migration, you need to write the table replication tooling yourself.
g
Thanks for the info! Do you have any examples of this replication tooling? My understanding is that most of the Pinot state and data is stored in ZooKeeper and deep storage. If we are only migrating the server, broker, and controller components, what data do we need to replicate in this case? Can the new services started in another environment just resume their state from the info in ZooKeeper and deep storage?
You mentioned having multiple Pinot clusters with a load balancer on top of them. Do you have more info about this setup? Basically, what do you recommend for managing multiple identical Pinot clusters in production, in terms of duplicating the data ingestion, ensuring consistency across the Pinot clusters, etc.? Also, what kind of load balancer are we talking about here? Is a classic HTTP load balancer generally sufficient, or do we need something more customized to Pinot to route traffic to multiple clusters?
x
There are two ways:
1. Recreate the tables in the new cluster and push all the segments from the deep store.
2. Do ZK replication, but that requires an identical Pinot cluster setup (including hostnames, which is possible for a k8s cluster migration); then restart the entire cluster. Consider ZK disk replication for this.
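A rough sketch of the first half of the "recreate tables" option: export schemas and table configs from the old controller while it is still reachable, then replay them against the new one. It assumes the controller REST endpoints `GET /tables`, `GET /tables/{name}`, `GET /schemas/{name}`, `POST /schemas`, and `POST /tables`, and that each schema is named after its table. The function names and the injectable `http` parameter are illustrative. Re-uploading the segments from the deep store is not shown.

```python
import json
import urllib.request


def http_json(method, url, body=None):
    """Tiny JSON-over-HTTP helper; swap in any client you prefer."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def export_configs(controller, http=http_json):
    """Collect the schema and table configs for every table on `controller`."""
    dump = {}
    for name in http("GET", f"{controller}/tables")["tables"]:
        dump[name] = {
            # Assumption: the schema name matches the raw table name.
            "schema": http("GET", f"{controller}/schemas/{name}"),
            # Expected to be keyed by table type, e.g. {"OFFLINE": {...}}.
            "tables": http("GET", f"{controller}/tables/{name}"),
        }
    return dump


def import_configs(controller, dump, http=http_json):
    """Recreate schemas first, then the table configs that depend on them."""
    for cfg in dump.values():
        http("POST", f"{controller}/schemas", cfg["schema"])
        for table_type in ("OFFLINE", "REALTIME"):
            if table_type in cfg["tables"]:
                http("POST", f"{controller}/tables", cfg["tables"][table_type])
```

Since the old controller may be unreachable during an outage, the export would need to run on a schedule and write the dump somewhere that survives the k8s cluster, for example alongside the deep store.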