# general
j
👋 Can I ask one question? I'm looking into making ZooKeeper more resilient. How would you prepare for the case where all of the ZooKeeper nodes are down? I'm considering some kind of backup for ZooKeeper and am curious whether there are any recommendations, especially for ZooKeeper in a Pinot environment.
j
I would say just run more ZooKeeper nodes, and page when some percentage of them are down. For example, we run 5 ZK nodes per Pinot cluster. We send a Slack alert if 1 is down and we page if 2 are down.
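A minimal sketch of the alerting rule described above (Slack alert at 1 node down, page at 2), assuming each node is probed with ZooKeeper's `ruok` four-letter command. The function names `is_zk_up` and `alert_level` and the default thresholds are illustrative only, not part of Pinot or ZooKeeper, and on newer ZooKeeper versions `ruok` has to be allowed through `4lw.commands.whitelist` before the probe will get an answer.

```python
import socket


def is_zk_up(host, port=2181, timeout=2.0):
    """Probe one node with the 'ruok' four-letter command.

    A healthy server replies 'imok'. Any connection error or timeout
    is treated as "down". (Hypothetical helper, for illustration.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        return False


def alert_level(down_count, slack_at=1, page_at=2):
    """Map the number of unreachable nodes to an alert level.

    The defaults mirror the thresholds mentioned in the thread for a
    5-node ensemble; they are assumptions to tune per cluster.
    """
    if down_count >= page_at:
        return "page"
    if down_count >= slack_at:
        return "slack"
    return "ok"
```

A caller would count the hosts for which `is_zk_up` returns `False` and pass that count to `alert_level`; wiring the result into Slack or a pager is left out here.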
k
Also, pay attention to where the dataDir lives: ephemeral vs. persistent storage (important if you are running in the cloud). The dataDir has pretty much everything you need to bring the cluster back. You can even clone the entire cluster with just the dataDir.
j
Gotcha. Yep. Running more zookeepers definitely is an option.
@Kishore G About this - does that mean that dataDir shouldn't be ephemeral?
m
Yes, should be persistent, and 5 instances per cluster is a good number to maintain quorum. Also, I’d recommend running ZK separately from Pinot, so that upgrading Pinot doesn’t unintentionally upgrade ZK.
🤔 1
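For reference, the quorum arithmetic behind the "5 instances" suggestion can be sketched as follows. ZooKeeper needs a strict majority of the ensemble to be up, so the helper names `quorum_size` and `tolerated_failures` below are just illustrative wrappers around that rule.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 5 nodes the majority is 3, so 2 nodes can be lost; with 3 nodes only 1 can be lost, and going from 3 to 4 nodes adds no extra tolerance.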
j
Gotcha. I'll check this out. Thank you so much for the suggestions!
One last question. Is it just as simple as saving the dataDir to storage like GCS, or is there any tool/software that I need to look into?
k
Yes, it's as simple as that.
👍 1
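A minimal sketch of the "save the dataDir" approach, assuming the backup is taken as a timestamped tarball that is then copied to a bucket in a separate step. The function name `archive_data_dir` and the archive naming scheme are hypothetical, and the sketch assumes transaction logs live under the same dataDir; if a separate `dataLogDir` is configured, it would need to be archived too.

```python
import tarfile
import time
from pathlib import Path


def archive_data_dir(data_dir, out_dir):
    """Pack a ZooKeeper dataDir into a timestamped .tar.gz and return its path.

    Both paths are supplied by the caller; nothing here is specific to
    Pinot. (Hypothetical helper, for illustration.)
    """
    data_dir = Path(data_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # UTC timestamp so archives from different hosts sort consistently.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive = out_dir / f"zk-datadir-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the directory's own name, not the full path.
        tar.add(data_dir, arcname=data_dir.name)
    return archive
```

The resulting file could then be copied to GCS, for example with `gsutil cp` or a storage client library; that upload step is deliberately left out of the sketch.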