Luis Fernandez
03/08/2022, 9:45 PM
2022-03-08 21:36:28,798 [myid:1] - INFO [NIOWorkerThread-1:ZooKeeperServer@1032] - Refusing session request for client /10.12.36.35:34854 as it has seen zxid 0x100000709 our last zxid is 0x0 client must try another server
everywhere in the Pinot cluster. The same change was applied to our dev env but it didn’t do anything. Do you all know what may have caused this and how we could recover from it?
Luis Fernandez
03/09/2022, 3:43 PM
Kishore G
Mayank
Luis Fernandez
03/09/2022, 3:56 PM
storageClass: 'premium-rwo'
I think that restarting all the components will fix it, but I wanna learn more about why this would happen.
Luis Fernandez
03/09/2022, 3:57 PM
Luis Fernandez
03/09/2022, 3:58 PM
Luis Fernandez
03/09/2022, 3:59 PM
Luis Fernandez
03/09/2022, 4:00 PM
Luis Fernandez
03/09/2022, 4:06 PM
Luis Fernandez
03/09/2022, 4:06 PM
Kishore G
Kishore G
Mayank
Luis Fernandez
03/09/2022, 4:22 PM
Luis Fernandez
03/09/2022, 4:26 PM
Mayank
Luis Fernandez
03/09/2022, 8:18 PM
Daniel Lavoie
03/09/2022, 8:19 PM
Luis Fernandez
03/09/2022, 8:19 PM
Daniel Lavoie
03/09/2022, 8:19 PM
Daniel Lavoie
03/09/2022, 8:20 PM
Luis Fernandez
03/09/2022, 8:20 PM
Daniel Lavoie
03/09/2022, 8:22 PM
Daniel Lavoie
03/09/2022, 8:26 PM
kubectl cp command /data
on zookeeper.
• What is key is that you can’t lose quorum during the operation and that you will have to trash each disk since there is no data migration possible.
• Editing the helm deployment directly will drop everything and you will lose data.
• Solution is to edit the individual PVC of each zookeeper replica.
• Each time you edit 1 PVC, you delete the pod and the PV, wait for them to be created again, wait for zookeeper to report healthy, and even run a zk cli command to ensure you have quorum.
• Repeat for each node.
• Update your helm chart persistentClass for consistency.
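A minimal sketch of the loop described above, assuming a 3-node ensemble and hypothetical pod names (`zookeeper-0` and so on). It only encodes the ordering of the steps and the majority rule (more than half of the ensemble has to stay up); the actual kubectl and zk cli commands are left out, and none of this has been run against a real cluster.

```python
def has_quorum(healthy: int, ensemble_size: int) -> bool:
    """ZooKeeper needs a strict majority of the ensemble to be up."""
    return healthy > ensemble_size // 2


def rolling_plan(replicas: list[str]) -> list[str]:
    """Return the ordered steps for replacing one disk at a time."""
    ensemble_size = len(replicas)
    steps: list[str] = []
    for pod in replicas:
        # While this pod is down, the remaining replicas must still be a majority.
        if not has_quorum(ensemble_size - 1, ensemble_size):
            raise ValueError("taking one node down would lose quorum")
        steps += [
            f"edit the PVC of {pod}",
            f"delete pod {pod} and its PV",
            f"wait for {pod} to be recreated and report healthy",
            "check quorum with the zk cli before moving on",
        ]
    return steps


if __name__ == "__main__":
    # Hypothetical pod names; use whatever your StatefulSet produces.
    for step in rolling_plan(["zookeeper-0", "zookeeper-1", "zookeeper-2"]):
        print(step)
```

With 3 replicas, one node can be down at a time (2 of 3 is still a majority). With only 2 replicas the plan refuses to start, since taking either one down loses quorum.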
Big disclaimer, this is not something I’ve tested so if you want to experiment, make sure to do it on a dev environment.
Luis Fernandez
03/09/2022, 8:27 PM
Luis Fernandez
03/09/2022, 8:30 PM
Luis Fernandez
03/09/2022, 8:30 PM
Daniel Lavoie
03/09/2022, 8:31 PM
Daniel Lavoie
03/09/2022, 8:31 PM
Luis Fernandez
03/09/2022, 8:31 PM
Daniel Lavoie
03/09/2022, 8:32 PM
Luis Fernandez
03/09/2022, 8:32 PM
Daniel Lavoie
03/09/2022, 8:32 PM
Luis Fernandez
03/09/2022, 8:33 PM
data remains in deepstore, so it's not lost per se, but you lose all the pointers to it and I'm not sure anything in community allows you to rebuild the ZK state from raw data sitting in S3
oh in startree version you can rebuild zk state like that?
Daniel Lavoie
03/09/2022, 8:34 PM
Daniel Lavoie
03/09/2022, 8:41 PM
Daniel Lavoie
03/09/2022, 8:42 PM
Luis Fernandez
03/09/2022, 9:40 PM
Daniel Lavoie
03/09/2022, 9:41 PM
Mayank