# troubleshooting
l
hey my friends! I made a change to my zookeeper cluster that required the entire cluster (3 nodes) to be restarted (we changed the storage to SSD). For some reason, after it got restarted for this change we started getting this
2022-03-08 21:36:28,798 [myid:1] - INFO  [NIOWorkerThread-1:ZooKeeperServer@1032] - Refusing session request for client /10.12.36.35:34854 as it has seen zxid 0x100000709 our last zxid is 0x0 client must try another server
everywhere in the pinot cluster. The same change was applied to our dev env and it didn’t do anything there. Do you all know what may have caused this and how we could recover from it?
bumping this for your thoughts
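A quick way to see where each node stands after a restart like this is to ask every ZooKeeper server for its last zxid and mode; a minimal sketch, assuming pod names zookeeper-0 through zookeeper-2 in namespace pinot and that nc is available in the image:
# Print each ZooKeeper node's last zxid and role (leader/follower).
# Pod names and namespace are assumptions; match them to your helm release.
for i in 0 1 2; do
  kubectl exec -n pinot zookeeper-$i -- sh -c 'echo srvr | nc localhost 2181' | grep -E 'Zxid|Mode'
done
A node answering Zxid: 0x0 while clients have already seen 0x100000709 is exactly the mismatch in the log above, i.e. that server came back with an empty data directory.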
k
it’s not clear what you did here..
m
Seems like Zk cluster was restarted to move from HDD to SSD? Was it a rolling restart?
l
yes, that’s what we did, we changed the storage class to
storageClass: 'premium-rwo'
I think that restarting all the components will fix it, but I wanna learn more about why this would happen.
and yes, usually when these changes go out they are rolling restarts
this same change was applied to our dev cluster and it didn’t go down.
our dev cluster has a table that is actively consuming prod-like data, but we didn’t have anything in prod yet
sorry, and let me clarify that things are not down, they are just showing that message consistently and I’m not able to access, say, the controller UI. So I just wanna understand better what could have happened, and why, if zookeeper is now up and all good, the rest of pinot seems to be complaining
this is exactly what i’m seeing https://stackoverflow.com/questions/45804955/zookeeper-refuses-kafka-connection-from-an-old-client but I just want to understand better how it could have happened
the only hypothesis I have right now is that it wasn’t a rolling restart and all the nodes went out at the same time?
k
let’s go one step at a time
do you see the data in ZK browser? IdealState, segments, servers etc?
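If the browser isn’t reachable, the same check can be done straight from the ZooKeeper CLI; a sketch, where the Helix cluster name pinot is an assumption (it comes from your helm values) and zkCli.sh is assumed to be on the image’s PATH:
# List the root znodes, then the Helix cluster tree Pinot writes into.
kubectl exec -n pinot zookeeper-0 -- zkCli.sh ls /
kubectl exec -n pinot zookeeper-0 -- zkCli.sh ls /pinot
kubectl exec -n pinot zookeeper-0 -- zkCli.sh ls /pinot/IDEALSTATES
If the cluster znode is missing entirely, the new disks came up empty rather than merely stale.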
m
From what I read, data is available; however, due to the messages being spewed for the stale zk connection, it is hogging resources and slowing things down
l
oh sorry, from what I can see on my end by trying to do an ls in the zookeeper client, I don’t see the pinot configs, so we definitely lost them with this update
so if I restart the components I’ll probably end up with nothing
m
@Daniel Lavoie IIRC you have a recipe for zero downtime upgrade for ZK?
l
we are using whatever we have in the helm chart currently
d
Helm is not really adapted to manage zero downtime ZK upgrades.
l
(in the pinot OSS one)
d
Gimme a moment to recap what is going on here
👍 1
Updating the storage class means you lose all the PVCs from Kubernetes
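For context, the storage class is part of each PVC’s spec and can’t be patched in place, which is why swapping it means new, empty volumes. A quick way to see what the existing claims were provisioned with; the label selector and namespace are assumptions:
# Show each ZooKeeper PVC with its storage class and status.
kubectl get pvc -n pinot -l app=zookeeper \
  -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName,STATUS:.status.phase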
l
(also, the cluster is back to healthy and prod is consuming; we didn’t have any data so it was okay, but I wonder what would have happened if we had lost all our configs and it was prod-prod with data on it already)
d
I’ve never tested a rolling upgrade by swapping the storage class, but my recommendation would be as follows (see the sketch after this list):
• Copy your ZK data with a simple kubectl cp of /data on each zookeeper pod.
• What is key is that you can’t lose quorum during the operation, and you will have to trash each disk since there is no data migration possible.
• Editing the helm deployment directly will drop everything and you will lose data.
• The solution is to edit the individual PVC of each zookeeper replica.
• Each time you edit 1 PVC, you delete the pod and the PV, wait for them to be created again, wait for zookeeper to report healthy, and even run zk cli commands to ensure you have quorum.
• Repeat for each node.
• Update your helm chart persistentClass for consistency.
Big disclaimer: this is not something I’ve tested, so if you want to experiment, make sure to do it on a dev environment.
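A rough sketch of those steps, assuming three replicas named zookeeper-0 through zookeeper-2 in namespace pinot and the usual <claim-template>-<pod> PVC naming; untested, per the disclaimer above:
# Back up every node's data dir before touching anything.
mkdir -p ./zk-backup
for i in 0 1 2; do
  kubectl cp pinot/zookeeper-$i:/data ./zk-backup/zookeeper-$i
done

# Then, one node at a time: drop its claim and pod, let the StatefulSet recreate
# them on the new storage class, and confirm quorum before moving to the next node.
kubectl delete pvc -n pinot data-zookeeper-0        # PVC name is an assumption
kubectl delete pod -n pinot zookeeper-0
kubectl wait -n pinot --for=condition=Ready pod/zookeeper-0 --timeout=5m
kubectl exec -n pinot zookeeper-0 -- sh -c 'echo srvr | nc localhost 2181' | grep Mode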
🙌 1
l
wow thank you so much Daniel!
so basically what you don’t want is to lose zookeeper altogether, right
cause then that may mean that you lose all the zk data
d
Yeah, which contains the pointers to all your segments.
data remains in deepstore, so it’s not lost per se, but you lose all the pointers to it, and I’m not sure anything in the community allows rebuilding the ZK state from raw data sitting in S3
l
right, in a typical pinot cluster what countermeasures do we have if zookeeper is lost? do people back up this data somehow?
d
Don’t lose it 😄 but ZK backup is the countermeasure
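As a bare-bones version of that, copying the snapshot/transaction-log directory off one node on a schedule already gives something to restore from; a sketch, assuming the default dataDir of /data:
# ZooKeeper keeps its snapshots and txn logs under <dataDir>/version-2;
# a dated copy of that directory is a crude but restorable backup.
mkdir -p ./zk-backup/$(date +%F)
kubectl cp pinot/zookeeper-0:/data/version-2 ./zk-backup/$(date +%F)/version-2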
l
😄
d
If you can restore ZK data and segments haven’t moved, I’ve seen Pinot recover from that state.
l
data remains in deepstore, so it's not lost per se, but you lose all the pointers to it and I'm not sure anything in the community allows rebuilding the ZK state from raw data sitting in S3
oh, in the startree version you can rebuild the zk state like that?
d
Not that I am aware of, I was only speaking within the Pinot ecosystem. Not even sure if it’s technically possible, to be honest
What I know is that you can retrieve the segments and re-upload them to another table using the controller api, I think.
don’t take this for granted, this is a hypothesis on my end
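For reference, that hypothesis would look roughly like the following with the stock tooling; equally untested, the bucket, hosts, and paths are placeholders, and the table config and schema would have to exist on the cluster first:
# Pull the segments back from deep store (the S3 path is a placeholder)...
aws s3 cp --recursive s3://my-deepstore/myTable/ ./restored-segments/

# ...then push them to the controller, which recreates the segment metadata
# in ZK as it accepts each one.
bin/pinot-admin.sh UploadSegment \
  -controllerHost pinot-controller \
  -controllerPort 9000 \
  -segmentDir ./restored-segments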
l
thank you for all your help Daniel, greatly appreciated
d
You’re welcome!
m
Thanks @Daniel Lavoie