# troubleshooting
l
hey my friends! I made a change to my zookeeper cluster that required the entire cluster (3 nodes) to be restarted (we changed the storage to SSD). For some reason, after it got restarted for this change we started getting this
2022-03-08 21:36:28,798 [myid:1] - INFO  [NIOWorkerThread-1:ZooKeeperServer@1032] - Refusing session request for client /10.12.36.35:34854 as it has seen zxid 0x100000709 our last zxid is 0x0 client must try another server
everywhere in the pinot cluster. The same change was applied to our dev env and it didn’t do anything there. Do you all know what may have caused this and how we could recover from it?
bumping this for your thoughts
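A quick way to see where each node stands after a restart like this is to ask every ZooKeeper server for its last zxid and mode; a minimal sketch, assuming pod names zookeeper-0 through zookeeper-2 in namespace pinot and that nc is available in the image:
# Print each ZooKeeper node's last zxid and role (leader/follower).
# Pod names and namespace are assumptions; match them to your helm release.
for i in 0 1 2; do
  kubectl exec -n pinot zookeeper-$i -- sh -c 'echo srvr | nc localhost 2181' | grep -E 'Zxid|Mode'
done
A node answering Zxid: 0x0 while clients have already seen 0x100000709 is exactly the mismatch in the log above, i.e. that server came back with an empty data directory.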
k
it’s not clear what you did here..
m
Seems like Zk cluster was restarted to move from HDD to SSD? Was it a rolling restart?
l
yes, that’s what we did, we changed the storage class to
storageClass: 'premium-rwo'
I think that restarting all the components will fix it, but I wanna learn more about why this would happen.
and yes, usually when these changes go out they are rolling restarts
this same change was applied to our dev cluster and it didn’t go down.
our dev cluster has a table that is actively consuming prod-like data, but we didn’t have anything in prod yet
sorry, and let me clarify that things are not down, they are just showing that message consistently and I’m not able to access, say, the controller UI. So I just wanna understand better what could have happened, and why, if zookeeper is now up and all good, the rest of pinot seems to be complaining
this is exactly what i’m seeing https://stackoverflow.com/questions/45804955/zookeeper-refuses-kafka-connection-from-an-old-client but I just want to understand better how it could have happened
the only hypothesis I have right now is that it wasn’t a rolling restart and all the nodes went out at the same time?
k
let’s go one step at a time
do you see the data in ZK browser? IdealState, segments, servers etc?
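If the browser isn’t reachable, the same check can be done straight from the ZooKeeper CLI; a sketch, where the Helix cluster name pinot is an assumption (it comes from your helm values) and zkCli.sh is assumed to be on the image’s PATH:
# List the root znodes, then the Helix cluster tree Pinot writes into.
kubectl exec -n pinot zookeeper-0 -- zkCli.sh ls /
kubectl exec -n pinot zookeeper-0 -- zkCli.sh ls /pinot
kubectl exec -n pinot zookeeper-0 -- zkCli.sh ls /pinot/IDEALSTATES
If the cluster znode is missing entirely, the new disks came up empty rather than merely stale.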
m
From what I read, data is available; however, due to the messages being spewed for the stale zk connection, it is hogging resources and slowing things down
l
oh sorry, from what I can see on my end by trying to do an ls in the zookeeper client, I don’t see the pinot configs, so we definitely lost them with this update
so if I restart the components I’ll probably end up with nothing
m
@Daniel Lavoie IIRC you have a recipe for zero downtime upgrade for ZK?
l
we are using whatever we have in the helm chart currently
d
Helm is not really adapted to manage zero downtime ZK upgrades.
l
(in the pinot OSS one)
d
Gimme a moment to recap what is going on here
👍 1
Updating the storage class means you lose all the PVCs from Kubernetes
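For context, the storage class is part of each PVC’s spec and can’t be patched in place, which is why swapping it means new, empty volumes. A quick way to see what the existing claims were provisioned with; the label selector and namespace are assumptions:
# Show each ZooKeeper PVC with its storage class and status.
kubectl get pvc -n pinot -l app=zookeeper \
  -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName,STATUS:.status.phase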
l
(also, the cluster is back to healthy and prod is consuming; we didn’t have any data so it was okay, but I wonder what would have happened if we had lost all our configs and it was prod-prod with data on it already)
d
I’ve never tested a rolling upgrade by swapping the storage class, but my recommendation would be as follows (see the sketch after this list):
• Copy your ZK data with a simple kubectl cp of /data on each zookeeper pod.
• What is key is that you can’t lose quorum during the operation, and you will have to trash each disk since there is no data migration possible.
• Editing the helm deployment directly will drop everything and you will lose data.
• The solution is to edit the individual PVC of each zookeeper replica.
• Each time you edit 1 PVC, you delete the pod and the PV, wait for them to be created again, wait for zookeeper to report healthy, and even run zk cli commands to ensure you have quorum.
• Repeat for each node.
• Update your helm chart persistentClass for consistency.
Big disclaimer: this is not something I’ve tested, so if you want to experiment, make sure to do it on a dev environment.
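A rough sketch of those steps, assuming three replicas named zookeeper-0 through zookeeper-2 in namespace pinot and the usual <claim-template>-<pod> PVC naming; untested, per the disclaimer above:
# Back up every node's data dir before touching anything.
mkdir -p ./zk-backup
for i in 0 1 2; do
  kubectl cp pinot/zookeeper-$i:/data ./zk-backup/zookeeper-$i
done

# Then, one node at a time: drop its claim and pod, let the StatefulSet recreate
# them on the new storage class, and confirm quorum before moving to the next node.
kubectl delete pvc -n pinot data-zookeeper-0        # PVC name is an assumption
kubectl delete pod -n pinot zookeeper-0
kubectl wait -n pinot --for=condition=Ready pod/zookeeper-0 --timeout=5m
kubectl exec -n pinot zookeeper-0 -- sh -c 'echo srvr | nc localhost 2181' | grep Mode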
🙌 1
l
wow thank you so much Daniel!
so basically what you don’t want is to lose zookeeper altogether, right
cause then that may mean that you lose all the zk data
d
Yeah, which contains the pointers to all your segments.
data remains in deepstore, so it’s not lost per se, but you lose all the pointers to it, and I’m not sure anything in the community allows rebuilding the ZK state from raw data sitting in S3
l
right, in a typical pinot cluster what countermeasures do we have if zookeeper is lost? do people back up this data somehow?
d
Don’t lose it 😄 but ZK backup is the countermeasure
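As a bare-bones version of that, copying the snapshot/transaction-log directory off one node on a schedule already gives something to restore from; a sketch, assuming the default dataDir of /data:
# ZooKeeper keeps its snapshots and txn logs under <dataDir>/version-2;
# a dated copy of that directory is a crude but restorable backup.
mkdir -p ./zk-backup/$(date +%F)
kubectl cp pinot/zookeeper-0:/data/version-2 ./zk-backup/$(date +%F)/version-2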
l
😄
d
If you can restore ZK data and segments haven’t moved, I’ve seen Pinot recover from that state.
l
data remains in deepstore, so it's not lost per se, but you lose all the pointers to it and I'm not sure anything in the community allows rebuilding the ZK state from raw data sitting in S3
oh, in the startree version you can rebuild the zk state like that?
d
Not that I am aware of, I was only speaking within the Pinot ecosystem. Not even sure if it’s technically possible, to be honest
What I know is that you can retrieve the segments and re-upload them to another table using the controller api, I think.
don’t take this for granted, this is a hypothesis on my end
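For reference, that hypothesis would look roughly like the following with the stock tooling; equally untested, the bucket, hosts, and paths are placeholders, and the table config and schema would have to exist on the cluster first:
# Pull the segments back from deep store (the S3 path is a placeholder)...
aws s3 cp --recursive s3://my-deepstore/myTable/ ./restored-segments/

# ...then push them to the controller, which recreates the segment metadata
# in ZK as it accepts each one.
bin/pinot-admin.sh UploadSegment \
  -controllerHost pinot-controller \
  -controllerPort 9000 \
  -segmentDir ./restored-segments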
l
thank you for all your help Daniel, greatly appreciated
d
You’re welcome!
m
Thanks @Daniel Lavoie