# troubleshooting
d
Hey all, we have been experiencing an issue with our k8s deployment. We have a node group to which we deploy ZooKeeper and all Pinot nodes, and this node group is scaled down to 0 every n weeks. When the nodes come back up, we noticed that some realtime tables report error code 305 (segments unavailable) when queried. We have to rebalance these tables to get them back to normal. I know scaling the node group down to 0 may not be a good policy, but I doubt we can change this. My question is whether this behaviour is expected or whether it can be considered an issue to be fixed? Please let me know if I should raise a GitHub issue for this
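(For context, the rebalance mentioned above is done through the Pinot controller's table rebalance REST endpoint. Below is a minimal sketch of triggering it from Python; the controller URL, table name, and the exact set of query parameters are assumptions to verify against your Pinot version.)

```python
import requests

# Assumed controller address and table name; adjust to your environment.
CONTROLLER_URL = "http://pinot-controller:9000"
TABLE_NAME = "myTable"  # placeholder

def rebalance_realtime_table(table_name: str, dry_run: bool = True) -> dict:
    """Trigger a rebalance of a realtime table via the controller REST API."""
    resp = requests.post(
        f"{CONTROLLER_URL}/tables/{table_name}/rebalance",
        params={
            "type": "REALTIME",               # rebalance the realtime part of the table
            "dryRun": str(dry_run).lower(),   # inspect the proposed assignment first
            "includeConsuming": "true",       # also reassign consuming segments
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Start with a dry run, then re-run with dry_run=False to apply it.
    print(rebalance_realtime_table(TABLE_NAME, dry_run=True))
```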
k
How are you scaling this down to 0.. what’s your ZK config.. are the transaction log and snapshot directories configured properly to write to a persistent volume?
d
There is a job that destroys and recreates k8s nodes older than a certain age. The script destroys the whole node group, so all Pinot nodes shut down in one go
We have the deployment configured properly as far as I can tell. ZK data is stored on EBS volumes, and so are the server and minion volumes
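(One way to double-check the point raised above is to confirm that ZooKeeper's dataDir (snapshots) and dataLogDir (transaction log) actually resolve to paths under the persistent-volume mount. A rough sketch, assuming zoo.cfg lives at /conf/zoo.cfg and the EBS-backed volume is mounted at /data; both paths are assumptions.)

```python
from pathlib import Path

# Assumed paths; adjust to your chart / StatefulSet layout.
ZOO_CFG = Path("/conf/zoo.cfg")
PV_MOUNT = Path("/data")  # where the EBS-backed persistent volume is mounted

def parse_zoo_cfg(path: Path) -> dict:
    """Parse zoo.cfg into a key/value dict, ignoring comments and blank lines."""
    props = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_zk_dirs() -> None:
    props = parse_zoo_cfg(ZOO_CFG)
    data_dir = Path(props.get("dataDir", ""))
    # dataLogDir falls back to dataDir when it is not set explicitly.
    data_log_dir = Path(props.get("dataLogDir", props.get("dataDir", "")))
    for name, d in [("dataDir", data_dir), ("dataLogDir", data_log_dir)]:
        on_pv = d == PV_MOUNT or PV_MOUNT in d.parents
        status = "on persistent volume" if on_pv else "NOT on persistent volume"
        print(f"{name}={d} -> {status}")

if __name__ == "__main__":
    check_zk_dirs()
```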
m
If ZK remains intact, then we should be able to scale Pinot nodes up and down. It should not result in 305; I’d treat it as an issue to fix.
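(To confirm it really is a state mismatch rather than missing data, one can compare the table's IdealState against its ExternalView via the controller API. A minimal sketch, assuming the controller is at http://pinot-controller:9000, the table is myTable, and the responses are keyed by table type; all of these are assumptions to verify against your deployment.)

```python
import requests

CONTROLLER_URL = "http://pinot-controller:9000"  # assumed controller address
TABLE_NAME = "myTable"  # placeholder

def get_state(table_name: str, which: str) -> dict:
    """Fetch the 'idealstate' or 'externalview' for a table from the controller."""
    resp = requests.get(f"{CONTROLLER_URL}/tables/{table_name}/{which}", timeout=30)
    resp.raise_for_status()
    # Assumed response shape: {"REALTIME": {segment: {server: state}}, "OFFLINE": ...}
    return (resp.json() or {}).get("REALTIME") or {}

def diff_states(table_name: str) -> None:
    ideal = get_state(table_name, "idealstate")
    external = get_state(table_name, "externalview")
    for segment, assignment in ideal.items():
        actual = external.get(segment, {})
        # Flag replicas that Helix expects to be ONLINE/CONSUMING but aren't.
        for server, expected in assignment.items():
            got = actual.get(server, "MISSING")
            if got != expected:
                print(f"{segment} on {server}: expected {expected}, got {got}")

if __name__ == "__main__":
    diff_states(TABLE_NAME)
```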
👍 1
d
I've not found anything useful in the logs so far, but I'll keep searching a bit more; the log threshold is INFO