# troubleshooting
d
Hey all, we have been experiencing an issue with our k8s deployment. We have a node group to which we deploy ZooKeeper and all Pinot nodes, and this node group is scaled down to 0 every n weeks. When the nodes come back up, we noticed that some realtime tables report error code 305 (segments unavailable) when queried. We have to rebalance these tables to get them back to normal. I know scaling the node group down to 0 may not be a good policy, but I doubt we can change this. My question is whether this behaviour is expected or whether it can be considered an issue to be fixed? Please let me know if I should raise a GitHub issue for this
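(For context, the rebalance mentioned above is done through the Pinot controller's table rebalance REST endpoint. Below is a minimal sketch of triggering it from Python; the controller URL, table name, and the exact set of query parameters are assumptions to verify against your Pinot version.)

```python
import requests

# Assumed controller address and table name; adjust to your environment.
CONTROLLER_URL = "http://pinot-controller:9000"
TABLE_NAME = "myTable"  # placeholder

def rebalance_realtime_table(table_name: str, dry_run: bool = True) -> dict:
    """Trigger a rebalance of a realtime table via the controller REST API."""
    resp = requests.post(
        f"{CONTROLLER_URL}/tables/{table_name}/rebalance",
        params={
            "type": "REALTIME",               # rebalance the realtime part of the table
            "dryRun": str(dry_run).lower(),   # inspect the proposed assignment first
            "includeConsuming": "true",       # also reassign consuming segments
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Start with a dry run, then re-run with dry_run=False to apply it.
    print(rebalance_realtime_table(TABLE_NAME, dry_run=True))
```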
k
How are you scaling this down to 0.. what’s your ZK config.. are the transaction log and snapshot directories configured properly to write to a persistent volume?
d
There is a job that destroys and recreates k8s nodes older than a certain age. The script destroys the whole node group, so all Pinot nodes shut down in one go
We have the deployment configured properly as far as I can tell. ZK data is stored on EBS volumes, and so are the server and minion volumes
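(One way to double-check the point raised above is to confirm that ZooKeeper's dataDir (snapshots) and dataLogDir (transaction log) actually resolve to paths under the persistent-volume mount. A rough sketch, assuming zoo.cfg lives at /conf/zoo.cfg and the EBS-backed volume is mounted at /data; both paths are assumptions.)

```python
from pathlib import Path

# Assumed paths; adjust to your chart / StatefulSet layout.
ZOO_CFG = Path("/conf/zoo.cfg")
PV_MOUNT = Path("/data")  # where the EBS-backed persistent volume is mounted

def parse_zoo_cfg(path: Path) -> dict:
    """Parse zoo.cfg into a key/value dict, ignoring comments and blank lines."""
    props = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_zk_dirs() -> None:
    props = parse_zoo_cfg(ZOO_CFG)
    data_dir = Path(props.get("dataDir", ""))
    # dataLogDir falls back to dataDir when it is not set explicitly.
    data_log_dir = Path(props.get("dataLogDir", props.get("dataDir", "")))
    for name, d in [("dataDir", data_dir), ("dataLogDir", data_log_dir)]:
        on_pv = d == PV_MOUNT or PV_MOUNT in d.parents
        status = "on persistent volume" if on_pv else "NOT on persistent volume"
        print(f"{name}={d} -> {status}")

if __name__ == "__main__":
    check_zk_dirs()
```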
m
If ZK remains intact, then we should be able to scale Pinot nodes up and down. It should not result in 305; I’d treat it as an issue to fix.
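(To confirm it really is a state mismatch rather than missing data, one can compare the table's IdealState against its ExternalView via the controller API. A minimal sketch, assuming the controller is at http://pinot-controller:9000, the table is myTable, and the responses are keyed by table type; all of these are assumptions to verify against your deployment.)

```python
import requests

CONTROLLER_URL = "http://pinot-controller:9000"  # assumed controller address
TABLE_NAME = "myTable"  # placeholder

def get_state(table_name: str, which: str) -> dict:
    """Fetch the 'idealstate' or 'externalview' for a table from the controller."""
    resp = requests.get(f"{CONTROLLER_URL}/tables/{table_name}/{which}", timeout=30)
    resp.raise_for_status()
    # Assumed response shape: {"REALTIME": {segment: {server: state}}, "OFFLINE": ...}
    return (resp.json() or {}).get("REALTIME") or {}

def diff_states(table_name: str) -> None:
    ideal = get_state(table_name, "idealstate")
    external = get_state(table_name, "externalview")
    for segment, assignment in ideal.items():
        actual = external.get(segment, {})
        # Flag replicas that Helix expects to be ONLINE/CONSUMING but aren't.
        for server, expected in assignment.items():
            got = actual.get(server, "MISSING")
            if got != expected:
                print(f"{segment} on {server}: expected {expected}, got {got}")

if __name__ == "__main__":
    diff_states(TABLE_NAME)
```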
👍 1
d
I've not found anything useful in the logs so far, but I'll keep searching a bit more; the log threshold is INFO