Capture3.PNG
# troubleshooting
s
Capture3.PNG
@Xiang Fu
What is this "sessionId does not match" message about? cc: @Mayank @Xiang Fu @Kishore G @Jackie @Subbu Subramaniam
x
It comes from the process restart.
The previous session is still in ZooKeeper; it will be cleaned up after the ZooKeeper session timeout.
s
How do we resolve this issue?
j
This is normal during broker restart/reconnect, and you may ignore these messages.
s
I think restarting the broker alone could have resolved the broker CrashLoopBackOff issue. The mistake we were making was restarting everything, including the controller and server. I guess we've got the trick now.
The other solution would be increasing the health check timeout, but I'm not sure how to increase it.
We faced the same issue around a month ago and hit it again today, but I had forgotten the resolution. Now I recollect it.
x
I think the open question is still why the broker fails. If you can provide the logs of the failed broker, it can help us find the issue.
s
Please check if this helps. It's a partial log; I will check if I can get the whole log, but most probably it's not available.
x
From the log, it seems the issue is on ZooKeeper.
Can you check the ZooKeeper logs for the same time period?
s
ok let me check
x
Also, was there any ZooKeeper restart or downtime during that time?
s
No, ZooKeeper never restarted.
You can check here; most of the time ZK has 0 restarts.
x
there is one restart:
s
yes
But we have 3 ZK nodes for high availability,
so a restart of one pod shouldn't cause much trouble.
x
True. Please try to correlate this issue with the ZooKeeper logs if you see it next time.
Also, you can use a kubectl command to delete the broker pod and let it restart.
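A minimal sketch of the pod-delete approach, assuming the broker runs under a controller (for example a StatefulSet or Deployment) that recreates deleted pods. The pod name `pinot-broker-0` and namespace `pinot` are hypothetical placeholders, not names taken from this thread; the underlying command is the standard `kubectl delete pod <name> --namespace <namespace>`.

```python
import subprocess


def build_delete_pod_command(pod_name, namespace):
    """Return the kubectl argv that deletes a single pod in a namespace."""
    return ["kubectl", "delete", "pod", pod_name, "--namespace", namespace]


def restart_pod(pod_name, namespace):
    """Delete the pod so that its controller schedules a replacement."""
    command = build_delete_pod_command(pod_name, namespace)
    # check=True raises CalledProcessError if kubectl exits with a non-zero code.
    subprocess.run(command, check=True)


# Example call (hypothetical names; substitute the real broker pod and namespace):
# restart_pod("pinot-broker-0", "pinot")
```

This only restarts the one broker pod, which matches the point above about not restarting the controller and servers as well.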
s
sure @Xiang Fu
What are the things that consume ZooKeeper disk space? I can see disk space utilization increasing by 1-2% every day, so within the next month it will get full. I was curious to know what consumes ZK space in Pinot. @Xiang Fu @Jackie @Mayank
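The "full within a month" estimate can be checked with a rough calculation. This is a sketch that assumes growth is linear in percentage points per day; the current utilization is not given in the thread, so the 50% used in the example is a made-up figure.

```python
import math


def days_until_full(used_pct, daily_growth_pct):
    """Whole days until the disk reaches 100%, assuming linear growth."""
    if not 0 <= used_pct <= 100:
        raise ValueError("used_pct must be between 0 and 100")
    if daily_growth_pct <= 0:
        raise ValueError("daily_growth_pct must be positive")
    remaining_pct = 100.0 - used_pct
    return math.floor(remaining_pct / daily_growth_pct)


# Hypothetical starting point of 50% used:
# days_until_full(50, 2) gives 25 days at 2% per day,
# days_until_full(50, 1) gives 50 days at 1% per day.
```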
m
I am guessing it is the ZK snapshots (or logs)?
s
Is there some way to automatically remove old logs from disk so that ZK doesn't crash in the future because of a disk space issue?
m
I am guessing there should be. Do you want to check the ZK docs?
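For reference, ZooKeeper has a built-in autopurge feature controlled by two `zoo.cfg` settings: `autopurge.snapRetainCount` (how many recent snapshots and their transaction logs to keep, minimum 3) and `autopurge.purgeInterval` (hours between purge runs; the default of 0 disables purging). The helper below is a hypothetical sketch that only renders those two lines; how they reach `zoo.cfg` depends on the ZooKeeper image or chart in use, which this thread does not specify.

```python
def render_autopurge_config(snap_retain_count=3, purge_interval_hours=24):
    """Render the two zoo.cfg lines that enable ZooKeeper autopurge."""
    if snap_retain_count < 3:
        # ZooKeeper documents 3 as the minimum number of retained snapshots.
        raise ValueError("snap_retain_count must be at least 3")
    if purge_interval_hours < 1:
        # An interval of 0 leaves autopurge disabled, so require 1 or more.
        raise ValueError("purge_interval_hours must be 1 or more to enable purging")
    return "\n".join([
        f"autopurge.snapRetainCount={snap_retain_count}",
        f"autopurge.purgeInterval={purge_interval_hours}",
    ])


# render_autopurge_config(3, 24) yields:
# autopurge.snapRetainCount=3
# autopurge.purgeInterval=24
```

The retain count and interval shown are illustrative values, not a recommendation for this cluster.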
s