# troubleshooting
s
Excerpt from log file:
2021/08/30 06:16:44.488 ERROR [StatusUpdateUtil] [HelixTaskExecutor-message_handle_thread] Exception while logging status update
org.apache.helix.HelixException: HelixManager (ZkClient) is not connected. Call HelixManager#connect()
        at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:363) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.util.StatusUpdateUtil.logMessageStatusUpdateRecord(StatusUpdateUtil.java:348) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.util.StatusUpdateUtil.logError(StatusUpdateUtil.java:400) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:359) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?
x
Can you check if zookeeper is up?
s
yes .. zookeeper is running
Capture.PNG
@Mohamed Kashifuddin can you update your findings here
issue got resolved after the helm reinstall but not sure if it will recur or not
as per kashif .. 2 hrs ago when he did a helm reinstall the issue was still there .. after doing another helm reinstall 30 mins ago, it got resolved
cc: @Mohamed Hussain @Mohamed Sultan @Shailesh Jha @Arun Kumar
is it because the release keeps getting updated by pull policy always, which may bring some instability to the pinot deployment .. can we use a stable build or image, store it in a Google Cloud repository, and not use pull policy always until we are sure the release is stable cc: @Mayank @Xiang Fu @Kishore G @Jackie @Subbu Subramaniam
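If the goal is to stop `pullPolicy: Always` from silently picking up new snapshot builds, one option is to pin the deployment to an immutable image tag. A minimal sketch, not a verified command for this cluster — the release name `pinot`, namespace `mynamespace`, tag `0.8.0`, and the value names `image.tag` / `image.pullPolicy` are all assumptions about the chart in use:

```shell
# Hypothetical: pin the Pinot image to a fixed release tag and stop
# re-pulling on every pod start. All names below are assumptions —
# check `helm get values pinot` for the actual value keys.
helm upgrade pinot pinot/pinot -n mynamespace \
  --reuse-values \
  --set image.tag=0.8.0 \
  --set image.pullPolicy=IfNotPresent
```

With a fixed tag and `IfNotPresent`, every pod restart runs the exact same image bytes, which removes one source of unexplained drift between reinstalls.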
x
Can you check zookeeper container disk size? If the disk is full?
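For reference, one quick way to check this from outside the pods — a sketch assuming the ZooKeeper pods are named `pinot-zookeeper-{0,1,2}` in namespace `mynamespace` and the data volume is mounted at `/data` (both are chart-dependent assumptions):

```shell
# Report disk usage of the ZooKeeper data volume in each pod.
# Pod names, namespace, and mount path are assumptions; adjust as needed.
for i in 0 1 2; do
  kubectl exec -n mynamespace "pinot-zookeeper-$i" -- df -h /data
done
```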
Right now we don’t see the root cause of why the brokers went down
s
sure.. will check again .. right now everything is working fine and zookeeper disk usage is around 20% for all 3 zookeeper pods ..
still I am not sure what is taking up 20 Gb space in each zookeeper pod .. is it pinot indexing details or something else .. what is the recommended disk size for zookeeper
FYI the auth feature was enabled before .. we disabled it once to resolve this issue .. but the same issue recurred even when the auth feature was disabled .. helm uninstall and reinstall seems to resolve the broker pods crashing issue
m
Seems like the ZK snapshots might not be cleaned up?
s
how to clean it
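ZooKeeper ships a cleanup script and an autopurge setting for exactly this. A sketch — the pod name and the script path inside the container are assumptions about your image:

```shell
# One-off cleanup: keep only the 3 most recent snapshots + txn logs.
# Pod name and zkCleanup.sh path are assumptions; adjust for your chart.
kubectl exec -n mynamespace pinot-zookeeper-0 -- \
  /opt/zookeeper/bin/zkCleanup.sh -n 3

# Longer term, enable automatic purging in zoo.cfg instead of manual runs:
#   autopurge.snapRetainCount=3
#   autopurge.purgeInterval=1    # run the purge task every 1 hour
```

With autopurge enabled, ZooKeeper prunes old snapshots itself, which also addresses the steady disk growth mentioned earlier in the thread.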
x
hmm, I think zk disk is fine
But Pinot cannot connect to zk
How do you enable the auth for zookeeper?
s
auth here means auth while connecting to the pinot controller UI .. i.e. access to the pinot controller UI with different levels of access .. I guess this is a recent feature in the pinot controller UI ..
right now for the last 10 hours .. no issue in pinot and it’s working fine .. will definitely update if the issue regresses
helm uninstall and reinstall seems to resolve broker pods crashing issue
I think restarting the broker pods alone would resolve this issue .. I think it’s a regression of the same issue we faced a month ago @Xiang Fu
x
do you have the logs of broker when it’s crashing?
s
this is the log I got using kubectl logs pinot-broker-0 -n mynamespace .. it’s a partial log .. I am checking whether the complete log is available or lost
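If the broker container crashed and was restarted by kubelet, the pre-crash log may still be retrievable from the previous container instance:

```shell
# --previous returns the log of the last terminated container, if the
# node still has it; pod name and namespace as in the message above.
kubectl logs pinot-broker-0 -n mynamespace --previous
```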
what are the things that consume zookeeper disk space .. I can see disk space utilization increasing by 1-2% every day .. in the next month it will get full .. so was curious to know what are the things that consume zk space in pinot @Xiang Fu @Jackie @Mayank
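For context on the question above: ZooKeeper disk growth generally comes from transaction logs and snapshots under its data directory — ZooKeeper keeps writing new snapshots and never deletes old ones unless autopurge is enabled or a cleanup script runs. To see where the space is actually going — a sketch assuming the data dir is mounted at `/data` (its snapshot/log files live in the `version-2` subdirectory):

```shell
# Break down ZooKeeper disk usage; /data is an assumed mount path.
kubectl exec -n mynamespace pinot-zookeeper-0 -- \
  sh -c 'du -sh /data/version-2 && ls -lh /data/version-2 | tail -n 20'
```

If `version-2` dominates and is full of old `snapshot.*` / `log.*` files, that points at missing snapshot purging rather than Pinot metadata itself.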