# troubleshooting
s
Excerpt from log file:
2021/08/30 06:16:44.488 ERROR [StatusUpdateUtil] [HelixTaskExecutor-message_handle_thread] Exception while logging status update
org.apache.helix.HelixException: HelixManager (ZkClient) is not connected. Call HelixManager#connect()
        at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:363) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.util.StatusUpdateUtil.logMessageStatusUpdateRecord(StatusUpdateUtil.java:348) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.util.StatusUpdateUtil.logError(StatusUpdateUtil.java:400) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:359) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?
x
Can you check if zookeeper is up?
s
yes .. zookeeper is running
Capture.PNG
@Mohamed Kashifuddin can you update your findings here
issue got resolved after the helm reinstall but not sure if it will recur or not
as per kashif .. 2 hrs ago when he did a helm reinstall the issue was still there .. after doing another helm reinstall 30 mins ago, it got resolved
cc: @Mohamed Hussain @Mohamed Sultan @Shailesh Jha @Arun Kumar
is it because the release keeps getting updated by pull policy always, which may bring some instability to the pinot deployment .. can we use a stable build or image, store it in a Google Cloud repository, and not use pull policy always until we are sure the release is stable cc: @Mayank @Xiang Fu @Kishore G @Jackie @Subbu Subramaniam
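If the goal is to stop `pullPolicy: Always` from silently picking up new snapshot builds, one option is to pin the deployment to an immutable image tag. A minimal sketch, not a verified command for this cluster — the release name `pinot`, namespace `mynamespace`, tag `0.8.0`, and the value names `image.tag` / `image.pullPolicy` are all assumptions about the chart in use:

```shell
# Hypothetical: pin the Pinot image to a fixed release tag and stop
# re-pulling on every pod start. All names below are assumptions —
# check `helm get values pinot` for the actual value keys.
helm upgrade pinot pinot/pinot -n mynamespace \
  --reuse-values \
  --set image.tag=0.8.0 \
  --set image.pullPolicy=IfNotPresent
```

With a fixed tag and `IfNotPresent`, every pod restart runs the exact same image bytes, which removes one source of unexplained drift between reinstalls.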
x
Can you check zookeeper container disk size? If the disk is full?
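For reference, one quick way to check this from outside the pods — a sketch assuming the ZooKeeper pods are named `pinot-zookeeper-{0,1,2}` in namespace `mynamespace` and the data volume is mounted at `/data` (both are chart-dependent assumptions):

```shell
# Report disk usage of the ZooKeeper data volume in each pod.
# Pod names, namespace, and mount path are assumptions; adjust as needed.
for i in 0 1 2; do
  kubectl exec -n mynamespace "pinot-zookeeper-$i" -- df -h /data
done
```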
Right now we don’t see the root cause of why the brokers went down
s
sure.. will check again .. right now everything is working fine and zookeeper disk usage is around 20% for all 3 zookeeper pods ..
still I am not sure what is taking up 20 Gb space in each zookeeper pod .. is it pinot indexing details or something else .. what is the recommended disk size for zookeeper
FYI the auth feature was enabled before .. we disabled it once to resolve this issue .. but the same issue recurred even when the auth feature was disabled .. helm uninstall and reinstall seems to resolve the broker pods crashing issue
m
Seems like the ZK snapshots might not be cleaned up?
s
how to clean it
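ZooKeeper ships a cleanup script and an autopurge setting for exactly this. A sketch — the pod name and the script path inside the container are assumptions about your image:

```shell
# One-off cleanup: keep only the 3 most recent snapshots + txn logs.
# Pod name and zkCleanup.sh path are assumptions; adjust for your chart.
kubectl exec -n mynamespace pinot-zookeeper-0 -- \
  /opt/zookeeper/bin/zkCleanup.sh -n 3

# Longer term, enable automatic purging in zoo.cfg instead of manual runs:
#   autopurge.snapRetainCount=3
#   autopurge.purgeInterval=1    # run the purge task every 1 hour
```

With autopurge enabled, ZooKeeper prunes old snapshots itself, which also addresses the steady disk growth mentioned earlier in the thread.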
x
hmm, I think zk disk is fine
But Pinot cannot connect to zk
How do you enable the auth for zookeeper?
s
auth here means auth while connecting to the pinot controller UI .. i.e. access to the pinot controller UI with different levels of access .. I guess this is a recent feature in the pinot controller UI ..
right now for the last 10 hours .. no issue in pinot and it’s working fine .. will definitely update if the issue regresses
helm uninstall and reinstall seems to resolve broker pods crashing issue
I think restarting the broker pods alone would resolve this issue .. I think it’s a regression of the same issue we faced a month ago @Xiang Fu
x
do you have the logs of broker when it’s crashing?
s
this is the log I got using kubectl logs pinot-broker-0 -n mynamespace .. it’s a partial log .. I am checking whether the complete log is available or lost
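If the broker container crashed and was restarted by kubelet, the pre-crash log may still be retrievable from the previous container instance:

```shell
# --previous returns the log of the last terminated container, if the
# node still has it; pod name and namespace as in the message above.
kubectl logs pinot-broker-0 -n mynamespace --previous
```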
what are the things that consume zookeeper disk space .. I can see disk space utilization increasing by 1-2% every day .. in the next month it will get full .. so was curious to know what are the things that consume zk space in pinot @Xiang Fu @Jackie @Mayank
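For context on the question above: ZooKeeper disk growth generally comes from transaction logs and snapshots under its data directory — ZooKeeper keeps writing new snapshots and never deletes old ones unless autopurge is enabled or a cleanup script runs. To see where the space is actually going — a sketch assuming the data dir is mounted at `/data` (its snapshot/log files live in the `version-2` subdirectory):

```shell
# Break down ZooKeeper disk usage; /data is an assumed mount path.
kubectl exec -n mynamespace pinot-zookeeper-0 -- \
  sh -c 'du -sh /data/version-2 && ls -lh /data/version-2 | tail -n 20'
```

If `version-2` dominates and is full of old `snapshot.*` / `log.*` files, that points at missing snapshot purging rather than Pinot metadata itself.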