# general
c
Hi team. Some pinot-servers fail to connect to ZooKeeper and then do nothing, although the daemon process is still running. I built a cluster with 6 pinot-servers on premises; ZooKeeper is a 3-node ensemble. The cluster has 1 offline table (1 TB) and 2 upsert realtime tables (~30 GB). Below is the last log written by the pinot-server. The pinot-server is dead and has written no logs for several hours.
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress: pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress: pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress: pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:finalCleanup:390 Error to final clean up for message : 723f5e30-1ffa-447b-a61a-72482e5d6ab4
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:finalCleanup:390 Error to final clean up for message : 47521252-6e9f-49d5-aaf0-1e1169b65cda
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:finalCleanup:390 Error to final clean up for message : aa0ae67b-6b92-46dd-b19d-b6bda195af18
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:call:194 Exception after executing a message, msgId: 723f5e30-1ffa-447b-a61a-72482e5d6ab4org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
        at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:172) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:call:194 Exception after executing a message, msgId: aa0ae67b-6b92-46dd-b19d-b6bda195af18org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
        at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:172) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2022-05-27 01:32:43.487 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:call:194 Exception after executing a message, msgId: 47521252-6e9f-49d5-aaf0-1e1169b65cdaorg.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
        at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:172) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2022-05-27 01:32:43.494 WARN  [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181 is not connected, wait for 10000ms.
2022-05-27 01:32:43.494 WARN  [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181 is not connected, wait for 10000ms.
2022-05-27 01:32:43.494 WARN  [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181 is not connected, wait for 10000ms.
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress: pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress: pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:376 zkClient is not connected after waiting 10000ms., clusterName: PinotPoC_2, zkAddress: pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181
2022-05-27 01:32:57.146 ERROR [HelixTaskExecutor-message_handle_thread] org.apache.helix.util.StatusUpdateUtil:logMessageStatusUpdateRecord:352 Exception while logging status update
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster PinotPoC_2
        at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:378) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.util.StatusUpdateUtil.logMessageStatusUpdateRecord(StatusUpdateUtil.java:348) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.util.StatusUpdateUtil.logError(StatusUpdateUtil.java:392) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:195) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-depe
x
is zk up?
can you check if zk disk is full or not?
from the logs it seems that pinot cannot connect to zookeeper
you may need to set up retention for zookeeper
c
@Xiang Fu zookeeper is up and the disk is not full. It looks similar to this bug: https://github.com/apache/pinot/issues/8352
@Xiang Fu ZooKeeper is ok, but the pinot-server stays stuck.
@Xiang Fu This cluster started crashing after we started collecting data into two upsert realtime tables. Is it not possible to use two upsert realtime tables?
x
I don’t think that’s a problem
cc: @Jackie in case you have time
have you tried restarting all zookeeper/controller/broker/servers?
c
@Xiang Fu Yes, we restarted them, but it didn't solve it.
@Xiang Fu Here are more logs. About 10 minutes after restarting the pinot-server, the ZooKeeper session expires and the connection is lost. It retries but keeps failing.
...

2022-05-27 17:45:20.357 INFO  [ZkClient-EventThread-82-pay-hadoop-test1.dakao.io:2181,pay-hadoop-test2.dakao.io:2181,pay-hadoop-test3.dakao.io:2181] org.apache.helix.manager.zk.CallbackHandler:subscribeForChanges:563 Subscribing to path:/PinotPoC_2/INSTANCES/Server_pinot-poc-server6.company.com_8001/MESSAGES took:50945

...

2022-05-27 17:45:54.219 INFO  [CallbackHandler-AsycSubscribe-Singleton] org.apache.helix.manager.zk.CallbackHandler:subscribeForChanges:563 Subscribing to path:/PinotPoC_2/INSTANCES/Server_pinot-poc-server6.company.com_8001/MESSAGES took:33862

...

2022-05-27 17:46:28.048 DEBUG [Start a Pinot [SERVER]-SendThread(pinot-poc-zk1.company.com:2181)] org.apache.zookeeper.ClientCnxn:readResponse:923 Reading reply sessionid:0x30106a604ba001c, packet:: clientPath:null serverPath:null finished:false header:: 57006,5  replyHeader:: 57006,17179990511,0  request:: '/PinotPoC_2/INSTANCES/Server_pinot-poc-server6.company.com_8001/CURRENTSTATES/30106a604ba001c/mydata_card_REALTIME,#7ba202022696422203a20226d79646174615f636172645f5245414c54494d45222ca20202273696d706c654669656c647322203a207ba20202020224255434b45545f53495

...

2022-05-27 17:46:28.049 DEBUG [pool-1-thread-3] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:28.051 DEBUG [pool-1-thread-3] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected
2022-05-27 17:46:28.051 WARN  [Start a Pinot [SERVER]-SendThread(pinot-poc-zk1.company.com:2181)] org.apache.zookeeper.ClientCnxn:run:1190 Client session timed out, have not heard from server in 20948ms for sessionid 0x30106a604ba001c
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected
2022-05-27 17:46:28.051 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected
2022-05-27 17:46:28.051 INFO  [Start a Pinot [SERVER]-SendThread(pinot-poc-zk1.company.com:2181)] org.apache.zookeeper.ClientCnxn:run:1238 Client session timed out, have not heard from server in 20948ms for sessionid 0x30106a604ba001c, closing socket connection and attempting reconnect

...

2022-05-27 17:46:44.867 WARN  [Start a Pinot [SERVER]-SendThread(pinot-poc-zk2.company.com:2181)] org.apache.zookeeper.ClientCnxn:onConnected:1380 Unable to reconnect to ZooKeeper service, session 0x30106a604ba001c has expired
2022-05-27 17:46:44.867 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:process:641 Received event: WatchedEvent state:Expired type:None path:null
2022-05-27 17:46:44.867 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-27 17:46:44.868 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:send:92 New event: ZkEvent[State changed to Expired sent to org.apache.helix.manager.zk.ZKHelixManager@7c8fecd9]
2022-05-27 17:46:44.868 INFO  [Start a Pinot [SERVER]-SendThread(pinot-poc-zk2.company.com:2181)] org.apache.zookeeper.ClientCnxn:run:1236 Unable to reconnect to ZooKeeper service, session 0x30106a604ba001c has expired, closing socket connection
2022-05-27 17:46:44.870 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkConnection:reconnect:94 Creating new ZookKeeper instance to reconnect to pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181.
2022-05-27 17:46:49.097 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.zookeeper.ZooKeeper:<init>:868 Initiating client connection, connectString=pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181 sessionTimeout=30000 watcher=org.apache.helix.manager.zk.ZkClient@4cebec82
2022-05-27 17:46:49.097 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.zookeeper.ClientCnxnSocket:initProperties:237 jute.maxbuffer value is 4194304 Bytes
2022-05-27 17:46:49.097 DEBUG [ZkClient-EventThread-82-pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected

...

2022-05-27 17:46:53.314 DEBUG [Start a Pinot [SERVER]-SendThread(pinot-poc-zk1.company.com:2181)] org.apache.zookeeper.SaslServerPrincipal:getServerPrincipal:80 Canonicalized address to pinot-poc-zk1.company.com
2022-05-27 17:46:53.314 INFO  [Start a Pinot [SERVER]-SendThread(pinot-poc-zk1.company.com:2181)] org.apache.zookeeper.ClientCnxn:logStartConnect:1112 Opening socket connection to server pinot-poc-zk1.company.com/10.62.30.174:2181. Will not attempt to authenticate using SASL (unknown error)
2022-05-27 17:46:53.315 INFO  [Start a Pinot [SERVER]-SendThread(pinot-poc-zk1.company.com:2181)] org.apache.zookeeper.ClientCnxn:primeConnection:959 Socket connection established, initiating session, client: /10.62.34.74:58040, server: pinot-poc-zk1.company.com/10.62.30.174:2181
2022-05-27 17:46:53.315 DEBUG [Start a Pinot [SERVER]-SendThread(pinot-poc-zk1.company.com:2181)] org.apache.zookeeper.ClientCnxn:primeConnection:1027 Session establishment request sent on pinot-poc-zk1.company.com/10.62.30.174:2181
2022-05-27 17:46:53.316 DEBUG [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1119 Waiting for keeper state SyncConnected
2022-05-27 17:46:53.316 WARN  [HelixTaskExecutor-message_handle_thread] org.apache.helix.manager.zk.ZKHelixManager:checkConnected:369 zkClient to pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181 is not connected, wait for 10000ms.

...

2022-05-27 17:46:53.317 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
2022-05-27 17:46:53.317 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:send:92 New event: ZkEvent[State changed to SyncConnected sent to org.apache.helix.manager.zk.ZKHelixManager@7c8fecd9]
2022-05-27 17:46:53.318 DEBUG [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:process:703 Leaving process event
2022-05-27 17:46:53.318 DEBUG [ZkClient-EventThread-82-pinot-poc-zk1.company.com:2181,pinot-poc-zk2.company.com:2181,pinot-poc-zk3.company.com:2181] org.apache.helix.manager.zk.zookeeper.ZkClient:waitForKeeperState:1129 State is SyncConnected
j
@coco Do you see other suspicious ERROR/WARN logs besides the ZK disconnection? We want to check whether the server throws an exception after reconnecting to ZK. You may search for "zookeeper state changed" in the log to find the ZK connection state changes.
c
@Jackie The first error log attached in this thread is the last log written by the server. The server wrote that log and then stopped, leaving no more logs. ZooKeeper's /PinotPoC_2/INSTANCES/Server_[server-address]_8001/MESSAGES node has thousands of messages.
@Jackie We dropped all tables to restore the cluster. Per table, about 16,000 segments were deleted for the offline table and about 3,000 segments for each realtime table. Once all tables were dropped, the cluster recovered.
@Jackie https://docs.pinot.apache.org/basics/getting-started/frequent-questions/ingestion-faq I read this article and thought that segments that are too small could cause problems. Our compressed segment size was 35MB for the offline table and 2MB for the realtime tables. We then created only realtime tables and set "realtime.segment.flush.threshold.segment.size": "450M".
@Jackie Segments of the realtime table grew incrementally from 2MB to 220MB. Our cluster now has only 48 segments. We created an offline table, stored and deleted several GB of data, and ran a test by restarting the pinot-server. After the test, the pinot-server went back to the Dead state and is not returning to the 'Good' state. I am checking the pinot-server log.
@Jackie I filtered the 'zookeeper state changed' log lines from the log file with the 'grep' command.
2022-05-30 22:36:17.641 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Disconnected)
2022-05-30 22:37:28.445 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-30 22:37:43.594 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Closed)
2022-05-30 22:38:54.142 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
2022-05-30 22:41:00.309 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Disconnected)
2022-05-30 22:43:17.117 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-30 22:44:17.613 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Closed)
2022-05-30 22:45:28.276 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
2022-05-30 22:47:09.541 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Disconnected)
2022-05-30 22:47:40.941 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-30 22:48:42.670 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
2022-05-30 22:50:40.515 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Disconnected)
2022-05-30 22:50:51.174 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-30 22:51:01.558 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Closed)
2022-05-30 22:51:12.016 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
2022-05-30 22:52:54.425 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Disconnected)
2022-05-30 22:54:02.804 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Expired)
2022-05-30 22:54:13.491 INFO  [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (SyncConnected)
@Jackie Error log after going to SyncConnected state.
j
@coco From the log you shared, I can see several exceptions related to not being able to write to ZK, and the server keeps getting disconnected from ZK. I'd suggest checking the zookeeper instances to see if they have issues such as a full disk. Per the server log, I don't think the issue is from the server side
c
@Jackie We checked the ZooKeeper servers and we don't think it is a ZooKeeper server issue. We analyzed the logs a bit more: the session has expired, and you can see in the ZooKeeper log that the connection is closed after the last ping request of the session. (Some text may have been excluded from the log lines.)
-- pinot server log
2022-05-31 18:33:49.731 zookeeper Session establishment ...
2022-05-31 18:36:53.358 org.apache.kafka.clients.FetchSessionHandler:handleError:481 Error sending fetch request, kafka.common.errors.DisconnectException: null
2022-05-31 18:36:53.358 INFO [account__0__26__20220531T0837Z] org.apache.kafka.clients.FetchSessionHandler:handleError:481 [Consumer clientId=consumer-null-2, groupId=null] Error sending fetch request (sessionId=173595688, epoch=146) to node 0: org.apache.kafka.common.errors.DisconnectException: null
2022-05-31 18:37:06.478 INFO [account__0__26__20220531T0837Z] org.apache.kafka.clients.consumer.KafkaConsumer:seek:1582 [Consumer clientId=consumer-null-2, groupId=null] Seeking to offset 309388099 for partition test.realtime.transaction-account.v1-0
2022-05-31 18:37:51.245 WARN [Start a Pinot [SERVER]-SendThread(pinot-poc-zk3.company.com:2181)] org.apache.zookeeper.ClientCnxn:run:1190 Client session timed out, have not heard from server in 32235ms for sessionid 0x101024634ef000b
2022-05-31 18:37:51.246 INFO [Start a Pinot [SERVER]-SendThread(pinot-poc-zk3.company.com:2181)] org.apache.zookeeper.ClientCnxn:run:1238 Client session timed out, have not heard from server in 32235ms for sessionid 0x101024634ef000b, closing socket connection and attempting reconnect
2022-05-31 18:38:10.601 INFO [Start a Pinot [SERVER]-EventThread] org.apache.helix.manager.zk.zookeeper.ZkClient:processStateChanged:836 zookeeper state changed (Disconnected)
...
2022-05-31 18:40:52.561 INFO [account__0__26__20220531T0837Z] org.apache.kafka.clients.FetchSessionHandler:handleError:481 [Consumer clientId=consumer-null-2, groupId=null] Error sending fetch request (sessionId=173595688, epoch=INITIAL) to node 0: org.apache.kafka.common.errors.TimeoutException: Failed to send request after 30000 ms.
-- zookeeper log
2022-05-31 18:37:06,477 Processing request:: sessionid:0x101024634ef000b type:ping -- last ping request time
2022-05-31 18:37:06,477 Processing request:: sessionid:0x101024634ef000b type:getData ... /PinotPoC_2/IDEALSTATES/account_REALTIME -- last getData request
...
2022-05-31 18:37:37.311 Committing request sessionid:0x101024634ef000b type:closeSession
2022-05-31 18:37:37.311 Deleting ephemeral node /PinotPoC_2/LIVEINSTANCES/Server_pinot-poc-server6.company.com_8001 for session 0x101024634ef000b
2022-05-31 18:37:37.311 Closed socket connection for client
p
We had this issue before; it seemed to be caused by upsert on the realtime pinot servers, resulting in the pinot server running out of memory and becoming unresponsive/disconnected from ZK. We ditched upsert, added a 30-day retention, and haven't seen this issue since. See if you can find an OutOfMemoryError caught from a thread and logged in the log.
c
@Peter Pringle Thanks for the comment. Yes, we ran some tests and OutOfMemory sometimes occurred (pinot-server with 16G). With memory increased to 31G, OutOfMemory is no longer visible, but ZooKeeper session expiration and subsequent connection failures still occur. If this problem is due to realtime tables in upsert mode, that is unfortunate for us; we need upsert mode for our business case.
j
@Peter Pringle @coco We are working on enhancing the upsert feature to work on off-heap memory (it can also flush to disk) to avoid consuming too much heap memory and causing issues. It is planned to be done by the end of July
@coco Let's first figure out what is causing the ZK disconnect. Do you see heavy GC around that time? One possibility is that the server is under heavy GC and freezes, failing the heartbeat check from ZK
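As an illustration of the check Jackie suggests, here is a minimal, hypothetical Java sketch (not part of Pinot) that samples cumulative GC time via the standard JMX beans; a stop-the-world pause approaching the ZooKeeper session timeout (sessionTimeout=30000 in the logs above) would explain the missed heartbeats. In practice the same information can be read directly from the G1 GC logs.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: report how much time the JVM spent in GC during each
// sampling window. Windows dominated by GC point to pauses that can outlast
// the ZooKeeper session timeout and expire the session.
public class GcPauseWatcher {
    public static void main(String[] args) throws InterruptedException {
        long lastTotalGcMs = 0;
        while (true) {
            long totalGcMs = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                long t = gc.getCollectionTime(); // cumulative ms, -1 if unsupported
                if (t > 0) {
                    totalGcMs += t;
                }
            }
            long gcMsInWindow = totalGcMs - lastTotalGcMs;
            if (gcMsInWindow > 5_000) {          // more than 5s of GC inside a 10s window
                System.err.println("Heavy GC: " + gcMsInWindow + " ms of GC in the last 10s");
            }
            lastTotalGcMs = totalGcMs;
            Thread.sleep(10_000);
        }
    }
}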
c
@Jackie This is the G1 GC log of the pinot-server; a full GC of 6s is occurring. Here is what I found while analyzing the pinot-server logs. The pinot-server starts and attempts to bring the segments to the ONLINE state. Until the zookeeper session expires, HelixTaskExecutor:scheduleTask keeps adding messages to the pool but they are not processed:
pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0
pool size = 2, active threads = 2, queued tasks = 0, completed tasks = 0
...
pool size = 40, active threads = 40, queued tasks = 0, completed tasks = 0
pool size = 40, active threads = 40, queued tasks = 1, completed tasks = 0
...
pool size = 40, active threads = 40, queued tasks = 20, completed tasks = 0
@Jackie This is the last log in the pinot-server log for the messages that are only scheduled and never completed:
2022-05-31 18:33:52.547 INFO [HelixTaskExecutor-message_handle_thread] org.apache.pinot.segment.local.indexsegment.immutable.ImmutableSegmentLoader:load:209 Successfully loaded segment: account__0__20__20220531T0510Z with SegmentDirectory
2022-05-31 18:33:52.548 INFO [HelixTaskExecutor-message_handle_thread] org.apache.pinot.segment.local.upsert.PartitionUpsertMetadataManager:addSegment:97 Adding upsert metadata for segment: account__0__20__20220531T0510Z
-- another message
2022-05-31 18:34:39.273 INFO [HelixTaskExecutor-message_handle_thread] org.apache.pinot.server.starter.helix.HelixInstanceDataManager:addRealtimeSegment:171 Added segment: card__0__18__20220530T0206Z to table: card_REALTIME
2022-05-31 18:34:39.291 INFO [HelixTaskExecutor-message_handle_thread] org.apache.helix.messaging.handling.HelixTask:call:201 Message: 900f278c-2d3c-4a34-b1e6-2f365122ef82 (parent: null) handling task for card_REALTIME:card__0__18__20220530T0206Z completed at: 1653989679291, results: true. FrameworkTime: 5 ms; HandlerTime: 45395 ms.
j
Based on the log, the problem should be caused by upsert metadata using too much heap memory
How much data do you load on each server? What's the approximate cardinality of the primary key?
c
@Jackie Which logs tell you that upsert metadata using too much heap memory is causing the problem? The PK is a string type with a length of approximately 30. Every row in a table is unique by PK (cardinality = totalDocs). The size of the table is 6G and the number of rows is about 31 million (3 replicas, 2 tables). The problem occurred when the memory of the pinot-server was 16G (or maybe 24G) and the number of rows was over 50 million.
j
Based on the log, the server is not able to load segments because it is short on memory; it runs into back-to-back full GCs and is not able to reclaim any memory
How many servers do you have? I'm trying to match the numbers. 31 million keys of size 30 shouldn't use all 16G memory
c
@Jackie We tested it again. The pinot-servers are set to Xmx 24G, and there are three of them. We split the upsert realtime tables and the non-upsert tables into two separate tenants, each with 3 pinot-servers (24G). Using the jmap command, I could see that memory usage nearly doubled. After about 47 million rows were stored, the pinot-server died. Compared to the number of stored rows, 24G of memory seems quite large. What kind of data is stored in heap memory for an upsert table?
j
This is great analysis! Wondering why the servers without upsert also have lots of PrimaryKeys on heap
For the server with upsert enabled, it has 97M primary keys on heap
I can see the number of Strings is roughly double the number of primary keys; do you also have the time column stored as a string?
c
@Jackie I used the same schema. The PK column is registered in 'primaryKeyColumns'.
@Jackie Yes, the time column is a string type.
@Jackie How is the 97M figure calculated in upsert mode? Among the top 9 memory consumers, which items should I look at? Can you explain the binary (byte[]), String, ConcurrentHashMap, Object, RecordLocation and PrimaryKey entries?
j
They are mostly from the ConcurrentHashMap which we use to store the upsert metadata:
• byte[] is the bytes inside the String objects
• For each entry in the map, the key is a PrimaryKey object, and the value is a RecordLocation
• Within the PrimaryKey, it contains an Object[] of a single key String
• Within the RecordLocation, it contains a reference to the segment, an int (doc id) and the time value (String)
This explains the top objects, and why there are roughly 2x String and byte[]
I think it is not fully isolated. The non-upsert server also has some upsert tables on it
We should check why the server has 97M keys. You mentioned the table only contains 47M records, so it should have at most 47M keys if all the records have unique keys
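To make the layout described above concrete, here is a rough, hypothetical Java sketch of the map Jackie describes (field names are illustrative; the real Pinot classes live under org.apache.pinot.segment.local.upsert and carry more detail, including equals/hashCode on the key):

import java.util.concurrent.ConcurrentHashMap;

// Simplified, illustrative shapes of the on-heap upsert metadata.
public class UpsertMetadataSketch {

    static final class PrimaryKey {
        final Object[] values;                 // here: a single String key component (~30 chars in this schema)
        PrimaryKey(Object... values) { this.values = values; }
        // the real class also implements equals/hashCode so identical keys collapse to one entry
    }

    static final class RecordLocation {
        final Object segment;                  // reference to the segment holding the latest record
        final int docId;                       // doc id within that segment
        final Comparable<?> comparisonValue;   // time column value; a String in this schema
        RecordLocation(Object segment, int docId, Comparable<?> comparisonValue) {
            this.segment = segment; this.docId = docId; this.comparisonValue = comparisonValue;
        }
    }

    // One entry per unique primary key, kept on heap for each upsert table partition:
    final ConcurrentHashMap<PrimaryKey, RecordLocation> metadata = new ConcurrentHashMap<>();
}

Each entry therefore keeps at least two String objects (and their backing byte[]) alive on heap, the key component and the time value, which matches the roughly 2x String/byte[] counts above.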
c
@Jackie About 97M is right because the pinot-server hosts 2 upsert tables (47M + 47M). If we change the time column of the upsert table to a long type, can we store 1 billion rows?
j
No, it can only slightly reduce the size of the metadata. Currently ~100M entries take ~24G memory, so each entry is ~240 bytes. Changing the time column to long type might be able to save ~20 bytes per entry, but ~100M entries per server is the limit. Without the enhancement of storing metadata on disk (WIP), you need to add more servers to split the load of the table
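For reference, the arithmetic behind these numbers as a small, hypothetical Java snippet (the ~100M-entry and ~24G figures are taken from the discussion above, not measured here):

public class UpsertHeapMath {
    public static void main(String[] args) {
        long entries = 100_000_000L;                  // ~100M primary keys on one server
        long heapBytes = 24L * 1024 * 1024 * 1024;    // ~24 GB of heap consumed
        // ~257 bytes per entry, i.e. on the order of the ~240-byte estimate above
        System.out.println("approx bytes per entry = " + heapBytes / entries);
        // Replacing the String time value with a long saves roughly 20 bytes per entry
        // (per the estimate above), so the practical ceiling stays around ~100M entries
        // per server until the metadata can be kept off-heap or on disk.
    }
}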
c
Hi @Jackie. https://apache-pinot.slack.com/archives/CDRCA57FC/p1654189682334289?thread_ts=1653633166.893689&cid=CDRCA57FC The Pinot development team said there is a plan to change how the upsert table uses memory. Is the patch for this plan included in release 0.11.0? I checked the release notes but couldn't find an exact match for the change. https://docs.pinot.apache.org/basics/releases/0.11.0
j
Hi @coco, the off-heap upsert feature is not included in the 0.11 open source release. We have this feature developed under the Star-Tree release of Apache Pinot. If you are interested, you can try it out with the Star-Tree platform. cc @Mayank
c
@Jackie @Mayank Hello. I'm checking on this issue again after a long time. The "storing metadata on disk" enhancement you mentioned in the previous conversation was included in the Star-Tree release of Apache Pinot. Are there any plans to include this feature in an open-source release of Apache Pinot?
j
Unfortunately we don't have a plan to open source it in the near future
c
@Jackie It's very unfortunate. But thank you for your support so far.