# pinot-dev
x
I’m seeing an issue with Helix when running the integration tests on JDK 11. After the controller disconnects from Helix, it seems that this timer task is not stopped as expected, and it hangs the test forever. It might not be an issue for prod, as this behavior doesn’t happen in the normal workflow.
17:37:59.009 [Timer-22] ERROR org.apache.helix.controller.GenericHelixController - Time task failed. Rebalance task type: PeriodicalRebalance, cluster: PinotBrokerRestletResourceStatelessTest
org.apache.helix.HelixException: HelixManager (ZkClient) is not connected. Call HelixManager#connect()
	at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:363) ~[helix-core-0.9.8.jar:0.9.8]
	at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[helix-core-0.9.8.jar:0.9.8]
	at org.apache.helix.controller.GenericHelixController$RebalanceTask.run(GenericHelixController.java:247) [helix-core-0.9.8.jar:0.9.8]
	at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
	at java.util.TimerThread.run(Timer.java:506) [?:?]
17:37:59.176 [Timer-83] ERROR org.apache.helix.controller.GenericHelixController - Time task failed. Rebalance task type: PeriodicalRebalance, cluster: PinotControllerModeStatelessTest
org.apache.helix.HelixException: HelixManager (ZkClient) is not connected. Call HelixManager#connect()
	at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:363) ~[helix-core-0.9.8.jar:0.9.8]
	at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[helix-core-0.9.8.jar:0.9.8]
	at org.apache.helix.controller.GenericHelixController$RebalanceTask.run(GenericHelixController.java:247) [helix-core-0.9.8.jar:0.9.8]
	at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
	at java.util.TimerThread.run(Timer.java:506) [?:?]
17:37:59.752 [Timer-125] ERROR org.apache.helix.controller.GenericHelixController - Time task failed. Rebalance task type: PeriodicalRebalance, cluster: PinotHelixResourceManagerStatelessTest
org.apache.helix.HelixException: HelixManager (ZkClient) is not connected. Call HelixManager#connect()
	at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:363) ~[helix-core-0.9.8.jar:0.9.8]
	at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[helix-core-0.9.8.jar:0.9.8]
	at org.apache.helix.controller.GenericHelixController$RebalanceTask.run(GenericHelixController.java:247) [helix-core-0.9.8.jar:0.9.8]
	at java.util.TimerThread.mainLoop(Timer.java:556) [?:?]
	at java.util.TimerThread.run(Timer.java:506) [?:?]
As a workaround, I’m trying to wrap the controller stop in a task with a timeout and make all ZK and controller tests use dynamic ports.
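A minimal sketch of what that workaround could look like, using only JDK classes. The names `StopWithTimeout`, `stopWithTimeout`, and `findFreePort` are hypothetical and not taken from the Pinot code base; the `Runnable` stands in for whatever the test actually calls to stop the controller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical test helper, not part of Pinot or Helix.
final class StopWithTimeout {

  // Runs stopAction on a daemon thread and waits at most timeoutMs for it.
  // Returns true if the action finished in time, false if it timed out.
  static boolean stopWithTimeout(Runnable stopAction, long timeoutMs) {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "controller-stop");
      t.setDaemon(true); // a stuck stop must not keep the JVM alive
      return t;
    });
    Future<?> future = executor.submit(stopAction);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck stop and move on
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException e) {
      throw new RuntimeException("stop action failed", e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }

  // Asks the OS for a currently free port by binding to port 0.
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Placeholder stop action; a real test would stop the controller here.
    boolean stopped = stopWithTimeout(() -> { }, 1000L);
    System.out.println("stopped in time: " + stopped);
    System.out.println("free port: " + findFreePort());
  }
}
```

The timeout only keeps the test from blocking forever; it does not fix the underlying timer task. The free-port lookup has a small race, since another process could take the port between the lookup and the actual bind.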
j
I feel the issue is we don't disconnect properly?
Why is helix trying to reconnect if we explicitly ask it to disconnect?
x
It’s the timer task inside Helix.
We call helix.disconnect() in controller.stop().
The first thing helix.disconnect() does is cancel the timer tasks, but somehow that doesn’t happen.
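For reference, the threads in the stack trace are plain `java.util.Timer` threads. The sketch below is not Helix code; it only illustrates the JDK behavior being relied on: a periodic `TimerTask` keeps firing until `Timer.cancel()` is called, and stops firing afterwards. The class and method names are hypothetical.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of java.util.Timer semantics, unrelated to Helix internals.
final class TimerCancelSketch {

  private static void sleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Returns {runs observed shortly after cancel(), runs observed some time later}.
  static int[] countRuns(long periodMs) {
    AtomicInteger runs = new AtomicInteger();
    Timer timer = new Timer("periodic-task-sketch");
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        runs.incrementAndGet();
      }
    }, 0L, periodMs);

    sleep(periodMs * 5);   // let the task fire a few times
    timer.cancel();        // discards all scheduled executions
    sleep(periodMs * 2);   // allow an in-flight run to finish
    int afterCancel = runs.get();
    sleep(periodMs * 5);   // no further runs should happen in this window
    int later = runs.get();
    return new int[] {afterCancel, later};
  }

  public static void main(String[] args) {
    int[] counts = countRuns(20L);
    System.out.println("runs after cancel: " + counts[0] + ", runs later: " + counts[1]);
  }
}
```

If the timer is never cancelled, its non-daemon thread keeps running and the task keeps firing, which would match the repeated "Time task failed" errors and the hanging test.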
j
I see. Can we conclude that the Helix version we are using is not Java 11 ready?
The key note is "Helix is now on Java 8!"
🤦‍♂️
We are way ahead
x
🤣
we are on 0.9.8
j
What? They published the 0.9.0 release note on 2021-02-01.
x
I think that’s when they updated the website