# general
n
Every now and then, after several restarts of brokers, controllers, and servers, my local Pinot gets into a bad state where it shows the number of segments as 0/83. VerifySegmentState gives:
Segment: table_OFFLINE_1603953900214_1603953902314_7 idealstate: {Server_10.136.245.18_8003=ONLINE} is MISSING in external view:
Segment: table_OFFLINE_1603953900214_1603953902314_7 idealstate: {Server_10.136.245.18_8003=ONLINE} does NOT match external view: null
table_OFFLINE = ERROR
Helix doesn’t seem to be assigning segments to my server, so the whole thing is just broken. Scorched earth (nuking ZK, reloading all the segments, etc.) works, but if this were production, how would I get things to work again? How do you debug something like this?
I think maybe it has something to do with rebooting a server with the same name:
2020/11/08 11:16:08.472 WARN [ParticipantManager] [main] found another instance with same instanceName: Server_10.136.245.13_8003 in cluster quickstart
Looking at a particular segment
It should be able to recover given all of the segments are in deep store.
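For reference, the VerifySegmentState lines above amount to a diff between the ideal state and the external view. Here is a minimal sketch of that comparison in Python, assuming both have already been fetched and flattened into plain `{segment: {instance: state}}` dicts. That dict shape and the `diff_segment_states` helper are illustrative assumptions, not Pinot's actual implementation.

```python
def diff_segment_states(ideal_state, external_view):
    """Return {segment: (ideal_assignment, external_assignment)} for every mismatch.

    Both arguments are assumed to be plain dicts of the form
    {segment_name: {instance_name: state}} (a hypothetical, simplified shape).
    """
    problems = {}
    for segment, ideal in ideal_state.items():
        # A segment absent from the external view comes back as None.
        actual = external_view.get(segment)
        if actual != ideal:
            problems[segment] = (ideal, actual)
    return problems


# Values copied from the VerifySegmentState output above.
ideal_state = {
    "table_OFFLINE_1603953900214_1603953902314_7": {
        "Server_10.136.245.18_8003": "ONLINE"
    },
}
external_view = {}  # nothing is being served, as in the 0/83 case

problems = diff_segment_states(ideal_state, external_view)
for segment, (ideal, actual) in problems.items():
    status = "is MISSING in external view" if actual is None else "does NOT match external view"
    print(f"Segment: {segment} idealstate: {ideal} {status}: {actual}")
```

An empty result would mean every segment is hosted where the ideal state says it should be. Anything else points at the instances that never picked up their assignments.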
x
Is your server's IP address changing? Sometimes that can cause Pinot to start with a different instance name.
n
It’s just running in IntelliJ after some restarts.
I think the IP is changing occasionally, though.
I imagine a similar thing would happen in k8s
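That would fit the output above: the ideal state assigns the segment to Server_10.136.245.18_8003, while the ParticipantManager warning is about Server_10.136.245.13_8003. A rough sketch of how to spot that kind of drift, assuming the ideal state is available as a `{segment: {instance: state}}` dict and the live instance names as a list. The `stale_instances` helper and the contents of `live_instances` are assumptions for illustration (here, that the .13 name is the one currently running).

```python
def stale_instances(ideal_state, live_instances):
    """Return instance names that hold assignments in the ideal state but are not live."""
    assigned = {
        instance
        for assignment in ideal_state.values()
        for instance in assignment
    }
    return sorted(assigned - set(live_instances))


# Ideal-state entry copied from the VerifySegmentState output in this thread.
ideal_state = {
    "table_OFFLINE_1603953900214_1603953902314_7": {
        "Server_10.136.245.18_8003": "ONLINE"
    },
}
# Assumption: the instance named in the ParticipantManager warning is the live one.
live_instances = ["Server_10.136.245.13_8003"]

stale = stale_instances(ideal_state, live_instances)
print(stale)
```

If that list is non-empty, the segments are assigned to an instance name that no running server is registered under, which would explain why nothing shows up in the external view.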