<https://apache-pinot.slack.com/archives/CDRCA57FC...
# troubleshooting
n
Sorry. Didn’t know this channel existed
It is in the ideal state:
"mapFields": {
    "table_OFFLINE_1603929600370_1603929899494_33": {
      "Server_10.136.245.18_8003": "ONLINE"
    },
It just doesn’t seem like Helix is doing anything to reach that state.
m
Do you see any messages pending for the instance?
Or anything in the logs?
n
Where would you see pending messages?
Nothing particularly interesting in server, broker, or controller logs
m
You can find messages, errors, etc. in ZK under INSTANCES.
I recall a Helix bug fix (http://helix.apache.org/0.9.8-docs/releasenotes/release-0.9.8.html) that we pulled in with PR-6166.
s
Can you check your server instance state in Helix? It should be in ZooKeeper under INSTANCES/Server_10.136.245.18_8003.
Under this folder, the CURRENTSTATES folder should have a session ID directory with a table directory underneath it. If that is not there, look for ERRORS at the same level and see what you find. If all of these are greyed out, the server 10.136.245.18 is not up. You can check under LIVEINSTANCES to see which servers are up.
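The checks described above could be scripted roughly as follows. This is only a sketch: the function names, the returned labels, and the `list_children` callback are hypothetical, and the `/<cluster>/INSTANCES/...` and `/<cluster>/LIVEINSTANCES/...` layout is assumed to match the usual Helix ZooKeeper layout. In practice `list_children` would wrap a ZooKeeper client call and return `None` when the node does not exist.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callback: returns child names for a ZK path, or None if the path is missing.
ListChildren = Callable[[str], Optional[List[str]]]


def helix_instance_paths(cluster: str, instance: str) -> Dict[str, str]:
    """Build the ZK paths mentioned in the thread (assumed Helix layout)."""
    base = f"/{cluster}/INSTANCES/{instance}"
    return {
        "current_states": f"{base}/CURRENTSTATES",
        "messages": f"{base}/MESSAGES",
        "errors": f"{base}/ERRORS",
        "live": f"/{cluster}/LIVEINSTANCES/{instance}",
    }


def diagnose_instance(cluster: str, instance: str, list_children: ListChildren) -> str:
    """Walk the checks in order and return a short, illustrative label."""
    paths = helix_instance_paths(cluster, instance)
    # No LIVEINSTANCES node means the server is not connected to the cluster.
    if list_children(paths["live"]) is None:
        return "not live"
    # CURRENTSTATES/<sessionId>/<table> should exist for a server hosting segments.
    for session in list_children(paths["current_states"]) or []:
        if list_children(f"{paths['current_states']}/{session}"):
            return "has current state"
    if list_children(paths["errors"]):
        return "errors recorded"
    if list_children(paths["messages"]):
        return "messages pending"
    return "live but idle"
```

Passing the callback in keeps the logic independent of any particular ZooKeeper client, so it can be tried against a plain dictionary snapshot before pointing it at a real cluster.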
n
I can see the server, but it has 0/83 segments. In ZooKeeper, everything is empty except the history for that server:
{
  "id": "Server_10.136.245.13_8003",
  "simpleFields": {
    "LAST_OFFLINE_TIME": "-1"
  },
  "mapFields": {},
  "listFields": {
    "HISTORY": [
      "{DATE=2020-11-08T17:07:16:698, VERSION=0.9.8, SESSION=1006cf417e10022, TIME=1604855236698}",
      "{DATE=2020-11-08T17:16:43:638, VERSION=0.9.8, SESSION=1006cf417e1002b, TIME=1604855803638}",
      "{DATE=2020-11-09T00:38:17:866, VERSION=0.9.8, SESSION=1006cf417e10034, TIME=1604882297866}",
      "{DATE=2020-11-09T00:41:42:746, VERSION=0.9.8, SESSION=1006cf417e10035, TIME=1604882502746}"
    ],
    "OFFLINE": [
      "2020-11-08T17:07:17:345",
      "2020-11-09T00:38:16:619",
      "2020-11-09T00:38:17:969"
    ]
  }
}
So I’m trying to drop the dead servers, and even after disabling one it says:
Failed to drop instance Server_10.136.245.18_8003 - Instance Server_10.136.245.18_8003 exists in ideal state for <table>_OFFLINE
So a server that is disabled and dead exists in the ideal state for my table, which is pretty bizarre.
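To see which segments still reference a given server, one could scan the ideal state's `mapFields`. A minimal sketch, assuming the ideal state has already been fetched as JSON in the shape quoted earlier in the thread; the function name is hypothetical:

```python
from typing import Dict


def segments_on_instance(ideal_state: dict, instance: str) -> Dict[str, str]:
    """Return {segment: target_state} for every segment assigned to `instance`."""
    return {
        segment: states[instance]
        for segment, states in ideal_state.get("mapFields", {}).items()
        if instance in states
    }


# Fragment of the ideal state quoted in the thread.
ideal_state = {
    "mapFields": {
        "table_OFFLINE_1603929600370_1603929899494_33": {
            "Server_10.136.245.18_8003": "ONLINE"
        }
    }
}

assigned = segments_on_instance(ideal_state, "Server_10.136.245.18_8003")
```

A non-empty result for a dead server would be consistent with the "exists in ideal state" error above.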
s
Perhaps the table was created before the server was disabled and died. You can try running a rebalance on the table. You will still need to untag this server and re-tag your new server with the right tenant name.
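A rebalance can be triggered through the controller's REST API. The sketch below only builds the request without sending it. The controller address and table name are placeholders, and the `/tables/{tableName}/rebalance` path with `type` and `dryRun` query parameters is an assumption that should be checked against the Swagger page of your controller version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_rebalance_request(controller: str, table: str,
                            table_type: str = "OFFLINE",
                            dry_run: bool = True) -> Request:
    """Build (but do not send) a POST request for a table rebalance.

    The endpoint path and parameter names are assumptions; verify them first.
    """
    query = urlencode({"type": table_type, "dryRun": str(dry_run).lower()})
    url = f"{controller.rstrip('/')}/tables/{quote(table, safe='')}/rebalance?{query}"
    return Request(url, method="POST")


# Placeholder controller address and table name.
req = build_rebalance_request("http://localhost:9000/", "myTable")
```

Starting with a dry run and inspecting the proposed assignment before repeating with `dry_run=False` seems the safer order; `urllib.request.urlopen(req)` would actually send it.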
x
I think Pinot has no way of knowing whether a server is just offline or is dead and should be removed.