# troubleshooting
t
I have a table showing BAD because a handful of segments are only on one of two servers. I am trying to "rebalance servers" to fix and I see
"status": "IN_PROGRESS"
but nothing in the controller logs other than
```
INFO [CustomRebalancer] [HelixController-pipeline-default-pinot-(3cd60663_DEFAULT)] Computing BestPossibleMapping for node_reboot_events_REALTIME
```
and
```
WARN [SegmentStatusChecker] [pool-10-thread-4] Table node_reboot_events_REALTIME has 1 replicas, below replication threshold :2
```
What status should I expect to see?
m
Do you know why the segments went BAD? If it was due to a server going down, it might be better to bring the server back up.
t
The servers are all up. I am not sure how it happened. I know which segments, it is 7 of ~600
m
Can you check external view in ZK (to eliminate UI bug)?
For such a scenario, rebalance is not the way to solve it.
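(A minimal sketch of checking this without the UI, assuming the controller is reachable at `localhost:9000` — the address is an assumption; the table name is from the logs above. The controller REST API exposes the same external view / ideal state that is stored in ZK.)

```shell
# Assumption: controller address; adjust to your cluster.
CONTROLLER="http://localhost:9000"
TABLE="node_reboot_events_REALTIME"

# External view = the segment state each server actually reports;
# ideal state  = the state Helix expects. Segments where the two
# disagree (e.g. ERROR vs ONLINE) are the ones the UI flags as BAD.
EV_URL="${CONTROLLER}/tables/${TABLE}/externalview"
IS_URL="${CONTROLLER}/tables/${TABLE}/idealstate"

# Against a live cluster you would fetch and diff them:
#   curl -s "${EV_URL}"
#   curl -s "${IS_URL}"
echo "GET ${EV_URL}"
echo "GET ${IS_URL}"
```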
t
Checking ZK now. But for rebalance status, how can I tell if anything is happening?
m
Not at the moment, we need to add it (looking for volunteers).
t
Oh well. For status, where should I look in ZK? pinot/INSTANCES/<server>/CURRENTSTATE/xxx/<tablename> does not show the segment
And what is the way to solve this scenario?
(and thanks for your help)
ZK shows the segment as error on the server that shows an error in the UI
```json
"node_reboot_events__1__631__20220124T1158Z": {
      "CURRENT_STATE": "ERROR"
    },
```
I think the right way to fix a BAD segment like this is via
/segments/${TABLE}_REALTIME/$SEGMENT/reset
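(A sketch of invoking that controller endpoint, assuming a controller at `localhost:9000` — the address is an assumption; the table and segment names are taken from the ZK snippet above.)

```shell
# Assumption: controller address; adjust to your deployment.
CONTROLLER="http://localhost:9000"
TABLE="node_reboot_events"
SEGMENT="node_reboot_events__1__631__20220124T1158Z"

# Reset asks the server to transition the segment out of ERROR:
# it is taken OFFLINE and then brought back ONLINE/CONSUMING.
URL="${CONTROLLER}/segments/${TABLE}_REALTIME/${SEGMENT}/reset"
echo "POST ${URL}"
# Against a live cluster: curl -X POST "${URL}"
```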