# troubleshooting
t
I have a table showing BAD because a handful of segments are only on one of two servers. I am trying to "rebalance servers" to fix and I see
"status": "IN_PROGRESS"
but nothing in the controller logs other than
```
INFO [CustomRebalancer] [HelixController-pipeline-default-pinot-(3cd60663_DEFAULT)] Computing BestPossibleMapping for node_reboot_events_REALTIME
```
and
```
WARN [SegmentStatusChecker] [pool-10-thread-4] Table node_reboot_events_REALTIME has 1 replicas, below replication threshold :2
```
What status should I expect to see?
m
Do you know why the segments went BAD? If it was due to a server going down, it might be better to bring the server back up.
t
The servers are all up. I am not sure how it happened. I know which segments, it is 7 of ~600
m
Can you check external view in ZK (to eliminate UI bug)?
For such a scenario, rebalance is not the way to solve it.
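(A minimal sketch of checking this without the UI, assuming the controller is reachable at `localhost:9000` — the address is an assumption; the table name is from the logs above. The controller REST API exposes the same external view / ideal state that is stored in ZK.)

```shell
# Assumption: controller address; adjust to your cluster.
CONTROLLER="http://localhost:9000"
TABLE="node_reboot_events_REALTIME"

# External view = the segment state each server actually reports;
# ideal state  = the state Helix expects. Segments where the two
# disagree (e.g. ERROR vs ONLINE) are the ones the UI flags as BAD.
EV_URL="${CONTROLLER}/tables/${TABLE}/externalview"
IS_URL="${CONTROLLER}/tables/${TABLE}/idealstate"

# Against a live cluster you would fetch and diff them:
#   curl -s "${EV_URL}"
#   curl -s "${IS_URL}"
echo "GET ${EV_URL}"
echo "GET ${IS_URL}"
```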
t
Checking ZK now. But for rebalance status, how can I tell if anything is happening?
m
Not at the moment, we need to add it (looking for volunteers).
t
Oh well. For status, where should I look in ZK? pinot/INSTANCES/<server>/CURRENTSTATE/xxx/<tablename> does not show the segment
And what is the way to solve this scenario?
(and thanks for your help)
ZK shows the segment as error on the server that shows an error in the UI
```json
"node_reboot_events__1__631__20220124T1158Z": {
      "CURRENT_STATE": "ERROR"
    },
```
I think the right way to fix a BAD segment like this is via
/segments/${TABLE}_REALTIME/$SEGMENT/reset
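(A sketch of invoking that controller endpoint, assuming a controller at `localhost:9000` — the address is an assumption; the table and segment names are taken from the ZK snippet above.)

```shell
# Assumption: controller address; adjust to your deployment.
CONTROLLER="http://localhost:9000"
TABLE="node_reboot_events"
SEGMENT="node_reboot_events__1__631__20220124T1158Z"

# Reset asks the server to transition the segment out of ERROR:
# it is taken OFFLINE and then brought back ONLINE/CONSUMING.
URL="${CONTROLLER}/segments/${TABLE}_REALTIME/${SEGMENT}/reset"
echo "POST ${URL}"
# Against a live cluster: curl -X POST "${URL}"
```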