# troubleshooting
s
Hi team, one of the partitions has not been consumed for some time, and we are seeing huge lag on that partition (~20 million). So far we have tried these things:
- disabling and re-adding the realtime server pod
- running RealtimeSegmentValidationManager on the target table
- deleting that segment and retrying, but it gets stuck for that particular partition
- rotating the pods
- resetting the segment
We tried these steps but are still stuck @Mayank
p
"span_event_view_1__22__8408__20221031T0715Z": {
      "Server_server-5.server-headless.pinot.svc.cluster.local_8098": "ONLINE"
    },
    "span_event_view_1__22__8409__20221031T0729Z": {
      "Server_server-2.server-headless.pinot.svc.cluster.local_8098": "ONLINE"
    },
    "span_event_view_1__22__8410__20221031T0742Z": {
      "Server_server-3.server-headless.pinot.svc.cluster.local_8098": "ONLINE"
    },
    "span_event_view_1__22__8411__20221031T0755Z": {
      "Server_server-4.server-headless.pinot.svc.cluster.local_8098": "ONLINE"
    },
    "span_event_view_1__23__8000__20221024T0748Z": {
      "Server_server-5.server-headless.pinot.svc.cluster.local_8098": "ONLINE"
    },
No CONSUMING segment for partition 22. RealtimeSegmentValidationManager didn’t help.
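A quick way to spot this condition across all partitions is to scan the ideal-state segment map. The sketch below assumes the low-level consumer naming pattern visible in the snippet above (`table__partition__sequence__creationTime`) and that the input is the segment-to-replica map in the shape pasted above; the function name is made up for illustration.

```python
import re

# Segment names in the snippet above follow table__partition__sequence__creationTime.
SEGMENT_NAME = re.compile(
    r"^(?P<table>.+)__(?P<partition>\d+)__(?P<sequence>\d+)__(?P<created>\d{8}T\d{4}Z)$"
)


def partitions_without_consuming(ideal_state):
    """Return the sorted partition ids that have no segment in CONSUMING state.

    ideal_state maps segment name -> {server instance: state}.
    """
    has_consuming = {}
    for segment, replicas in ideal_state.items():
        match = SEGMENT_NAME.match(segment)
        if match is None:
            continue  # name does not follow the assumed pattern; skip it
        partition = int(match.group("partition"))
        consuming = "CONSUMING" in replicas.values()
        has_consuming[partition] = has_consuming.get(partition, False) or consuming
    return sorted(p for p, ok in has_consuming.items() if not ok)
```

Fed the full ideal state of the table, this should list partition 22 if it really has no CONSUMING segment.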
s
Could you try the /resumeConsumption API? It should create new consuming segments for missing partitions.
/tables/{tableName}/resumeConsumption
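For anyone scripting this, here is a minimal sketch of building that call with Python's standard library. It assumes the endpoint lives on the controller and is invoked with POST (worth confirming on your controller's Swagger page); the controller URL and the helper name are placeholders, not values from this thread.

```python
from urllib.parse import quote
from urllib.request import Request


def resume_consumption_request(controller_url, table_name):
    """Build (but do not send) the request for /tables/{tableName}/resumeConsumption."""
    base = controller_url.rstrip("/")
    path = f"/tables/{quote(table_name, safe='')}/resumeConsumption"
    # Assumption: the endpoint expects POST with an empty body.
    return Request(base + path, data=b"", method="POST")


# Sending it (requires a reachable controller, so left commented out):
# from urllib.request import urlopen
# with urlopen(resume_consumption_request("http://localhost:9000", "span_event_view_1")) as resp:
#     print(resp.read().decode())
```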
p
What version is it in? I don’t see it on Swagger.
s
It should be part of the 0.11 release.
p
Oh we’re on 0.10 rn.
s
I see. RealtimeSegmentValidationManager does not create new consuming segments in cases where consuming segments are missing from the ideal state... If you could move to 0.11, you can try this API. Not sure if there are any other ways to recover from this @Mayank @Neha Pawar?
p
@saurabh dubey We upgraded and used this API. This particular partition is still stuck 😕
s
Do you see a new consuming segment in the ideal state? If yes, can you check its status in EXTERNALVIEW? You can also use
/debug/tables/{tableName}
to get segment-level errors, if any. Basically, we're trying to see whether any errors are being encountered during consumption.
Also look at the logs on the server where the consuming segment is hosted.
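The ideal state versus external view check can also be done mechanically. This is a rough sketch that assumes both views have already been fetched and reduced to the same segment-to-replica map shape as the snippet earlier in the thread; the function name is illustrative only.

```python
def ideal_vs_external(ideal_state, external_view):
    """Return {segment: {server: (ideal, actual)}} for replicas whose states differ.

    Both arguments map segment name -> {server instance: state}.
    A replica missing from the external view is reported with actual=None.
    """
    mismatches = {}
    for segment, replicas in ideal_state.items():
        actual = external_view.get(segment, {})
        diff = {
            server: (want, actual.get(server))
            for server, want in replicas.items()
            if actual.get(server) != want
        }
        if diff:
            mismatches[segment] = diff
    return mismatches
```

A consuming segment that shows up here as `("CONSUMING", None)` or `("CONSUMING", "ERROR")` is the one whose server logs are worth reading first.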
p
So what we did was pause consumption, wait for segments to stop consuming, and restart consumption using the API and it started.
The APIs are useful indeed, not sure why Pinot decided to abandon that partition 😭
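The pause, wait, resume sequence described above could be scripted roughly as follows. This is a sketch under assumptions: the `pauseConsumption` and `pauseStatus` endpoints and the `consumingSegments` field in the status response are what I believe the 0.11 controller exposes alongside `resumeConsumption`, so check them against Swagger first. `call` is a hypothetical helper you supply that performs the HTTP request and returns parsed JSON.

```python
import time


def pause_then_resume(call, table, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Pause consumption, wait until no segment is consuming, then resume.

    call(method, path) -> dict is a caller-supplied (hypothetical) function
    that sends the request to the controller and returns the decoded JSON body.
    """
    call("POST", f"/tables/{table}/pauseConsumption")
    for _ in range(max_polls):
        status = call("GET", f"/tables/{table}/pauseStatus")
        # Assumption: the status lists segments that are still consuming.
        if not status.get("consumingSegments"):
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"segments of {table} were still consuming after {max_polls} polls")
    return call("POST", f"/tables/{table}/resumeConsumption")
```

Injecting `call` and `sleep` keeps the sequencing logic separate from the transport, so it can be dry-run against a fake before pointing it at a real controller.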
s
It's possible that particular partition has some message that is failing to parse, leading to some sort of failure loop? Can you check whether that partition has a consuming segment in IS, and what its state is in EV?
p
That’s what I suspected too but the server had absolutely no logs for that segment even in DEBUG mode.