#troubleshooting

Diana Arnos

04/05/2022, 7:31 AM
Hello 👋 I have 3 of my 4 servers stuck with this kind of message:
Find unloaded segment: <tableName>__0__35__20220404T0749Z, table: <tableName>_REALTIME, expected: ONLINE, actual: CONSUMING
Sleeping 1 second waiting for all segments loaded for partial-upsert table: <tableName>_REALTIME
Which endpoint should I use to try to sort this out? The reload one does not work, because the segment is still consuming, and the reset always fails, because it can't stop a consuming segment for some reason. Would it be okay to just delete this segment? Would the Controller know it needs to be consumed again?

Neha Pawar

04/05/2022, 8:38 PM
deleting the consuming segment won’t help. it’ll get stuck.
@Jackie ^ any idea about this error?

Jackie

04/05/2022, 9:27 PM
@Diana Arnos Have you tried restarting the servers?
Not sure how you ran into this scenario. Somehow the consuming segment is already committed, but the failed servers haven't started consumption yet

Diana Arnos

04/06/2022, 2:22 PM
@Jackie yes, this started happening after a restart. And by restart I mean: we are deploying on k8s through the helm chart and, after changing the resource config (bumping up RAM), the servers' pods were recreated

Jackie

04/06/2022, 5:46 PM
@Diana Arnos Can you please check the ideal state and the external view of this segment? When a server is restarted, the segment should go directly to the ONLINE state instead of the CONSUMING state
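
(Editor's sketch of the check suggested above, not something posted in the thread.) One way to compare the two views is to pull both from the controller and list every segment replica whose actual state differs from the expected one. This assumes the controller exposes `GET /tables/{tableName}/idealstate` and `GET /tables/{tableName}/externalview` with a `tableType` query parameter, and that each response maps `REALTIME` to `{segment: {server: state}}`; the controller URL, table name, and server names are placeholders.

```python
import json
from urllib.request import urlopen


def fetch_view(controller_url, table, view):
    """Fetch 'idealstate' or 'externalview' for a REALTIME table.

    Assumption: the response looks like {"REALTIME": {segment: {server: state}}}.
    """
    url = f"{controller_url}/tables/{table}/{view}?tableType=REALTIME"
    with urlopen(url) as resp:
        return json.load(resp).get("REALTIME") or {}


def diff_segment_states(ideal, external):
    """Return {segment: {server: (expected, actual)}} for every mismatch.

    'actual' is None when the external view has no entry for that replica.
    """
    mismatches = {}
    for segment, expected_by_server in ideal.items():
        actual_by_server = external.get(segment, {})
        for server, expected in expected_by_server.items():
            actual = actual_by_server.get(server)
            if actual != expected:
                mismatches.setdefault(segment, {})[server] = (expected, actual)
    return mismatches


# In-memory example mirroring the first log message in this thread
# (server names are hypothetical).
sample_ideal = {
    "myTable__0__35__20220404T0749Z": {"Server_a": "ONLINE", "Server_b": "ONLINE"},
}
sample_external = {
    "myTable__0__35__20220404T0749Z": {"Server_a": "ONLINE", "Server_b": "CONSUMING"},
}
sample_mismatches = diff_segment_states(sample_ideal, sample_external)
```

Against a live cluster, the two dictionaries would come from `fetch_view(url, "myTable", "idealstate")` and `fetch_view(url, "myTable", "externalview")`; a replica reported as `("ONLINE", "CONSUMING")` matches the situation described in the first message.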

Diana Arnos

04/07/2022, 12:20 PM
Not anymore... we removed everything and we have a fresh new deployment. Everything is working fine for now. But what would happen if the segment didn't get committed before we restarted the servers? Would it still appear as ONLINE, even if not "complete"?
So, I'm having a similar problem again:
Find ERROR segment: <tableName>__100__45__20220409T0856Z, table: <tableName>_REALTIME, expected: ONLINE
Sleeping 1 second waiting for all segments loaded for partial-upsert table: <tableName>_REALTIME
When I try to reload the segment, I see this message in the logs:
Reloading single segment: <tableName>__100__45__20220409T0856Z in table: <tableName>_REALTIME
Segment metadata is null. Skip reloading segment: <tableName>__100__45__20220409T0856Z in table: <tableName>_REALTIME
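
(Editor's sketch, not part of the thread.) When many servers emit these messages, it can help to collect the affected segments from the server logs before checking them one by one. The regular expression below is written only against the two message shapes quoted in this thread ("Find unloaded segment: ..." and "Find ERROR segment: ..."); other Pinot versions may word these lines differently, and the function name is hypothetical.

```python
import re

# Matches the two message shapes quoted in this thread; "actual" is optional
# because the ERROR variant does not report it.
STUCK_SEGMENT = re.compile(
    r"Find (?P<kind>unloaded|ERROR) segment: (?P<segment>[^,]+), "
    r"table: (?P<table>[^,]+), expected: (?P<expected>\w+)"
    r"(?:, actual: (?P<actual>\w+))?"
)


def parse_stuck_segments(lines):
    """Return one dict per log line that reports an unloaded or ERROR segment."""
    found = []
    for line in lines:
        match = STUCK_SEGMENT.search(line)
        if match:
            found.append(match.groupdict())
    return found


sample_log = [
    "Find unloaded segment: myTable__0__35__20220404T0749Z, table: myTable_REALTIME, expected: ONLINE, actual: CONSUMING",
    "Sleeping 1 second waiting for all segments loaded for partial-upsert table: myTable_REALTIME",
    "Find ERROR segment: myTable__100__45__20220409T0856Z, table: myTable_REALTIME, expected: ONLINE",
]
stuck = parse_stuck_segments(sample_log)
```

Each entry carries the segment name, the table, and the expected state, plus the actual state when the log line includes it.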