# troubleshooting
d
👋 hey there, I have a strange situation going on. I have 2 servers set up. At some point they had a problem and restarted, and they are still running fine. I can see in the logs that they are consuming data normally:
```
Consumed 261 events from (rate:3.1030054/s), currentOffset=763096, numRowsConsumedSoFar=288096, numRowsIndexedSoFar=288096
....
[Consumer clientId=consumer-455, groupId=] Discovered group coordinator <redacted> (id: 2147483646 rack: null)
```
But the controller still shows them with `dead` status, and when I try to query the data, I see this in the broker log:
```
No server found for request 1: select responseId from responseCount limit 1
```
And this is the response from the query API:
```json
{
  "exceptions": [],
  "numServersQueried": 0,
  "numServersResponded": 0,
  "numSegmentsQueried": 0,
  "numSegmentsProcessed": 0,
  "numSegmentsMatched": 0,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 0,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 0,
  "timeUsedMs": 0,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0,
  "numRowsResultSet": 0
}
```
How can I make the Controller see they are alive? 👀
m
Can you check the idealstate and externalview in the ZK browser? If the servers show up there, then maybe the broker routing table needs to be rebuilt (Swagger API).
l
`/tables/{tableName}/rebuildBrokerResourceFromHelixTags`, that one, right?
d
Yep, the servers show up there. I tried that one, and the response is:
```json
{
  "status": "Broker resource is not rebuilt because ideal state is the same for table: responseCount_REALTIME"
}
```
r
Does the idealstate match the externalview?
m
Yes, please check whether the idealstate matches the externalview. You can do so manually or via the debug endpoint.
d
Although `/tables/{tableName}/rebuildBrokerResourceFromHelixTags` gave me that response, I can see that they are not the same. `/tables/<tableName>/idealstate` shows that all partitions on each server (there are 2) should be online or consuming, but `/tables/<tableName>/externalview` shows all the servers as offline.
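The manual comparison above can be scripted. This is a minimal, hypothetical helper (not part of Pinot), assuming the JSON returned by `/tables/<tableName>/idealstate` and `/tables/<tableName>/externalview` has the shape `{tableType: {segmentName: {instanceName: state}}}`, with `null` for a table type that does not exist. It lists every replica whose external view state differs from its ideal state. The segment and instance names in the example are made up.

```python
from typing import Dict, List, Optional, Tuple

# Assumed response shape: {"OFFLINE": None, "REALTIME": {segment: {instance: state}}}
SegmentMap = Dict[str, Dict[str, str]]
StateView = Dict[str, Optional[SegmentMap]]
Mismatch = Tuple[str, str, str, Optional[str], Optional[str]]


def diff_states(ideal: StateView, external: StateView) -> List[Mismatch]:
    """Return (table_type, segment, instance, ideal_state, external_state)
    for every replica whose external state differs from its ideal state.
    A replica missing on one side is reported with None for that side."""
    mismatches: List[Mismatch] = []
    for table_type in sorted(set(ideal) | set(external)):
        # A null table type (e.g. "OFFLINE": null) is treated as empty.
        ideal_segments = ideal.get(table_type) or {}
        external_segments = external.get(table_type) or {}
        for segment in sorted(set(ideal_segments) | set(external_segments)):
            ideal_replicas = ideal_segments.get(segment) or {}
            external_replicas = external_segments.get(segment) or {}
            for instance in sorted(set(ideal_replicas) | set(external_replicas)):
                want = ideal_replicas.get(instance)
                have = external_replicas.get(instance)
                if want != have:
                    mismatches.append((table_type, segment, instance, want, have))
    return mismatches


if __name__ == "__main__":
    # Hypothetical data mirroring this thread: the ideal state expects
    # CONSUMING/ONLINE, but the external view reports OFFLINE.
    ideal = {
        "OFFLINE": None,
        "REALTIME": {
            "responseCount__0__1__20220101T0000Z": {"Server_pinot-server-0_8098": "CONSUMING"},
            "responseCount__1__0__20220101T0000Z": {"Server_pinot-server-1_8098": "ONLINE"},
        },
    }
    external = {
        "OFFLINE": None,
        "REALTIME": {
            "responseCount__0__1__20220101T0000Z": {"Server_pinot-server-0_8098": "OFFLINE"},
            "responseCount__1__0__20220101T0000Z": {"Server_pinot-server-1_8098": "OFFLINE"},
        },
    }
    for row in diff_states(ideal, external):
        print(row)
```

An empty result means the two views agree. Any row with an `OFFLINE` external state for a replica the ideal state expects to be serving is a replica the broker should not be expected to route to.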
l
Diana, I did a hard drive increase exercise today and it seems like I'm getting the same issue. I'm going to do some stuff and will share what I did; hopefully it helps.
Hey Diana, we did this and ended up with 2 new servers in our cluster that were showing like this. It mostly broke everything for some time. The way we fixed it was to issue a rebalance and then get rid of the old instances. We are back to normal levels.
r
Is this problem resolved? Did you find out why the servers were in the offline state?
d
Thanks, @Luis Fernandez. I didn't try adding new servers; I wanted to make the controller recognize the current ones as alive (because they were up and consuming). We just deleted the whole deployment to try other stuff out 🤷
I'm having this problem again. The servers are up and running and their status is shown as `Alive`, but when I run a query, the broker says `No server found for request`.
If I try a rebalance, I get: `"Instance reassigned, table is already balanced"`
If I try to run `/rebuildBrokerResourceFromHelixTags`:
```json
{
  "status": "Broker resource is not rebuilt because ideal state is the same for table: <redacted>"
}
```
But I can see both of my servers consuming data:
```
Consumed 1410 events from (rate:20.137104/s), currentOffset=642595, numRowsConsumedSoFar=167595, numRowsIndexedSoFar=167595
```
The segments are also being uploaded to the deep store. I checked ZooKeeper, and the idealstate and externalview are the same. Is there a way to force the broker to recognize the servers?
m
Do the ideal state and external view show the servers and brokers for the table?
d
The externalview for the table only shows segments and servers; the external view for the broker shows both brokers.
n
Does this show the servers and segments?
```shell
curl -X GET "<broker_url>/debug/routingTable/GithubEventsTier" | jq
```
And in the ZK browser, under CONFIGS -> PARTICIPANT -> server node, do you see `"shutdownInProgress": "false"`?
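With several servers, the `shutdownInProgress` check can be run over all participant configs at once. A minimal sketch, assuming each config has been copied from the ZK browser as JSON in the usual Helix ZNRecord shape (`{"id": ..., "simpleFields": {...}}`). The function name and sample records are hypothetical, and treating a missing flag as `"false"` is an assumption, not documented behavior.

```python
import json
from typing import Any, Dict, List


def instances_shutting_down(records: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of participant configs whose shutdownInProgress flag
    is anything other than "false" (case-insensitive).

    Assumes ZNRecord-style dicts such as
    {"id": "Server_host_8098", "simpleFields": {"shutdownInProgress": "false"}}.
    A missing flag is treated as "false" (assumption).
    """
    flagged = []
    for record in records:
        simple = record.get("simpleFields") or {}
        value = str(simple.get("shutdownInProgress", "false")).strip().lower()
        if value != "false":
            flagged.append(record.get("id", "<unknown>"))
    return flagged


if __name__ == "__main__":
    # Hypothetical configs pasted from the ZK browser as JSON strings.
    raw = [
        '{"id": "Server_pinot-server-0_8098", "simpleFields": {"shutdownInProgress": "false"}}',
        '{"id": "Server_pinot-server-1_8098", "simpleFields": {"shutdownInProgress": "true"}}',
        '{"id": "Server_pinot-server-2_8098", "simpleFields": {}}',
    ]
    print(instances_shutting_down([json.loads(r) for r in raw]))
    # prints ['Server_pinot-server-1_8098']
```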
d
I'm assuming `GithubEventsTier` is the table name, so I should put my table's name there, right? I don't see this `<broker>/debug/rountingTable` URL in the Swagger UI, and when I try to fetch it I get a 404. In ZK, all servers have `"shutdownInProgress": "false"`. I have 3 servers running now, but the broker still says `No server found for request`.
It's been too long, so I'll restart all the pods 😞
And it happened again, after only 3 hours of running. I won't be able to release to prod 😞