# troubleshooting
👋 Hey there, I have a strange situation going on. I have 2 servers set up. At some point they had a problem and restarted, and they are still running fine. I can see in the logs that they are consuming data normally:
```
Consumed 261 events from (rate:3.1030054/s), currentOffset=763096, numRowsConsumedSoFar=288096, numRowsIndexedSoFar=288096
[Consumer clientId=consumer-455, groupId=] Discovered group coordinator <redacted> (id: 2147483646 rack: null)
```
But the controller still doesn't show them as alive, and when I try to query the data, I see this in the broker log:
```
No server found for request 1: select responseId from responseCount limit 1
```
And this is the response from the query API:
```json
{
  "exceptions": [],
  "numServersQueried": 0,
  "numServersResponded": 0,
  "numSegmentsQueried": 0,
  "numSegmentsProcessed": 0,
  "numSegmentsMatched": 0,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 0,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 0,
  "timeUsedMs": 0,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0,
  "numRowsResultSet": 0
}
```
How can I make the controller see that they are alive? 👀
Can you check the idealstate and externalview in the ZK browser? If the servers show up there, then maybe the broker routing table needs to be rebuilt (via the Swagger API).
That one, right?
Yep, the servers show up there. I tried that one, and the response is:
```json
{
  "status": "Broker resource is not rebuilt because ideal state is the same for table: responseCount_REALTIME"
}
```
Does the idealstate match the externalview?
Yes, please check whether the idealstate matches the externalview. You can do so manually or via the debug endpoint.
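For a manual check, a minimal sketch like the following could compare the two. It assumes each side has already been copied out of the ZK browser as a Python dict in the shape of the ZNRecord `mapFields` (segment name -> {server instance -> state}). `diff_states` is a hypothetical helper, and the segment and server names in the example are made up.

```python
from typing import Dict, List, Tuple

# segment name -> {server instance -> state}, i.e. the "mapFields" section
# of a Helix IDEALSTATES or EXTERNALVIEW ZNRecord (assumed input shape).
StateMap = Dict[str, Dict[str, str]]


def diff_states(ideal: StateMap, external: StateMap) -> List[Tuple[str, str, str, str]]:
    """Return (segment, server, ideal_state, external_state) for every mismatch.

    A replica present on only one side is reported with the state "MISSING".
    """
    mismatches = []
    for segment in sorted(set(ideal) | set(external)):
        ideal_replicas = ideal.get(segment, {})
        external_replicas = external.get(segment, {})
        for server in sorted(set(ideal_replicas) | set(external_replicas)):
            want = ideal_replicas.get(server, "MISSING")
            have = external_replicas.get(server, "MISSING")
            if want != have:
                mismatches.append((segment, server, want, have))
    return mismatches


if __name__ == "__main__":
    # Hypothetical segment and server names, for illustration only.
    ideal = {"seg_0": {"Server_a_8098": "CONSUMING"}, "seg_1": {"Server_b_8098": "CONSUMING"}}
    external = {"seg_0": {"Server_a_8098": "OFFLINE"}, "seg_1": {"Server_b_8098": "CONSUMING"}}
    for row in diff_states(ideal, external):
        print(row)  # ('seg_0', 'Server_a_8098', 'CONSUMING', 'OFFLINE')
```

An empty result means every replica is in the state the controller asked for.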
The debug endpoint response shows that they are not the same. The idealstate shows that all partitions on each server (there are 2) should be ONLINE or CONSUMING, but the externalview shows all the servers as OFFLINE.
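To see at a glance which servers are affected, the externalview can be summarized per server. This sketch assumes the same segment -> {server -> state} dict as input. `states_per_server` and `fully_offline_servers` are hypothetical helpers, not part of Pinot.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# segment name -> {server instance -> state}, as in an externalview's mapFields
StateMap = Dict[str, Dict[str, str]]


def states_per_server(external_view: StateMap) -> Dict[str, Counter]:
    """Count how many segment replicas each server holds in each state."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for replicas in external_view.values():
        for server, state in replicas.items():
            summary[server][state] += 1
    return dict(summary)


def fully_offline_servers(external_view: StateMap) -> List[str]:
    """Servers whose every replica in the externalview is OFFLINE."""
    return sorted(
        server
        for server, counts in states_per_server(external_view).items()
        if set(counts) == {"OFFLINE"}
    )


if __name__ == "__main__":
    # Hypothetical names: two servers, every replica reported OFFLINE.
    ev = {
        "seg_0": {"Server_a_8098": "OFFLINE", "Server_b_8098": "OFFLINE"},
        "seg_1": {"Server_a_8098": "OFFLINE", "Server_b_8098": "OFFLINE"},
    }
    print(states_per_server(ev))
    print(fully_offline_servers(ev))  # ['Server_a_8098', 'Server_b_8098']
```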
Diana, I did a hard drive increase exercise today, and it seems like I'm getting the same issue. I'm going to try a few things and will share what I did; hopefully it helps.
Hey Diana, we did this and ended up with 2 new servers in our cluster, which were showing like this. It mostly broke everything for some time. The way we fixed it was to issue a rebalance and then get rid of the old instances. We are back to normal levels.
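Before getting rid of old instances after a rebalance, it is worth confirming that none of them is still assigned segments. A minimal sketch, assuming you already have the list of server instance names and the idealstate `mapFields` of every table in the cluster (one instance may serve more than one table). `removable_servers` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# segment name -> {server instance -> state}, as in an idealstate's mapFields
StateMap = Dict[str, Dict[str, str]]


def removable_servers(server_instances: Iterable[str], ideal_states: Iterable[StateMap]) -> List[str]:
    """Servers that no table's idealstate assigns any segment to."""
    assigned = {
        server
        for ideal_state in ideal_states
        for replicas in ideal_state.values()
        for server in replicas
    }
    return sorted(set(server_instances) - assigned)


if __name__ == "__main__":
    # Hypothetical names: two old servers and two new ones after a rebalance.
    servers = ["Server_old1_8098", "Server_old2_8098", "Server_new1_8098", "Server_new2_8098"]
    ideal = {
        "seg_0": {"Server_new1_8098": "CONSUMING"},
        "seg_1": {"Server_new2_8098": "ONLINE"},
    }
    print(removable_servers(servers, [ideal]))  # ['Server_old1_8098', 'Server_old2_8098']
```

Passing every table's idealstate, not just one, avoids dropping an instance that still serves some other table.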
Is this problem resolved? Did you find out why the servers were in the offline state?
Thanks, @Luis Fernandez. I didn't try adding new servers; I wanted to make the controller recognize the current ones as alive (because they were up and consuming). We just deleted the whole deployment to try other things out 🤷
I'm having this problem again. The servers are up and running, and their status is shown as …, but when I run a query, the broker says `No server found for request`.
If I try a rebalance, I get:
```
"Instance reassigned, table is already balanced"
```
If I try to rebuild the broker resource, I get:
```json
{
  "status": "Broker resource is not rebuilt because ideal state is the same for table: <redacted>"
}
```
But I can see that both of my servers are consuming data:
```
Consumed 1410 events from (rate:20.137104/s), currentOffset=642595, numRowsConsumedSoFar=167595, numRowsIndexedSoFar=167595
```
The segments are also being uploaded to the deep store. I checked the idealstate and externalview in ZooKeeper, and they are the same. Is there a way to force the broker to recognize the servers?
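A matching idealstate and externalview does not by itself tell you whether the broker can route to a server. The sketch below is a deliberately simplified model, not Pinot's actual routing code. It assumes a replica is usable only when its externalview state is ONLINE or CONSUMING and its server is not in an `excluded` set, which stands in for instances the broker treats as disabled or shutting down. When every segment is left with no usable server, the broker has nothing to send the query to. `routable_servers` and `unroutable_segments` are hypothetical helpers.

```python
from typing import Dict, List, Set

# segment name -> {server instance -> state}, as in an externalview's mapFields
StateMap = Dict[str, Dict[str, str]]

# Assumption for this sketch: only these replica states are queryable.
QUERYABLE_STATES = {"ONLINE", "CONSUMING"}


def routable_servers(external_view: StateMap, excluded: Set[str]) -> Dict[str, List[str]]:
    """Map each segment to the servers this simplified model would route to."""
    return {
        segment: sorted(
            server
            for server, state in replicas.items()
            if state in QUERYABLE_STATES and server not in excluded
        )
        for segment, replicas in external_view.items()
    }


def unroutable_segments(external_view: StateMap, excluded: Set[str]) -> List[str]:
    """Segments left with no usable replica under this model."""
    routes = routable_servers(external_view, excluded)
    return sorted(segment for segment, servers in routes.items() if not servers)


if __name__ == "__main__":
    # Hypothetical names: both servers are CONSUMING, but both are excluded.
    ev = {"seg_0": {"Server_a_8098": "CONSUMING"}, "seg_1": {"Server_b_8098": "CONSUMING"}}
    print(unroutable_segments(ev, excluded=set()))  # []
    print(unroutable_segments(ev, excluded={"Server_a_8098", "Server_b_8098"}))  # ['seg_0', 'seg_1']
```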
Do ideal state and external view show the servers and brokers for the table?
The externalview for the table only shows segments and servers; the externalview for the broker resource shows both brokers.
Does this show the servers and segments?
```
curl -X GET "<broker_url>/debug/routingTable/GithubEventsTier" | jq
```
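In that command, `GithubEventsTier` appears to be the table name. Here is a small sketch for building the same URL for another table and summarizing the response. It assumes the endpoint returns a JSON object of the form `{"<table>_REALTIME": {"<server>": ["<segment>", ...]}}`, but that shape may differ between Pinot versions. An empty server map would mean the broker has no servers to route to. The helper names and the broker address are hypothetical.

```python
import json
from typing import Dict
from urllib.parse import quote


def routing_table_url(broker_url: str, table_name: str) -> str:
    """Build the broker debug routing-table URL for one table."""
    return f"{broker_url.rstrip('/')}/debug/routingTable/{quote(table_name)}"


def segments_per_server(response_body: str) -> Dict[str, Dict[str, int]]:
    """Count routed segments per server, per table type.

    Assumes the body looks like {"<table>_REALTIME": {"<server>": [segments]}}.
    """
    routing = json.loads(response_body)
    return {
        table: {server: len(segments) for server, segments in servers.items()}
        for table, servers in routing.items()
    }


if __name__ == "__main__":
    # Hypothetical broker address and response body.
    print(routing_table_url("http://localhost:8099/", "responseCount"))
    # http://localhost:8099/debug/routingTable/responseCount
    print(segments_per_server('{"responseCount_REALTIME": {}}'))  # {'responseCount_REALTIME': {}}
```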
And in the ZK browser, under CONFIGS -> PARTICIPANT -> <server node>, do you see `"shutdownInProgress": "false"`?
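The same check can be scripted against the participant config record copied from the ZK browser. A minimal sketch, assuming the usual Helix ZNRecord JSON layout with a `simpleFields` object. Besides `shutdownInProgress`, it also looks at the Helix `HELIX_ENABLED` field, and a missing field is treated as the healthy default. `instance_problems` is a hypothetical helper.

```python
import json
from typing import List


def instance_problems(znrecord_json: str) -> List[str]:
    """List reasons a participant config might keep a server out of routing."""
    record = json.loads(znrecord_json)
    simple = record.get("simpleFields", {})
    problems = []
    # Helix treats an instance with no HELIX_ENABLED field as enabled.
    if simple.get("HELIX_ENABLED", "true").lower() != "true":
        problems.append("instance is disabled (HELIX_ENABLED is not true)")
    # Pinot servers flag this while they are shutting down.
    if simple.get("shutdownInProgress", "false").lower() == "true":
        problems.append("shutdownInProgress is true")
    return problems


if __name__ == "__main__":
    # Hypothetical record for a healthy server.
    record = '{"id": "Server_a_8098", "simpleFields": {"HELIX_ENABLED": "true", "shutdownInProgress": "false"}}'
    print(instance_problems(record))  # []
```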
I'm assuming `GithubEventsTier` is the table name, so I should put my table's name there, right? I don't see this URL in Swagger, and when I try to fetch it I get a 404. In ZK, all servers have `"shutdownInProgress": "false"`.
I have 3 servers running now, but the broker still says `No server found for request`.
It's been too long; I'll restart all the pods 😞
And it happened again, after only 3 hours of running. I won't be able to release to prod 😞