Folks, Have some doubts related to rebalance. Can ...
# troubleshooting
l
Folks, I have some doubts related to rebalance. Can someone please clarify these or point me to relevant documentation?
=============
• One of our Pinot servers was scaled down (Kubernetes, from 4 servers to 3 servers).
• Even after several hours, the segments didn't come online.
• Same case with CONSUMING segments. The Kafka partitions that were being processed by the scaled-down server are now not being processed at all.
=============
• When does a rebalance get triggered? I already tried server/controller restarts. I also tried rebalance from controller UI
• What is the right way to scale down a server?
FYI: we are on 0.3.0 + some fixes, in case it matters
n
I also tried rebalance from controller UI
- this should have done it. Controller/server restarts won’t help
• what was the output of the rebalance command?
• After the rebalance, did you check the Ideal state of the table? The old server should no longer be used
for consuming segments to get rebalanced, you have to set flag includeConsuming to true in the rebalance command
oh, and you need to untag the server. that’s the first step
l
How do I untag a server?
what was the output of the rebalance command?
Status Code: 200
{
  "status": "NO_OP",
  "description": "Table is already balanced",
…..
}
n
in controller APIs, under INSTANCES
let me pull up exact steps
l
These are the ops I see.
n
there should be an UPDATE too?
l
Nope. I don’t see that. btw, I’m on 0.4.0-SNAPSHOT.
Delete instance response =================
{
  "code": 409,
  "error": "Failed to drop instance Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098 - Instance Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098 exists in ideal state for domainEventView_REALTIME"
}
n
we dont want to delete
if there’s no update, you can look at that instance in zooinspector and change the tag
you should see the tag in this path:
you can remove “DefaultTenant_OFFLINE” and put something else, like “server_untagged”
l
ok. thanks Neha for quick response. Will try this and update the thread in sometime.
n
remove both tags “DefaultTenant_OFFLINE” and “DefaultTenant_REALTIME” (or whatever it is you are using as tag name). Then rebalance with includeConsuming. The output of rebalance should not be NO_OP. You should see rebalanced state.
lmk how it goes
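A minimal sketch of checking the rebalance output for the no-op case described above. It assumes the response body is JSON with a top-level `status` field, as in the snippets posted in this thread; the helper name `is_no_op` is hypothetical and not part of Pinot.

```python
import json


def is_no_op(response_text: str) -> bool:
    """Return True if the rebalance response reports status NO_OP."""
    # Assumes a top-level "status" field, as seen in the responses in this thread.
    return json.loads(response_text).get("status") == "NO_OP"


# Response shaped like the one posted earlier in the thread.
sample = '{"status": "NO_OP", "description": "Table is already balanced"}'
if is_no_op(sample):
    print("Rebalance did nothing; check that the server tags were removed first.")
```

If this returns True after the tags have been removed, the rebalance request itself is probably missing a flag such as includeConsuming.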
l
@Neha Pawar:
• As suggested, removed the tags in ZK.
• Triggered rebalance with CONSUMING. No effect.
• Triggered rebalance with 3 flags enabled: includeConsuming, reassignInstances, downtime
• Reassignment happened, but segments moved to ERROR state.
With several combinations of rebalance, I’m still not able to bring the segments ONLINE
Request:
http://localhost:9000/tables/spanEventView/rebalance?type=REALTIME&dryRun=false&reassignInstances=true&includeConsuming=true&bootstrap=true&downtime=true&minAvailableReplicas=1&bestEfforts=true
Response snippet:
{
  "status": "DONE",
  "description": "Success with downtime (replaced IdealState with the target segment assignment, ExternalView might not reach the target segment assignment yet)",
  "instanceAssignment": {
    "CONSUMING": {
      "instancePartitionsName": "spanEventView_CONSUMING",
      "partitionToInstancesMap": {
        "0_0": [
          "Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098",
          "Server_pinot-server-0.pinot-server.traceable.svc.cluster.local_8098",
          "Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098",
          "Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098"
        ]
      }
....
....
Finally, after some trial and error, the following rebalance request (after removing the tags as suggested) worked.
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://localhost:9000/tables/rawTraceView/rebalance?type=REALTIME&dryRun=false&reassignInstances=true&includeConsuming=true&bootstrap=true&downtime=true&minAvailableReplicas=1&bestEfforts=false'
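For anyone repeating this on other tables, here is a rough sketch that builds the same rebalance URL from its query parameters. The parameter names and values are taken from the request above; the helper name `build_rebalance_url` and its defaults are hypothetical, and the sketch only constructs the URL, it does not send the request.

```python
from urllib.parse import urlencode


def build_rebalance_url(controller: str, table: str, **overrides) -> str:
    """Build a rebalance URL using the parameters that worked in this thread."""
    # Defaults mirror the request that worked above; adjust per table.
    params = {
        "type": "REALTIME",
        "dryRun": "false",
        "reassignInstances": "true",
        "includeConsuming": "true",
        "bootstrap": "true",
        "downtime": "true",
        "minAvailableReplicas": "1",
        "bestEfforts": "false",
    }
    for key, value in overrides.items():
        # Render Python booleans as lowercase true/false, as in the URL above.
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{controller}/tables/{table}/rebalance?{urlencode(params)}"


print(build_rebalance_url("http://localhost:9000", "rawTraceView"))
```

Passing `dryRun=True` first gives a URL that can be used to preview the request before running it with `dryRun=false`.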