
Laxman Ch

08/19/2020, 6:57 PM
Folks, I have some doubts related to rebalance. Can someone please clarify these or point me to the relevant documentation?
=============
• One of our Pinot servers was scaled down (Kubernetes, from 4 servers to 3 servers).
• Even after several hours, the segments didn’t come online.
• Same case with CONSUMING segments. The Kafka partitions that were being processed by the scaled-down server are now not being processed at all.
=============
• When does a rebalance get triggered? I already tried server/controller restarts. I also tried a rebalance from the controller UI.
• What is the right way to scale down a server?
FYI: we are on 0.3.0 + some fixes, in case it matters.

Neha Pawar

08/19/2020, 7:01 PM
I also tried rebalance from controller UI
- this should have done it. Controller/server restarts won’t help
• What was the output of the rebalance command?
• After the rebalance, did you check the ideal state of the table? The old server should no longer be used.
For consuming segments to get rebalanced, you have to set the flag includeConsuming to true in the rebalance command.
Oh, and you need to untag the server. That’s the first step.

Laxman Ch

08/19/2020, 7:07 PM
How do I untag a server?
what was the output of the rebalance command?
Status Code: 200
{
  "status": "NO_OP",
  "description": "Table is already balanced",
…..
}
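For anyone scripting this, the `status` field of the rebalance response can be checked first. A minimal shell sketch, assuming the response body has the shape shown above. The function name `rebalance_status` is hypothetical, and the sed-based extraction is a shortcut rather than a real JSON parser:

```shell
# Hypothetical helper: print the first "status" value found in a rebalance response.
# Line-oriented sed match, not a JSON parser; assumes one key per line as in the output above.
rebalance_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1
}

# Sample body copied from the response above (the elided part is left out).
response='{
  "status": "NO_OP",
  "description": "Table is already balanced"
}'

if [ "$(rebalance_status "$response")" = "NO_OP" ]; then
  echo "Rebalance was a no-op: the controller reports the table as already balanced."
fi
```

If `jq` is available, reading `.status` with it would be more robust than the sed pattern.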

Neha Pawar

08/19/2020, 7:09 PM
in controller APIs, under INSTANCES
let me pull up exact steps

Laxman Ch

08/19/2020, 7:11 PM
These are the ops I see.

Neha Pawar

08/19/2020, 7:12 PM
There should be an UPDATE too?

Laxman Ch

08/19/2020, 7:13 PM
Nope. I don’t see that. btw, I’m on 0.4.0-SNAPSHOT.
Delete instance response =================
{
  "code": 409,
  "error": "Failed to drop instance Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098 - Instance Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098 exists in ideal state for domainEventView_REALTIME"
}

Neha Pawar

08/19/2020, 7:14 PM
We don’t want to delete the instance.
If there’s no update operation, you can look at that instance in ZooInspector and change the tag.
you should see the tag in this path:
you can remove “DefaultTenant_OFFLINE” and put something else, like “server_untagged”

Laxman Ch

08/19/2020, 7:20 PM
OK. Thanks, Neha, for the quick response. Will try this and update the thread in some time.

Neha Pawar

08/19/2020, 7:21 PM
Remove both tags, “DefaultTenant_OFFLINE” and “DefaultTenant_REALTIME” (or whatever you are using as the tag name). Then rebalance with includeConsuming. The output of the rebalance should not be NO_OP. You should see the rebalanced state.
lmk how it goes
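The rebalance step described above can be sketched in shell. This is only an illustration: it uses the controller address and query parameters that appear elsewhere in this thread, the helper names `rebalance_url` and `run_rebalance` are hypothetical, and the untagging itself is still done by hand in ZooInspector as described above.

```shell
# Hypothetical helper: build the rebalance URL with includeConsuming enabled.
# CONTROLLER defaults to the address used in this thread; override it for another cluster.
rebalance_url() {
  # $1 = table name, $2 = table type (e.g. REALTIME), $3 = dryRun (true/false)
  printf '%s/tables/%s/rebalance?type=%s&dryRun=%s&includeConsuming=true' \
    "${CONTROLLER:-http://localhost:9000}" "$1" "$2" "$3"
}

# Hypothetical helper: POST the rebalance request. Needs a reachable controller,
# so it is defined here but not called.
run_rebalance() {
  curl -s -X POST --header 'Accept: application/json' "$(rebalance_url "$@")"
}

# Print the dry-run URL without contacting the controller.
rebalance_url domainEventView REALTIME true
echo
```

Running `run_rebalance domainEventView REALTIME true` first and reading the response before repeating with `false` is one cautious way to confirm the result is no longer NO_OP.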

Laxman Ch

08/20/2020, 4:03 AM
@Neha Pawar:
• As suggested, removed the server’s tags in ZK.
• Triggered rebalance with includeConsuming. No effect.
• Triggered rebalance with 3 flags enabled: includeConsuming, reassignInstances, downtime.
• Reassignment happened, but segments moved to ERROR state.
Even after several combinations of rebalance flags, I’m still not able to bring the segments ONLINE.
Request:
http://localhost:9000/tables/spanEventView/rebalance?type=REALTIME&dryRun=false&reassignInstances=true&includeConsuming=true&bootstrap=true&downtime=true&minAvailableReplicas=1&bestEfforts=true
Response snippet:
{
  "status": "DONE",
  "description": "Success with downtime (replaced IdealState with the target segment assignment, ExternalView might not reach the target segment assignment yet)",
  "instanceAssignment": {
    "CONSUMING": {
      "instancePartitionsName": "spanEventView_CONSUMING",
      "partitionToInstancesMap": {
        "0_0": [
          "Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098",
          "Server_pinot-server-0.pinot-server.traceable.svc.cluster.local_8098",
          "Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098",
          "Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098"
        ]
      }
....
....
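Note that the response snippet above still lists pinot-server-3 in the CONSUMING instance assignment, and the earlier delete attempt in this thread was also against pinot-server-3. A rough shell check for this, assuming pinot-server-3 is the scaled-down server (the thread does not say so explicitly); the helper name `mentions_instance` is hypothetical and it does a plain substring match rather than parsing the JSON:

```shell
# Hypothetical helper: succeed if the instance name appears anywhere in the given text.
mentions_instance() {
  # $1 = response body (or ideal state JSON), $2 = instance name
  printf '%s\n' "$1" | grep -q -F -- "$2"
}

# Assumed to be the scaled-down server; taken from the delete attempt earlier in the thread.
old_server='Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098'

# Fragment copied from the response snippet above.
snippet='"0_0": [
  "Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098",
  "Server_pinot-server-0.pinot-server.traceable.svc.cluster.local_8098"
]'

if mentions_instance "$snippet" "$old_server"; then
  echo "Old server is still part of the instance assignment."
fi
```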
Finally, after some trial and error, the following rebalance request worked (after removing the tags as suggested).
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://localhost:9000/tables/rawTraceView/rebalance?type=REALTIME&dryRun=false&reassignInstances=true&includeConsuming=true&bootstrap=true&downtime=true&minAvailableReplicas=1&bestEfforts=false'