# troubleshooting
x
when i downscale the number of instances in my k8s helm chart for pinot components, some of the instances are not removed from the cluster?
# healthy component
{
  "id": "Controller_pinot-controller-0.pinot-controller-headless.pinot.svc.cluster.local_9000",
  "simpleFields": {
    "HELIX_VERSION": "0.9.8",
    "LIVE_INSTANCE": "1@pinot-controller-0",
    "SESSION_ID": "100045e48f1002d"
  },
  "mapFields": {},
  "listFields": {}
}

# removed component
{
  "code": 404,
  "error": "ZKPath /pinot/LIVEINSTANCES/Controller_pinot-controller-1.pinot-controller-headless.pinot.svc.cluster.local_9000 does not exist:"
}
am i supposed to manually use the REST api to drop instances? for what it’s worth, i update my helm chart deployment via terraform
d
Yes
Scaling down involves segment rebalance
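roughly, dropping a dead instance is a DELETE against the controller's /instances endpoint. a minimal sketch in python with requests, assuming the controller is port-forwarded to localhost:9000 and using the instance name from your output above (the drop only succeeds once nothing references the instance anymore):
import requests

CONTROLLER = "http://localhost:9000"  # assumed port-forward to the pinot controller
instance = "Controller_pinot-controller-1.pinot-controller-headless.pinot.svc.cluster.local_9000"

# drop the dead instance; the controller returns 409 if it is still referenced anywhere
resp = requests.delete(f"{CONTROLLER}/instances/{instance}")
print(resp.status_code, resp.text)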
x
i wasn’t able to find this info in the docs or the pinot github issues. would you know where i might find more information?
d
There’s a bit of orchestration involved since Pinot doesn’t know if the servers will come back or not.
x
hm but it's not just servers, controllers and brokers also have to be manually dropped?
d
Yes, Pinot is a stateful system, every component has an identity and a role. the pinot doc is generic and not precise about the specifics of helm-based deployments.
it can certainly be improved
x
thanks for the answers!
i was hoping to use autoscaling for my pinot deployment
are there any resources on production deployments of pinot where i can read about how they manage this?
d
Usually, you start with resizing your statefulsets, then trigger rebalance.
scaling up or scaling down requires a rebalance trigger via the pinot apis.
Having a dead reference is not the end of the world if you plan to scale back up later.
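for the rebalance trigger, something like this works as a sketch in python, assuming the controller is port-forwarded to localhost:9000 ("myTable" here is just a placeholder for your table name):
import requests

CONTROLLER = "http://localhost:9000"  # assumed port-forward to the pinot controller
table = "myTable"  # placeholder: your actual table name

# dry run first to see the proposed segment assignment without moving anything
resp = requests.post(
    f"{CONTROLLER}/tables/{table}/rebalance",
    params={"type": "OFFLINE", "dryRun": "true"},
)
print(resp.json())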
x
the rebalance operations put the segments back on my dead servers
and i can’t drop the dead servers because:
{
  "code": 409,
  "error": "Failed to drop instance Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098 - Instance Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098 exists in ideal state for events-10-partitions_OFFLINE"
}
am i missing something here?
d
Sounds like cleaning dead references from the ideal state is a prerequisite to rebalance.
x
it's not obvious to me which api i should use to clean dead references
what i saw is that IdealState is a helix concept, and i can probably update the zookeeper znode that stores the idealstate to remove the dead servers. is this the recommended way?
alright, so what it took was for me to: 1. update (remove) the tags of the dead servers/brokers so that they are no longer tagged to a tenant, and 2. rebalance the segments. thanks @Daniel Lavoie!
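for anyone landing here later, a rough python sketch of those steps against the controller api (assuming the controller is port-forwarded to localhost:9000; the instance and table names are the ones from this thread, and whether an empty tags value is accepted may depend on the pinot version):
import requests

CONTROLLER = "http://localhost:9000"  # assumed port-forward to the pinot controller
dead = "Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098"
table = "events-10-partitions"

# 1. clear the tenant tags on the dead server so it drops out of the ideal state on rebalance
requests.put(f"{CONTROLLER}/instances/{dead}/updateTags", params={"tags": ""})

# 2. rebalance so segments are reassigned away from the dead server
requests.post(f"{CONTROLLER}/tables/{table}/rebalance", params={"type": "OFFLINE"})

# 3. the drop should no longer return 409 once nothing references the instance
requests.delete(f"{CONTROLLER}/instances/{dead}")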
@Daniel Lavoie would it be useful if i updated the doc here? https://docs.pinot.apache.org/basics/getting-started/frequent-questions/pinot-on-kubernetes-faq or do you think that the notes on capacity changes here: https://docs.pinot.apache.org/operators/operating-pinot/rebalance/rebalance-brokers#capacity-changes would be sufficient?
k
@xtrntr to me it seems like the latter would be a good place to add this information but will let @Daniel Lavoie also chime in. Thanks for helping with the documentation update.