# troubleshooting
z
We are running Pinot in Kubernetes, and noticed that the servers are considered ready too early, before the server has actually finished starting. This causes the StatefulSet rolling restart to restart multiple servers simultaneously, making segments inaccessible. Shouldn't the server API /health endpoint be used for readiness probing?
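For reference, a minimal readiness-probe sketch against the server's /health endpoint, assuming the server admin API listens on its default port 8097; the delay and threshold values are placeholders, not values taken from the chart:

```yaml
# Hedged sketch: mark the server pod Ready only once /health on the admin API
# reports healthy, instead of relying on the container merely being up.
readinessProbe:
  httpGet:
    path: /health
    port: 8097           # assumed default admin API port
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 10   # allow extra time for segment loading
```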
m
The broker routes a query to a server only for the segments that are online on that server.
z
In our case 7 out of 8 servers were restarting at the same time
m
Are you using replica groups? If so, you could restart one replica group at a time.
z
We are not using them.
Also, we are doing Helm upgrades for config changes, so the restarts aren't done manually.
m
@Xiang Fu Any suggestions? IIRC, there are deployments that have hooks that wait for some time (x minutes) before reporting healthy? cc: @Jackie
j
Which version of Pinot are you running? How do you shut down the servers? We need to ensure the shutdown hook is called when shutting down the servers.
z
Running 0.7.1 with the Helm chart from the repo. When we do a helm upgrade (e.g. last time I configured S3 retries for the servers), the pods are restarted by the StatefulSet controller using the default RollingUpdate strategy. The controller waits for the restarted pod to be Ready, then proceeds to restart the next one. The standard Kubernetes termination is SIGTERM, followed by SIGKILL after 30s if the process hasn't exited.
In the chart the brokers have the /health readiness probe, which is why I'm wondering why the servers don't have it set.
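On the termination side, one knob worth checking is the pod's grace period, so the server's shutdown hook has time to finish before the kubelet sends SIGKILL; a hedged sketch, with 600s as an arbitrary placeholder:

```yaml
# StatefulSet pod template fragment: extend the default 30s grace period so a
# graceful server shutdown can complete after SIGTERM.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 600
```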
j
Here is a fix for adding the shutdown hook for the server: https://github.com/apache/pinot/pull/7251
Seems it is not included in 0.8.0, so you need to try either the current master or wait for the next release.
Adding @Xiang Fu to take a look as well
z
Looking into the stop method, it seems that the shutdown resource check is disabled by default, while the comment suggests it should be enabled by default, since the startup check is enabled by default:
```java
// Shutdown: enable resource check before shutting down the server
//           Will wait until all the resources in the external view are neither ONLINE nor CONSUMING
//           No need to enable this check if startup service status check is enabled
public static final String CONFIG_OF_SHUTDOWN_ENABLE_RESOURCE_CHECK = "pinot.server.shutdown.enableResourceCheck";
public static final boolean DEFAULT_SHUTDOWN_ENABLE_RESOURCE_CHECK = false;
```
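If you do want both checks on explicitly, here is a hedged sketch of how that could be passed through the chart's server config. The shutdown key is taken from the constants above; the startup key is from memory and worth verifying, and whether the chart exposes a `server.extra.configs` value like this is also an assumption:

```yaml
# Hypothetical values.yaml fragment for the Pinot Helm chart.
server:
  extra:
    configs: |-
      pinot.server.startup.enableServiceStatusCheck=true
      pinot.server.shutdown.enableResourceCheck=true
```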
Oh, never mind, I read that comment wrong.
I'm looking into the docs for the replica groups, and can't find how the servers are assigned to replica groups.
j
z
From that example I can't see how I could control which servers are assigned to which replica group.
I'd like to run multiple statefulsets of servers, and assign the replica groups to different statefulsets so they can be restarted without disrupting multiple replicas of any segments
Now I see the pool based one is what I'm looking for, thanks!
m
Pool-based assignment is more for a giant multi-tenant cluster with hundreds of tables. Replica groups are a simple concept.
z
In this case, is the assignment to servers arbitrary, or can it be controlled?
Per my current understanding, two Pinot Helm deployments could be run with the same ZooKeeper and cluster name, with the servers assigned to different pools in each. Then the deployments could be upgraded separately without downtime.
Also, the docs mention a recommended upgrade sequence (when changing the Pinot version) of controller, broker, etc.; this could make a further split necessary, since I don't think Helm guarantees any order when upgrading these StatefulSets.
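For the pool question above, a hedged sketch of the table-config side of pool-based replica-group instance assignment. The field names follow the instance-assignment docs, the tenant tag and counts are placeholders, and how each server's instance config gets its pool number (which is what would tie a pool to one of the two StatefulSets) is not shown here:

```json
{
  "instanceAssignmentConfigMap": {
    "OFFLINE": {
      "tagPoolConfig": {
        "tag": "DefaultTenant_OFFLINE",
        "poolBased": true,
        "numPools": 2
      },
      "replicaGroupPartitionConfig": {
        "replicaGroupBased": true,
        "numReplicaGroups": 2,
        "numInstancesPerReplicaGroup": 3
      }
    }
  }
}
```

The idea is that each replica group is drawn from a different pool, so restarting one StatefulSet (one pool) should take down at most one replica of each segment.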