# troubleshooting
a
<!here> We recently added 2 server replicas in our Pinot cluster on k8s, and the real-time table config also has 3 replicas configured, so each segment is present on every pod. After that I made changes to the schema and set the reload-segments flag to true. I noticed that the segment reload on all pods in k8s happens at the same time, due to which the application was down for 1 hour. We have 652 segments with a 1-day flush time, and 7,143,718 total records with skipUpsert = true. The same problem occurs with server pod restarts from Argo. Is there a way to do the segment reload in an uptime-preserving fashion? I do know that rebalance has a minAvailableReplicas flag; does reload have that feature?
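For reference, the rebalance flag mentioned above is exposed as a query parameter on the controller's rebalance endpoint. A minimal sketch of invoking it from Java follows; the controller address and table name are placeholders, and Java 11's built-in HTTP client is assumed:

```java
// Sketch only: trigger a table rebalance via the controller REST API while
// keeping a minimum number of replicas online. URL and table name are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RebalanceWithMinReplicas {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    // Keep at least 2 of the 3 replicas serving queries while segments move.
    URI uri = URI.create(
        "http://localhost:9000/tables/myTable/rebalance?type=REALTIME&minAvailableReplicas=2");
    HttpRequest request = HttpRequest.newBuilder(uri)
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + ": " + response.body());
  }
}
```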
m
I didn’t get your setup, what’s the total number of servers?
a
We have a total of 3 server pods now
m
The reload in itself will atomically swap the segments, so that shouldn’t cause any downtime. What change did you make to the schema?
a
I added 2 fields with default values
m
Are they derived fields?
a
No, just new fields with default values
m
Hmm, I was asking because I suspected that creating those new columns/indexes might take some time and put pressure on your system.
Essentially, what I am saying is that it is not the reload itself, but the computation it may kick off that would have put pressure on your system. A workaround would be to simply do a rolling restart of servers.
a
For rolling restart is there an endpoint that can get us a status of the segment load on a pod ?
m
  @GET
  @Path("/health/readiness")
  @Produces(MediaType.TEXT_PLAIN)
  @ApiOperation(value = "Checking server readiness status")
  @ApiResponses(value = {
      @ApiResponse(code = 200, message = "Server is ready to serve queries"),
      @ApiResponse(code = 503, message = "Server is not ready to serve queries")
  })
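A minimal sketch of how that readiness endpoint could drive a rolling restart: after restarting one server pod, poll /health/readiness on its admin port and only move on to the next pod once it returns 200. The host name, admin port, and retry interval below are assumptions, not values from this conversation:

```java
// Sketch only (not production code): wait until a restarted Pinot server reports ready.
// The host/port below are placeholders for the server's admin API address.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class WaitForServerReady {
  public static void main(String[] args) throws Exception {
    String serverAdminUrl = args.length > 0 ? args[0] : "http://pinot-server-0:8097";
    HttpClient client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))
        .build();
    HttpRequest request = HttpRequest.newBuilder(
            URI.create(serverAdminUrl + "/health/readiness"))
        .GET()
        .build();
    // Poll until the server returns 200 (ready) instead of 503 (still loading segments).
    while (true) {
      try {
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 200) {
          System.out.println("Server is ready: " + response.body());
          break;
        }
        System.out.println("Not ready yet (HTTP " + response.statusCode() + "), retrying...");
      } catch (java.io.IOException e) {
        System.out.println("Server not reachable yet, retrying...");
      }
      Thread.sleep(5000);
    }
  }
}
```

Pointing the pod's Kubernetes readiness probe at the same path would achieve a similar effect without an external script.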
a
Thanks. Is this in 0.9.0 or a newer endpoint? I don't see it in Swagger
m
This is on the server. Are you looking at server swagger?
a
No, the controller one
m
But it seems like it was added back in July, so 0.9.0 might not have it
a
Ok
m
You can check to ensure