# troubleshooting
e
We scaled up our brokers and saw that for about 15 minutes some brokers were returning "RESOURCE_MISSING_ERROR". When I looked in the ZK browser under INSTANCES/broker-<x>/CURRENTSTATES, the END_TIME values spanned that 15-minute interval. To avoid this, should users build their client by contacting ZooKeeper, i.e. looking up the brokers for the table?
Or should this not have happened?
Currently users just set the Pinot broker hostname to the Kubernetes service, and there is only 1 broker tenant.
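For reference, a minimal sketch of the ZooKeeper-based discovery asked about above, using the Pinot Java client's `ConnectionFactory.fromZookeeper`. The ZooKeeper address, cluster path, and table name are placeholders, not values from this deployment:

```java
import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.ResultSetGroup;

public class ZkBrokerDiscoveryExample {
  public static void main(String[] args) {
    // Discover live brokers from the Pinot cluster's ZooKeeper state
    // (address and cluster path are placeholders) instead of pointing
    // the client at a fixed Kubernetes service hostname.
    Connection connection =
        ConnectionFactory.fromZookeeper("zookeeper:2181/PinotCluster");

    // Queries are then routed to a broker registered for the table.
    ResultSetGroup result = connection.execute("SELECT COUNT(*) FROM myTable");
    System.out.println(result.getResultSet(0).getString(0, 0));

    connection.close();
  }
}
```

With this setup the client learns the broker list from the cluster's ZooKeeper state rather than from the Kubernetes service, so it should only send queries to brokers that are registered for the table.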
m
Typically, there should be a load balancer/VIP in front of the brokers that listens to the external view and only routes requests to brokers that are ONLINE.
But 15 minutes seems like a long time
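To make "listen to external view" concrete: a routing layer can join the cluster as a Helix spectator and watch the brokerResource external view, sending a table's traffic only to brokers that are ONLINE for it. A rough sketch, where the cluster name, ZooKeeper address, and spectator instance name are placeholders:

```java
import java.util.List;
import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.NotificationContext;
import org.apache.helix.api.listeners.ExternalViewChangeListener;
import org.apache.helix.model.ExternalView;

public class BrokerExternalViewWatcher implements ExternalViewChangeListener {

  @Override
  public void onExternalViewChange(List<ExternalView> externalViewList,
                                   NotificationContext changeContext) {
    for (ExternalView ev : externalViewList) {
      if (!"brokerResource".equals(ev.getResourceName())) {
        continue;
      }
      // Each "partition" of brokerResource is a table; only brokers in
      // ONLINE state for that table should receive its queries.
      for (String table : ev.getPartitionSet()) {
        ev.getStateMap(table).forEach((broker, state) ->
            System.out.println(table + " -> " + broker + " = " + state));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Cluster name, instance name, and ZK address are placeholders.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "PinotCluster", "ev-watcher", InstanceType.SPECTATOR, "zookeeper:2181");
    manager.connect();
    manager.addExternalViewChangeListener(new BrokerExternalViewWatcher());

    // Keep the spectator alive so it keeps receiving external view updates.
    Thread.currentThread().join();
  }
}
```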
e
But if a broker is up but not yet ready to serve requests, how do we tell? We just use the readiness probe from the k8s chart.
m
A broker is ready when it is ONLINE in the external view.
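If you want to verify that condition directly, for example as a stricter readiness check than the default k8s probe, one option is to read the brokerResource external view through the Helix admin API. A sketch, with the ZooKeeper address, cluster name, and broker instance id as placeholders:

```java
import java.util.Map;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ExternalView;

public class BrokerReadinessCheck {
  public static void main(String[] args) {
    // Placeholders: ZK address, Pinot cluster name, and broker instance id.
    ZKHelixAdmin admin = new ZKHelixAdmin("zookeeper:2181");
    ExternalView ev =
        admin.getResourceExternalView("PinotCluster", "brokerResource");
    String brokerInstance =
        "Broker_pinot-broker-0.pinot-broker-headless.default.svc.cluster.local_8099";

    boolean ready = ev != null;
    if (ev != null) {
      // The broker is only ready once it is ONLINE for every table
      // (each partition of brokerResource) assigned to it.
      for (String table : ev.getPartitionSet()) {
        Map<String, String> stateMap = ev.getStateMap(table);
        if (stateMap.containsKey(brokerInstance)
            && !"ONLINE".equals(stateMap.get(brokerInstance))) {
          ready = false;
        }
      }
    }
    System.out.println(brokerInstance + " ready=" + ready);
    admin.close();
  }
}
```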
e
Thanks - so maybe we are obtaining our clients incorrectly then, i.e. just using the k8s service (which only adds endpoints when the pods are ready). Judging from the messages in the ZK browser under INSTANCES/<broker>/CURRENTSTATES, the earliest table came online 17 minutes before the last one.