# troubleshooting
l
has anyone ever encountered random 503s in your pinot setup? we get them every once in a while and i'm trying to understand and debug what may be happening, since the dashboards aren't showing anything out of the ordinary. we get these 503s sometimes when we hit the broker. the whole setup is on kubernetes, so we have a load balancer on top of it and 2 brokers running. any ideas?
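for reference, this is roughly what we could run to reproduce and timestamp the 503s so they can be correlated with logs later (just a sketch, the LB hostname and query are made up; it assumes the standard `/query/sql` broker endpoint):
```python
import datetime
import time

import requests

# Hypothetical values -- replace with your own LB hostname and a representative query.
BROKER_URL = "http://pinot-broker-lb.example.com:8099/query/sql"
SQL = "SELECT COUNT(*) FROM myTable"

# Fire the query in a loop and record every non-200 response so the 503s
# can be timestamped and later correlated with broker / load balancer logs.
while True:
    ts = datetime.datetime.utcnow().isoformat()
    try:
        resp = requests.post(BROKER_URL, json={"sql": SQL}, timeout=10)
        if resp.status_code != 200:
            print(f"{ts} HTTP {resp.status_code}: {resp.text[:200]}")
    except requests.RequestException as exc:
        print(f"{ts} request failed: {exc}")
    time.sleep(1)
```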
m
Any logs in broker/server for the request?
j
subscribing because i’ve seen this as well, but also haven’t dug into the logs since it’s 10s of 10ks of requests
l
hard to find them since the error rate is so low @Mayank
m
There’s a metric, `schedularWaitTime` (the time a query spends sitting in the queue). Check if that is spiking.
l
i get the error on the client, so i have those logs and know what the request looked like (the sql), but it’s hard to correlate that with the broker logs
hm let me check that
m
before digging into logs, you can check the metric above, along with cpu/mem/latency metrics
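something like this against whatever is scraping your pinot metrics would do it (just a sketch; the Prometheus URL and the metric name below are placeholders, check what your exporter actually publishes):
```python
import requests

# Assumes a Prometheus server scraping the Pinot metrics exporter.
# Both the URL and the metric/label names are placeholders -- adjust to your setup.
PROM_URL = "http://prometheus.example.com:9090/api/v1/query"
METRIC_QUERY = 'max_over_time(pinot_server_schedulerWaitMs{quantile="0.99"}[1h])'

# Query the Prometheus HTTP API and print the worst scheduler-wait value
# per series over the last hour, to see if queries are piling up in the queue.
resp = requests.get(PROM_URL, params={"query": METRIC_QUERY}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```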
l
that’s a server metric?
this metric is in ms yes?
[screenshot of the scheduler wait metric]
m
Yes, ms. Is this p99? If so, this seems fine; maybe the broker needs to be scaled.
(Based on the limited info I have).
l
that’s the mean rate, p99 is just 1
i don’t see much happening on cpu/mem on the server side
latency looks okay too
this has been the cadence of the 503 errors this week
[screenshot: 503 error counts over the week]
server usage
broker usage
broker latency
p99 ^
i’m using the dashboard provided in the pinot docs, is there something i should tweak?
m
All this seems healthy to me. Could there be a load balancer issue?
l
update: I do see 503s at the load balancer, but what could cause the load balancer to return 503s?
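one thing i can try is hitting each broker pod directly vs going through the LB, to see whether the 503 comes back from a broker or from the load balancer itself; roughly like this (a sketch only, the addresses are made up and it assumes the broker health endpoint):
```python
import datetime

import requests

# Made-up addresses: the LB hostname plus the two broker pod IPs/ports.
# Comparing the direct vs via-LB responses shows where the 503 originates.
TARGETS = {
    "via-lb": "http://pinot-broker-lb.example.com:8099/health",
    "broker-0": "http://10.0.0.11:8099/health",
    "broker-1": "http://10.0.0.12:8099/health",
}

ts = datetime.datetime.utcnow().isoformat()
for name, url in TARGETS.items():
    try:
        resp = requests.get(url, timeout=5)
        print(f"{ts} {name}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{ts} {name}: {exc}")
```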
m
What’s the overall qps and total number of brokers?
l
2 brokers, 100 qps
m
Hmm, that doesn’t seem like too much for 2 brokers. Not sure.