# ask-for-help
m
I’m sorry. I had the repository private. I just made it public
l
@sauyon maybe take a look?
s
I haven't looked very deeply, but something that immediately comes to mind is that `max_latency_ms` is specifically only for each individual runner, so the API server can still take arbitrarily long to handle requests.
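To illustrate that distinction with a minimal, made-up sketch (the timeout, slot count, and timings below are assumptions, not BentoML internals): a per-runner deadline bounds each runner call, but says nothing about how long a request queues in front of the runner.

```python
import asyncio
import time

MAX_LATENCY = 0.2    # stand-in for a per-runner max_latency_ms (assumed value)
RUNNER_SLOTS = 4     # how many runner calls may execute at once (assumed value)

async def runner_call() -> None:
    # The per-runner bound only covers this call...
    await asyncio.wait_for(asyncio.sleep(0.1), timeout=MAX_LATENCY)

async def handle_request(slots: asyncio.Semaphore, t0: float, latencies: list) -> None:
    # ...while the wait here, before the runner is ever invoked, is unbounded.
    async with slots:
        await runner_call()
    latencies.append(time.perf_counter() - t0)

async def main() -> None:
    slots = asyncio.Semaphore(RUNNER_SLOTS)
    latencies: list[float] = []
    t0 = time.perf_counter()
    await asyncio.gather(*(handle_request(slots, t0, latencies) for _ in range(100)))
    print(f"worst end-to-end latency: {max(latencies):.2f}s, "
          f"yet every runner call stayed under {MAX_LATENCY}s")

asyncio.run(main())
```

With 100 requests and 4 slots, the last request reports roughly 2.5 s end to end even though no single runner call exceeded its 0.2 s bound.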
Hm, ok, either way, this looks like a bug. Investigating it now.
:gratitude-thank-you: 1
My working theory at the moment for your question in case 2 is that, because there is nothing applying backpressure and we're using an async runtime (which in Python is not fair), some request handlers get caught in async-scheduler limbo until nearly the end of the benchmark, at which point they are finally scheduled and return.
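One way that theory can look in practice (just a sketch; the CPU step and numbers are invented): when the event-loop thread is saturated and all requests arrive at once, asyncio interleaves every in-flight handler, so each individual handler only returns near the end of the run even though its own work is tiny.

```python
import asyncio
import time

def cpu_step() -> None:
    # Stand-in for the small bits of work a handler does between awaits.
    sum(i * i for i in range(20_000))

async def handler(t0: float, finished: list) -> None:
    for _ in range(20):
        cpu_step()
        await asyncio.sleep(0)   # yield; every other ready handler runs before we resume
    finished.append(time.perf_counter() - t0)

async def main() -> None:
    finished: list[float] = []
    t0 = time.perf_counter()
    # 100 handlers arrive at once; nothing pushes back on admission.
    await asyncio.gather(*(handler(t0, finished) for _ in range(100)))
    total = time.perf_counter() - t0
    print(f"first handler done at {min(finished):.2f}s, "
          f"last at {max(finished):.2f}s, whole run {total:.2f}s")

asyncio.run(main())
```

Each handler only does a few tens of milliseconds of its own work, but because the loop round-robins through everything in flight, even the first one to finish does so close to the end of the whole run.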
m
Hi @sauyon, we found a few more things that might be useful. I'm focusing on Case 2 (`max_latency_ms` = infinite) and trying to get a reasonable millisecond figure for the `compute` part (which should cover the computation in the Runner plus the time spent in the Runner's queue). We have found that:
• With `asyncio.Semaphore(K)`, if K is set to 30 or more we get the weird behaviour in `compute` (items waiting as long as the whole benchmark). If it is below 30 (we tried 10 and 20) it works fine and still serves around 30 RPS. Note that ~30 RPS is roughly what the machine can handle.
• In some tests with locust.io, if we spawn 100 users and start them all at once, this weird situation happens. If the users are instead spawned at 1/second, the server handles them fine and the `compute` numbers look fine (with just a few outliers).
• Not important: a benchmark script similar to the Python one but written in JavaScript gives the same weird results with K=100.
So maybe it is just an artefact of the benchmark, because I was firing too many requests (K=100) all at once? I'm not sure whether this could happen in a real situation.
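That K ≈ 30 threshold lines up with the ~30 RPS capacity: the semaphore controls how many requests may sit in the runner's queue at once, and the measured `compute` includes that queueing time. A rough sketch under those assumptions (the 30 RPS figure and the single-slot "machine" below are simplifications, not the real setup):

```python
import asyncio
import time

MACHINE_RPS = 30   # rough capacity mentioned above (assumed here)
BURST = 100        # all requests arrive at once, like the 100 locust users

async def request(admit: asyncio.Semaphore, machine: asyncio.Semaphore,
                  computes: list) -> None:
    async with admit:                             # API-server-side limit: the K above
        t0 = time.perf_counter()
        async with machine:                       # runner queue: the real bottleneck
            await asyncio.sleep(1 / MACHINE_RPS)  # the work itself
        computes.append(time.perf_counter() - t0) # "compute" = runner queue + work

async def run(k: int) -> None:
    admit = asyncio.Semaphore(k)     # how many requests enter the runner queue at once
    machine = asyncio.Semaphore(1)   # the machine really only does ~30 req/s, serially
    computes: list[float] = []
    t0 = time.perf_counter()
    await asyncio.gather(*(request(admit, machine, computes) for _ in range(BURST)))
    total = time.perf_counter() - t0
    print(f"K={k:3d}: worst 'compute' {max(computes):.2f}s of a {total:.2f}s run")

for k in (10, 30, 100):
    asyncio.run(run(k))
```

With K=10 the worst `compute` stays around a third of a second; with K=100 everything is admitted at once, so the last request's `compute` is essentially the whole run. Ramping users in at 1/second has a similar effect to a small K: only a few requests ever queue at the same time.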
s
Yeah, I did try tracing our code through this, but it's obviously a little hard to track down fully. I believe it's just a quirk in how the Python async runtime behaves under heavy scheduler load, and it's only really a problem when the server is at 100% load. It's something we might want to fix eventually, but I suspect it'll be a long time before we're really able to do anything about it, as it could be quite involved. One thing I'm going to look into briefly is different async runtimes; maybe that will help.
👍 1
m
Thank you!