# ask-for-help
m
I’m sorry. I had the repository private. I just made it public
l
@sauyon maybe take a look?
s
I haven't looked very deeply, but something that immediately comes to mind is that `max_latency_ms` is specifically only for each individual runner, so the API server can still take arbitrarily long to handle requests.
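To illustrate that distinction with a minimal, made-up sketch (the timeout, slot count, and timings below are assumptions, not BentoML internals): a per-runner deadline bounds each runner call, but says nothing about how long a request queues in front of the runner.

```python
import asyncio
import time

MAX_LATENCY = 0.2    # stand-in for a per-runner max_latency_ms (assumed value)
RUNNER_SLOTS = 4     # how many runner calls may execute at once (assumed value)

async def runner_call() -> None:
    # The per-runner bound only covers this call...
    await asyncio.wait_for(asyncio.sleep(0.1), timeout=MAX_LATENCY)

async def handle_request(slots: asyncio.Semaphore, t0: float, latencies: list) -> None:
    # ...while the wait here, before the runner is ever invoked, is unbounded.
    async with slots:
        await runner_call()
    latencies.append(time.perf_counter() - t0)

async def main() -> None:
    slots = asyncio.Semaphore(RUNNER_SLOTS)
    latencies: list[float] = []
    t0 = time.perf_counter()
    await asyncio.gather(*(handle_request(slots, t0, latencies) for _ in range(100)))
    print(f"worst end-to-end latency: {max(latencies):.2f}s, "
          f"yet every runner call stayed under {MAX_LATENCY}s")

asyncio.run(main())
```

With 100 requests and 4 slots, the last request reports roughly 2.5 s end to end even though no single runner call exceeded its 0.2 s bound.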
Hm, ok, either way, this looks like a bug. Investigating it now.
:gratitude-thank-you: 1
My working theory at the moment for your question in case 2 is that, because there is nothing applying backpressure and we're using an async runtime (which in Python is not fair), some request handlers get caught in async-scheduler limbo until nearly the end of the benchmark, at which point they are finally scheduled and return.
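One way that theory can look in practice (just a sketch; the CPU step and numbers are invented): when the event-loop thread is saturated and all requests arrive at once, asyncio interleaves every in-flight handler, so each individual handler only returns near the end of the run even though its own work is tiny.

```python
import asyncio
import time

def cpu_step() -> None:
    # Stand-in for the small bits of work a handler does between awaits.
    sum(i * i for i in range(20_000))

async def handler(t0: float, finished: list) -> None:
    for _ in range(20):
        cpu_step()
        await asyncio.sleep(0)   # yield; every other ready handler runs before we resume
    finished.append(time.perf_counter() - t0)

async def main() -> None:
    finished: list[float] = []
    t0 = time.perf_counter()
    # 100 handlers arrive at once; nothing pushes back on admission.
    await asyncio.gather(*(handler(t0, finished) for _ in range(100)))
    total = time.perf_counter() - t0
    print(f"first handler done at {min(finished):.2f}s, "
          f"last at {max(finished):.2f}s, whole run {total:.2f}s")

asyncio.run(main())
```

Each handler only does a few tens of milliseconds of its own work, but because the loop round-robins through everything in flight, even the first one to finish does so close to the end of the whole run.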
m
Hi @sauyon, we found a few more things that might be useful. I'm focusing on Case 2 (`max_latency_ms` = infinite) and trying to get a reasonable millisecond figure for the `compute` part (which should cover the computation in the Runner plus the time spent in the Runner's queue). We have found that:
• With `asyncio.Semaphore(K)`, if K is set to 30 or more we get the weird behaviour in `compute` (items waiting as long as the whole benchmark). If it is below 30 (we tried 10 and 20) it works fine and still serves around 30 RPS. Note that ~30 RPS is roughly what the machine can handle.
• In some tests with locust.io, if we spawn 100 users and start them all at once, this weird situation happens. If the users are instead spawned at 1/second, the server handles them fine and the `compute` numbers look fine (with just a few outliers).
• Not important: a benchmark script similar to the Python one but written in JavaScript gives the same weird results with K=100.
So maybe it is just an artefact of the benchmark, because I was firing too many requests (K=100) all at once? I'm not sure whether this could happen in a real situation.
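That K ≈ 30 threshold lines up with the ~30 RPS capacity: the semaphore controls how many requests may sit in the runner's queue at once, and the measured `compute` includes that queueing time. A rough sketch under those assumptions (the 30 RPS figure and the single-slot "machine" below are simplifications, not the real setup):

```python
import asyncio
import time

MACHINE_RPS = 30   # rough capacity mentioned above (assumed here)
BURST = 100        # all requests arrive at once, like the 100 locust users

async def request(admit: asyncio.Semaphore, machine: asyncio.Semaphore,
                  computes: list) -> None:
    async with admit:                             # API-server-side limit: the K above
        t0 = time.perf_counter()
        async with machine:                       # runner queue: the real bottleneck
            await asyncio.sleep(1 / MACHINE_RPS)  # the work itself
        computes.append(time.perf_counter() - t0) # "compute" = runner queue + work

async def run(k: int) -> None:
    admit = asyncio.Semaphore(k)     # how many requests enter the runner queue at once
    machine = asyncio.Semaphore(1)   # the machine really only does ~30 req/s, serially
    computes: list[float] = []
    t0 = time.perf_counter()
    await asyncio.gather(*(request(admit, machine, computes) for _ in range(BURST)))
    total = time.perf_counter() - t0
    print(f"K={k:3d}: worst 'compute' {max(computes):.2f}s of a {total:.2f}s run")

for k in (10, 30, 100):
    asyncio.run(run(k))
```

With K=10 the worst `compute` stays around a third of a second; with K=100 everything is admitted at once, so the last request's `compute` is essentially the whole run. Ramping users in at 1/second has a similar effect to a small K: only a few requests ever queue at the same time.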
s
Yeah, I did try tracing our code through this, but it's obviously a little hard to track down fully. I believe it's just a quirk in how the Python async runtime behaves under heavy scheduler load, and it's only really a problem when the server is at 100% load. It's something we might want to fix eventually, but I suspect it'll be a long time before we're really able to do anything about it, as it could be quite involved. One thing I'm going to look into briefly is different async runtimes; maybe that will help.
👍 1
m
Thank you!