# ask-for-help
b
I think we need a better way to “preheat” the endpoint with a sample request. Would love to hear @Sean's opinion
Btw, would love to have your feedback on the Triton draft PR: https://github.com/bentoml/BentoML/pull/3471
🙌 2
👀 1
s
A good practice is to rely on the health check endpoint to determine whether the API and runner servers are ready, and only send requests once the health check returns 200.
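A minimal sketch of that pattern, assuming a BentoML HTTP server on localhost:3000 exposing the default /readyz endpoint (both the address and the endpoint path are assumptions; adjust for your deployment):

```python
import time

import requests

BASE_URL = "http://localhost:3000"  # assumption: local dev server on BentoML's default port


def wait_until_ready(timeout_s: float = 120.0, interval_s: float = 1.0) -> None:
    """Poll the readiness endpoint until it returns 200, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{BASE_URL}/readyz", timeout=5).status_code == 200:
                return
        except requests.ConnectionError:
            pass  # server is not accepting connections yet
        time.sleep(interval_s)
    raise TimeoutError("service did not become ready in time")


if __name__ == "__main__":
    wait_until_ready()
    print("service is ready; safe to send traffic")
```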
b
is it possible to override the /ready endpoint with a user-specified one in the Bento service?
j
Yup, this is exactly what I ended up doing. I needed a custom path for my health check, so in that path I just added a request to the /healthz endpoint. Even with that, it seems to be slow on the first real request to my /classify endpoint though. Would /readyz be better?
b
how many runners do you have?
i realized that if I have N runners I seem to need N requests to get it all warmed up (makes sense though)
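If that turns out to be the case, one crude warm-up approach is to fire one representative request per runner before taking real traffic. A sketch, assuming the /classify endpoint mentioned above accepts a JSON payload (the payload shape and runner count are placeholders):

```python
import requests

BASE_URL = "http://localhost:3000"     # assumption: local dev server
NUM_RUNNERS = 2                        # one warm-up request per runner, per the observation above
SAMPLE_PAYLOAD = {"text": "warm-up"}   # placeholder; use a realistic payload for your model


def preheat() -> None:
    """Send one representative request per runner so every worker loads its model."""
    for i in range(NUM_RUNNERS):
        resp = requests.post(f"{BASE_URL}/classify", json=SAMPLE_PAYLOAD, timeout=60)
        print(f"warm-up request {i + 1}/{NUM_RUNNERS}: HTTP {resp.status_code}")


if __name__ == "__main__":
    preheat()
```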
j
2 of them. I’ll have to test that out. I could try adding a second request to the health check and see if it makes the first real request quicker
b
awesome! tell me if it works ❤️
c
It could also be related to some ML frameworks' internal lazy-loading behavior. Which ML framework are you using with BentoML?
j
Right now I'm using Transformers, just pre-trained models from HuggingFace
I created a couple of custom transformers pipelines (to do some extra pre- and post-processing) around their BEiT and CLIP models
c
Could you share related service definition code and model saving code?
j
Sure. I sanitized a few proprietary things, but here's a Gist with the 2 scripts to save the Bento models and the service.py file: https://gist.github.com/akuma12/b4443a0103b2dc8b661b1bdb6d61e6ee
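(For readers without access to the Gist, a generic sketch of what saving a Hugging Face pipeline and wiring it into a service can look like with bentoml.transformers. The model, task, endpoint, and payload shape below are illustrative assumptions, not the code from the Gist.)

```python
# save_model.py -- save a pre-trained Hugging Face pipeline as a Bento model
import bentoml
from transformers import pipeline

clip = pipeline(
    task="zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",  # placeholder model
)
bentoml.transformers.save_model("clip_classifier", clip)
```

```python
# service.py -- expose the saved model through a runner
import bentoml
from bentoml.io import JSON

clip_runner = bentoml.transformers.get("clip_classifier:latest").to_runner()

svc = bentoml.Service("image_classifier", runners=[clip_runner])


@svc.api(input=JSON(), output=JSON())
async def classify(payload: dict) -> dict:
    # payload shape is illustrative; the runner forwards args to the pipeline's __call__
    result = await clip_runner.async_run(
        payload["image_url"], candidate_labels=payload["labels"]
    )
    return {"result": result}
```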
Semi-unrelated question, but is it possible to run multiple runner processes without using Yatai? Like run 2 copies of each of my runners. Not sure if it would be a performance gain or not, but I have plenty of GPU memory to spare
b
If you deploy a BentoDeployment, you can specify any number of runners you want
i.e. write a BentoDeployment YAML and kubectl create -f mybentodeployment.yaml
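For reference, a rough sketch of such a manifest. The apiVersion and exact field names depend on the Yatai release, so treat them as assumptions and check the BentoDeployment CRD docs for your version; the bento tag and runner name are placeholders:

```yaml
# mybentodeployment.yaml
apiVersion: serving.yatai.ai/v2alpha1   # assumption: may differ by Yatai version
kind: BentoDeployment
metadata:
  name: my-bento-deployment
spec:
  bento: image_classifier:latest        # placeholder bento tag
  autoscaling:                          # replica range for the API server pods
    minReplicas: 1
    maxReplicas: 1
  runners:
    - name: clip_classifier             # placeholder runner name
      autoscaling:                      # replica range for this runner's pods
        minReplicas: 2
        maxReplicas: 2
```

Applied with kubectl create -f mybentodeployment.yaml as described above.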
c
Yes, it is possible to configure the number of replicas for each runner within a single container (no Kubernetes or Yatai needed)
j
I don't suppose you can elaborate on how to do that? I see the autoscaling section of the Yatai BentoDeployment, but that doesn't seem to work inside of the bentoml_configuration.yaml file.
I should mention that the instances I'm running on have a single GPU.
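As a closing note on the question above: newer BentoML releases appear to expose a workers_per_resource option under the runner configuration, which would let a single container run multiple worker processes per runner on one GPU. This is an assumption about the configuration schema, so verify it against the configuration docs for your BentoML version; the runner name is a placeholder:

```yaml
# bentoml_configuration.yaml (sketch; keys assumed, verify for your BentoML version)
runners:
  clip_classifier:              # placeholder runner name
    resources:
      nvidia.com/gpu: 1         # the single GPU mentioned above
    workers_per_resource: 2     # assumed key: run two worker processes on that GPU
```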