Hi! If your model is stateful, I assume you need only one instance of it at a time; by default, BentoML will spawn multiple instances for scaling.
sauyon
02/16/2023, 9:34 PM
You might want to try setting a CPU limit of 1 as a workaround in the BentoML runner configuration:
runners:
  runner_name:
    resources:
      cpu: 1
Ilya Stolyarov
02/16/2023, 9:37 PM
Ah, that makes sense! Thank you! So far I have just limited the whole service to a single worker, which I guess has a similar effect.
On a related note, is the model kept in memory the whole time, or is it flushed in and out? And is it possible to have custom control over that? For example, controlling how the model gets flushed out of memory (e.g. to ensure that its state is saved properly).
sauyon
02/16/2023, 9:46 PM
You can most likely do those things with a custom runner and your own implementation of `__del__`, but otherwise we don't really have a management story for that kind of model right now.
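The `__del__` idea from the reply above can be sketched without BentoML itself, assuming the model's state fits in a JSON file. `StatefulModel`, `predict`, `save`, and `state_path` are hypothetical names for illustration, not BentoML API. In a real custom runner this logic would presumably live in the class you hand to BentoML (a `bentoml.Runnable` subclass in BentoML 1.x), but that wiring is left out so the snippet runs on its own.

```python
import json
import os
import tempfile


class StatefulModel:
    """Hypothetical stateful model, standing in for what a custom runner would own."""

    def __init__(self, state_path: str):
        self.state_path = state_path
        # Restore previously saved state if it exists; otherwise start fresh.
        if os.path.exists(state_path):
            with open(state_path) as f:
                self.state = json.load(f)
        else:
            self.state = {"calls": 0}

    def predict(self, x: float) -> float:
        # Every call mutates state, which is why only one instance should exist.
        self.state["calls"] += 1
        return x * 2

    def save(self) -> None:
        # Write to a temp file in the same directory, then atomically replace
        # the target, so a crash mid-write does not corrupt the saved state.
        directory = os.path.dirname(os.path.abspath(self.state_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, self.state_path)

    def __del__(self):
        # Best effort only: CPython calls this when the last reference goes away,
        # but it is not guaranteed to run (e.g. SIGKILL or some shutdown paths).
        try:
            self.save()
        except Exception:
            pass
```

Deleting the last reference (`del model`) saves the state, and a new instance pointed at the same path picks it up again. Because `__del__` is best effort, `weakref.finalize` or an `atexit` hook may be sturdier places to trigger `save()`; none of these run if the process is killed outright.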