# ask-for-help
s
Hi! If your model is stateful, I assume you need only one instance of it at a time; by default, BentoML spawns multiple instances for scaling.
You might want to try setting a CPU limit of 1 as a workaround in the BentoML runner configuration:
```yaml
runners:
  runner_name:
    resources:
      cpu: 1
```
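To illustrate why a single instance matters, here is a minimal plain-Python sketch (not BentoML code). It assumes a hypothetical `StatefulModel` whose output depends on how many requests it has already seen, and it assumes requests are spread round-robin across copies; BentoML's actual request dispatching may differ. With one copy, the state evolves as a single sequence. With two copies, each keeps its own counter, so the same input gets different answers depending on which copy handles it.

```python
import itertools


class StatefulModel:
    """Hypothetical stateful model: its output depends on how many calls it has seen."""

    def __init__(self) -> None:
        self.seen = 0

    def predict(self, x: int) -> int:
        # State is updated on every call, so results depend on this copy's call history.
        self.seen += 1
        return x + self.seen


def serve(requests, n_instances):
    """Send requests round-robin to n independent copies, mimicking a scaled-out runner."""
    instances = [StatefulModel() for _ in range(n_instances)]
    picker = itertools.cycle(instances)
    return [next(picker).predict(x) for x in requests]


print(serve([0, 0, 0, 0], n_instances=1))  # [1, 2, 3, 4]
print(serve([0, 0, 0, 0], n_instances=2))  # [1, 1, 2, 2]
```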
i
Ah, that makes sense, thank you! So far I have just limited the whole service to a single worker, which I guess has a similar effect. On a related note, I was wondering whether the model is kept in memory the whole time or is flushed in and out, and whether it is possible to have custom control over that. For example, controlling how the model gets flushed out of memory (e.g. to ensure that its state is saved properly).
s
You can most likely do those things with a custom runner and your own implementation of `__del__`, but otherwise we don't really have a management story for that kind of model right now.
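As a rough sketch of the `__del__` idea, here is a plain-Python class that restores its state on construction and writes it to disk when the object is finalized. In BentoML this logic would sit inside a custom runner (in BentoML 1.x, a `bentoml.Runnable` subclass); the class name, the JSON file format, and the `save()` helper are hypothetical choices for illustration, not a BentoML API.

```python
import json
import os
import tempfile


class PersistentStateModel:
    """Hypothetical model that restores its state on start and saves it when finalized."""

    def __init__(self, state_path: str) -> None:
        self.state_path = state_path
        self.state = {"seen": 0}
        # Restore previously saved state, if any.
        if os.path.exists(state_path):
            with open(state_path) as f:
                self.state = json.load(f)

    def predict(self, x: int) -> int:
        # Each call mutates the state we want to persist.
        self.state["seen"] += 1
        return x + self.state["seen"]

    def save(self) -> None:
        # Write to a temp file in the same directory, then atomically rename it,
        # so an interrupted write never leaves a half-written state file behind.
        directory = os.path.dirname(os.path.abspath(self.state_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, self.state_path)

    def __del__(self) -> None:
        # Runs when the object is garbage-collected. Python does not guarantee
        # __del__ is called for objects still alive at interpreter exit.
        self.save()
```

Because `__del__` is not guaranteed to run at interpreter shutdown, it is safer to also call `save()` explicitly on a known shutdown path, or to register it with the standard library's `atexit` or `weakref.finalize`.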
i
I see. Thank you for the responses 🙏