BentoML

BentoML uses uvicorn as a component but handles the scheduling of each runner process group itself. You can launch with uvicorn directly, but that should only be done in dev environments: in that mode all models are loaded in the main process, which is not great for resource utilization. See the notes here: https://github.com/bentoml/BentoML/tree/main/src/bentoml/_internal/server

Makes sense. From what I gather, if we use this approach the runners won't run in a separate thread; the service and runners would be a single process.

So using the async runner method is basically sync under the hood?

That's correct, it will essentially be sync under the hood in that case.
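To make "sync under the hood" concrete, here is a minimal sketch in plain asyncio. It does not use any BentoML API: `blocking_predict`, `in_process_async`, and `offloaded_async` are hypothetical names, and the worker-thread variant is only an analogy for work that runs outside the main event loop, not a description of how BentoML dispatches to runners.

```python
import asyncio
import time


def blocking_predict(x: int) -> int:
    # Hypothetical stand-in for model inference running in the main process.
    time.sleep(0.2)
    return x * 2


async def in_process_async(x: int) -> int:
    # Declared async, but the blocking call runs on the event loop thread,
    # so no other coroutine can make progress while it executes.
    return blocking_predict(x)


async def offloaded_async(x: int) -> int:
    # Analogy only: the blocking call runs in a worker thread, so the
    # event loop stays free to schedule other coroutines.
    return await asyncio.to_thread(blocking_predict, x)


async def timed(fn, n: int = 3):
    # Start n calls "concurrently" and measure the total wall-clock time.
    start = time.perf_counter()
    results = await asyncio.gather(*(fn(i) for i in range(n)))
    return results, time.perf_counter() - start


def compare():
    serial = asyncio.run(timed(in_process_async))
    concurrent = asyncio.run(timed(offloaded_async))
    return serial, concurrent


if __name__ == "__main__":
    (_, serial_t), (_, concurrent_t) = compare()
    print(f"async wrapper around blocking call: {serial_t:.2f}s")
    print(f"blocking call offloaded to threads: {concurrent_t:.2f}s")
```

With three calls, the first variant should take roughly the sum of the individual call times, while the offloaded variant should take roughly the time of a single call. `asyncio.to_thread` requires Python 3.9 or newer.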

Thank you for clearing that doubt for me :pray: