BentoML uses uvicorn as a component but handles the scheduling of each runner process group itself. You can launch directly with uvicorn, but that should only be done in dev environments: in that mode, all models are loaded in the main process, which is not great for resource utilization. See the notes here: https://github.com/bentoml/BentoML/tree/main/src/bentoml/_internal/server
Ashish Singh
04/30/2023, 5:28 PM
Makes sense. From what I understand, if we use this approach the runners won't be in a separate thread; the service and runners would be one process.
So using the async runner method is basically sync under the hood?
Chaoyu
04/30/2023, 6:12 PM
That’s correct, it will essentially be sync under the hood in that case.
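To illustrate what "sync under the hood" means here, a minimal plain-asyncio sketch (this is not BentoML code). It assumes the in-process runner call behaves like a blocking function wrapped in a coroutine; `blocking_predict` and `in_process_async_run` are hypothetical names made up for the example.

```python
import asyncio
import time

events = []  # records the order in which calls start and finish


def blocking_predict(tag):
    # Hypothetical stand-in for model inference running in the main process.
    events.append(f"start-{tag}")
    time.sleep(0.05)  # blocks the thread, and therefore the event loop
    events.append(f"end-{tag}")
    return tag


async def in_process_async_run(tag):
    # Declared async, but the body never yields control to the event loop,
    # so other coroutines cannot make progress while it runs.
    return blocking_predict(tag)


async def main():
    # Two "concurrent" calls that end up running one after the other.
    return await asyncio.gather(
        in_process_async_run("a"),
        in_process_async_run("b"),
    )


results = asyncio.run(main())
print(events)
```

The second call only starts after the first one has finished, even though both were awaited together. Under the assumption above, that is why an async runner method gives no real concurrency when everything lives in one process.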