# ask-for-help
p
Sorry if this is not the right channel to be asking these questions. I'm exploring BentoML for the first time and trying to see if it fits our use case.
j
I think BentoML is a good fit because you can encapsulate different models/algorithms into the same service!
And you can have a separate endpoint for each of your algorithms
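Something like this, roughly (just a sketch, assuming BentoML 1.x and two sklearn models already saved under the hypothetical tags `model_a` / `model_b`):
```python
import bentoml
from bentoml.io import JSON

# Hypothetical model tags; assumes the models were saved to the local
# BentoML store beforehand, e.g. with bentoml.sklearn.save_model(...)
runner_a = bentoml.sklearn.get("model_a:latest").to_runner()
runner_b = bentoml.sklearn.get("model_b:latest").to_runner()

svc = bentoml.Service("multi_model_service", runners=[runner_a, runner_b])

# One endpoint per model/algorithm, all served by the same service
@svc.api(input=JSON(), output=JSON())
async def predict_a(payload: dict) -> dict:
    result = await runner_a.predict.async_run([payload["features"]])
    return {"prediction": result.tolist()[0]}

@svc.api(input=JSON(), output=JSON())
async def predict_b(payload: dict) -> dict:
    result = await runner_b.predict.async_run([payload["features"]])
    return {"prediction": result.tolist()[0]}
```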
a
Some additional things I would add to the solution mentioned above:
• Keep a registry of runners, use `lru_cache`
• Pass a param that represents a runner name in the registry
• Then just load the runner from the registry and call the endpoint using `runner.run` or `runner.async_run` (rough sketch below)
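Here's roughly what I mean (a sketch only, assuming BentoML 1.x with sklearn runners; the registry names and model tags are made up):
```python
from functools import lru_cache

import bentoml
from bentoml.io import JSON

# Hypothetical registry: name -> saved model tag
MODEL_TAGS = {
    "churn": "churn_model:latest",
    "fraud": "fraud_model:latest",
}

@lru_cache(maxsize=None)
def get_runner(name: str):
    # lru_cache memoizes, so each runner is built exactly once
    return bentoml.sklearn.get(MODEL_TAGS[name]).to_runner()

# Runners have to be declared on the Service up front
svc = bentoml.Service(
    "registry_service",
    runners=[get_runner(name) for name in MODEL_TAGS],
)

@svc.api(input=JSON(), output=JSON())
async def predict(payload: dict) -> dict:
    # The request carries the runner name; look it up in the registry
    runner = get_runner(payload["model"])
    result = await runner.predict.async_run([payload["features"]])
    return {"prediction": result.tolist()[0]}
```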
p
Thanks for all the responses! Are there performance differences between using Flask vs FastAPI when serving multiple different models, each in its own endpoint?
a
The main difference, and it's a big one: FastAPI uses ASGI while Flask uses WSGI, so FastAPI's concurrency and async capabilities make it the better choice for serving multiple models. That said, Flask on a greenlet-powered WSGI server (e.g. gevent) may give similar performance. I would still prefer FastAPI.
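To illustrate the ASGI point (toy example; the sleep stands in for async I/O such as a remote model call):
```python
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/predict/{model_name}")
async def predict(model_name: str):
    # Under ASGI, many of these awaits can overlap on a single worker;
    # a blocking WSGI handler would hold the worker for the full call
    await asyncio.sleep(0.1)  # stand-in for async I/O
    return {"model": model_name, "prediction": 0.5}
```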