Hi
@Hieu Bui, the decision will depend heavily on your use case. If the models work together to fulfill a single use case, it is recommended to put them in the same service as different runners. If they scale and deploy independently, serving different use cases, it may make more sense to deploy them as separate services.
This documentation on deploying multiple models in the same service as an inference graph may help:
https://docs.bentoml.org/en/latest/guides/graph.html
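To illustrate the first option, here is a minimal sketch of a single service composing two runners, based on the BentoML 1.x runner API. The model tags (`model_a`, `model_b`), the service name, and the way the outputs are combined are all hypothetical placeholders; it assumes both models have already been saved to your local model store.

```python
import bentoml
from bentoml.io import JSON

# Hypothetical tags; assumes the models were saved to the model store beforehand
runner_a = bentoml.models.get("model_a:latest").to_runner()
runner_b = bentoml.models.get("model_b:latest").to_runner()

# One service hosting both models as separate runners
svc = bentoml.Service("multi_model_service", runners=[runner_a, runner_b])

@svc.api(input=JSON(), output=JSON())
async def predict(input_data):
    # Run both runners concurrently-capable via async, then merge the results
    result_a = await runner_a.async_run(input_data)
    result_b = await runner_b.async_run(input_data)
    return {"model_a": result_a, "model_b": result_b}
```

With this layout the runners still get their own worker processes and can be given separate resource configurations, but they deploy and scale as one unit, which is the trade-off to weigh against separate services.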