# ask-for-help
Hi @Prashant Godhani, yes, it is possible to schedule multiple Runner (model) instances on one large GPU. Check out the scheduling strategy configuration guide: https://docs.bentoml.org/en/latest/guides/scheduling.html
For example, if you want to fully utilize the GPU memory, you can add something like this to your configuration:
```yaml
version: 1
api_server:
  timeout: 60
  metrics:
    enabled: false
runners:
  large_cnn_summarize_runner_1:
    resources:
      nvidia.com/gpu: 0
    workers_per_resource: 12
```
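If the goal is to pack several different runners onto the same card, the same pattern extends naturally. This is only a sketch: the runner names below are hypothetical placeholders, and it assumes `nvidia.com/gpu: 0` pins each runner to that device in the same way as the snippet above, with `workers_per_resource` split so the combined workers fit in that GPU's memory:

```yaml
version: 1
runners:
  large_cnn_summarize_runner_1:
    resources:
      nvidia.com/gpu: 0    # same device as below
    workers_per_resource: 6
  small_classifier_runner:   # hypothetical second runner sharing the GPU
    resources:
      nvidia.com/gpu: 0
    workers_per_resource: 6
```

Worth verifying with `nvidia-smi` while under load that the combined worker processes do not exceed the card's memory.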