# ask-for-help
Hi @Prashant Godhani, yes, it is possible to schedule multiple Runner (model) instances on one large GPU. Check out the scheduling strategy configuration guide: https://docs.bentoml.org/en/latest/guides/scheduling.html
For example, if you want to fully utilize the GPU memory, you can add something like this to your configuration:
```yaml
version: 1
api_server:
  timeout: 60
  metrics:
    enabled: false
runners:
  large_cnn_summarize_runner_1:
    resources:
      nvidia.com/gpu: 0
    workers_per_resource: 12
```
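If the goal is to pack several different runners onto the same card, the same pattern extends naturally. This is only a sketch: the runner names below are hypothetical placeholders, and it assumes `nvidia.com/gpu: 0` pins each runner to that device in the same way as the snippet above, with `workers_per_resource` split so the combined workers fit in that GPU's memory:

```yaml
version: 1
runners:
  large_cnn_summarize_runner_1:
    resources:
      nvidia.com/gpu: 0    # same device as below
    workers_per_resource: 6
  small_classifier_runner:   # hypothetical second runner sharing the GPU
    resources:
      nvidia.com/gpu: 0
    workers_per_resource: 6
```

Worth verifying with `nvidia-smi` while under load that the combined worker processes do not exceed the card's memory.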