You can try increasing the number of runner workers to improve GPU utilization, following this guide:
https://docs.bentoml.org/en/latest/guides/scheduling.html
In your case, the GPU has 40 GB of memory and a single runner uses only 3 GB, so you could try setting
workers_per_resource
to 10 or higher.
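
As a rough sanity check on that number, here is a minimal sketch of the arithmetic, assuming memory is the only constraint. The function name `max_workers_per_gpu` and the `headroom_gb` parameter are hypothetical helpers for illustration, not part of BentoML.

```python
import math


def max_workers_per_gpu(gpu_mem_gb: float, per_worker_gb: float,
                        headroom_gb: float = 0.0) -> int:
    """Upper bound on how many workers fit in GPU memory.

    Hypothetical helper: it only divides usable memory by per-worker usage
    and ignores compute contention between workers.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable_gb = gpu_mem_gb - headroom_gb
    return max(0, math.floor(usable_gb / per_worker_gb))


# Figures from this thread: 40 GB GPU, about 3 GB per runner.
print(max_workers_per_gpu(40, 3))                 # 13
# Same GPU with 4 GB kept free as a safety margin (assumed value).
print(max_workers_per_gpu(40, 3, headroom_gb=4))  # 12
```

So 10 leaves some margin below the memory-only ceiling of 13. Whether throughput actually improves depends on how well the workers share the GPU's compute, so it is worth measuring after each change.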