# ask-for-help
j
This is just a semi-educated guess, but you may need to specify in your configuration which GPUs you want a runner to run on:
```yaml
runners:
  pytorch_mnist:
    resources:
      nvidia.com/gpu: [0, 1]
```
πŸ‘ 1
I'm not sure if there's an "all" option
t
Unfortunately I tried forcing the runner GPU allocation in the configuration, but BentoML (correctly?) identifies only one GPU: `Error: [bentoml-cli] serve failed: GPU device index in [0, 1] is greater than the system available: [0]`. But I would like to specify the multi-instance GPU "partitions".
j
ohhhh gotcha, there's one physical GPU partitioned into 2 logical GPUs. That's cool, didn't know you could do that.
πŸ‘ 2
t
Yes, this is a feature that comes with the NVIDIA Ampere architecture. The official documentation describes it as a way to securely partition the GPU "into up to seven separate GPU Instances for CUDA applications, providing multiple users with separate GPU resources for optimal GPU utilization".
I have a node pool in my k8s cluster that is configured with such a partition, so I would like to exploit the full potential of this optimisation with my BentoML service 🙂
j
That's super cool. I know partial GPU usage is on the roadmap for Bento, but I don't know if this is what they were indicating.
c
cc @Jiang @larme (shenyang) any suggestions?
j
Hi. Yeah, it is a valid usage. Here's the official guide to set it up: https://aws.amazon.com/blogs/containers/utilizing-nvidia-multi-instance-gpu-mig-in-a[…]n-ec2-p4d-instances-on-amazon-elastic-kubernetes-service-eks/ My personal recommendation is to keep an eye on:
1. Step 3: make sure the NVIDIA device plugin is properly set up with MIG enabled.
2. Use something like `nvidia.com/mig-1g.5gb: 1` instead of `nvidia.com/gpu` in the resources limit.
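For reference, here is a minimal sketch of how that second point could look in a Kubernetes container spec, assuming the NVIDIA device plugin is set up with a MIG strategy that advertises `nvidia.com/mig-1g.5gb` as an extended resource; the deployment name, labels, and image below are placeholders, not anything from the thread:
```yaml
# Hypothetical Deployment for a BentoML service on a MIG-enabled node pool.
# Assumes the NVIDIA device plugin exposes MIG slices as nvidia.com/mig-1g.5gb.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytorch-mnist-bento        # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytorch-mnist-bento
  template:
    metadata:
      labels:
        app: pytorch-mnist-bento
    spec:
      containers:
        - name: bento
          image: my-registry/pytorch-mnist-bento:latest   # placeholder image
          resources:
            limits:
              nvidia.com/mig-1g.5gb: 1   # request one MIG slice instead of nvidia.com/gpu
```
With a limit like this, the scheduler should place the pod on a node that has free MIG slices, and the container only sees that single GPU instance.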
πŸ™ 1
a
Hi @Thomas Jacquemin, has this problem been solved yet?
j
Hi @Thomas Jacquemin, I will mark this thread as resolved for now.