# ask-for-help
This largely depends on the runnable implementation; most of the built-in framework runners should not have this problem. How much memory usage are you observing?
Have you confirmed that the model is loaded on the GPU?
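For reference, a quick check along these lines can confirm device placement (a rough sketch; the `model.onnx` path and the `Linear` module are placeholders, and it assumes `onnxruntime-gpu` plus a CUDA build of PyTorch):
```python
import onnxruntime as ort
import torch

# ONNX Runtime only runs on the GPU if the CUDA execution provider is
# actually active; otherwise it silently falls back to CPU.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # "CUDAExecutionProvider" should be listed first

# PyTorch: check which device the module's parameters actually live on.
model = torch.nn.Linear(4, 4).to("cuda")  # stand-in for your real model
print(next(model.parameters()).device)    # expect "cuda:0"
```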
Hi, thanks for the fast response. I'm using the built-in ONNX runner for all of these models, and all of them were correctly loaded on the GPU.
I also see that the RAM used by each runner process is almost the same.
OK, I just found that it might be related to a custom runner that I was importing but not using. After removing the import, each runner uses around 1.2G of RAM (instead of the 2.6G seen in the screenshot). Is 1.2G a normal number?
Yes, it is normal for most large models. cc @Aaron Pham
I investigated a bit more, and I think the high RAM usage is caused by having both ONNX and PyTorch models in the same service:
• If I only keep the ONNX models, each runner process takes around 1.7G.
• If I only keep the PyTorch models, each runner process takes around 1.8G.
• If I keep both (or just `import torch` somewhere in the code), the ONNX runner processes go up to 3G. Interestingly, the PyTorch runners still use 1.8G.
Does this make sense? Is there a way to avoid this (i.e., avoid importing torch) in the ONNX runners?
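For reference, the per-process cost of merely importing torch can be measured with something like this (a rough sketch using psutil; the exact numbers depend on the torch build and the platform):
```python
import os

import psutil


def rss_mb() -> float:
    """Resident set size of the current process, in MiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024**2


before = rss_mb()
import torch  # noqa: E402  -- importing torch maps large shared libraries
after = rss_mb()

print(f"RSS before importing torch: {before:.0f} MiB")
print(f"RSS after importing torch:  {after:.0f} MiB")
print(f"Overhead of the import:     {after - before:.0f} MiB")
```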
Can you send an example of your runnable implementation?
Hi @Aaron Pham, it is not affected by my runnable (I removed it). I presume it is caused by having PyTorch and ONNX runners loaded in the same service, as I said in my previous message.
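One common workaround for this kind of situation (not confirmed as the official BentoML answer, just the standard lazy-import pattern) is to keep `import torch` out of module scope, so processes that only serve the ONNX models never load the PyTorch libraries. A minimal sketch, with placeholder loader functions:
```python
# Sketch of a service module that avoids a top-level "import torch", so the
# ONNX runner processes never map the PyTorch shared libraries.
import onnxruntime as ort


def load_onnx_session(path: str) -> ort.InferenceSession:
    # Only onnxruntime is imported on this code path.
    return ort.InferenceSession(
        path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
    )


def load_torch_model(path: str):
    # Deferred import: torch is only loaded in the processes that actually
    # call this function (i.e. the PyTorch runners).
    import torch

    model = torch.jit.load(path)  # placeholder for however the model is loaded
    model.eval()
    return model
```
Whether this actually helps depends on how the runner processes are spawned and on what else the service module imports at the top level.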