# ask-for-help
l
Hi Ariel, when you are in htop, if you press shift + h, does htop still show so many bentoml lines? By default htop lists every thread in a process as its own row; api_server.workers=1 runs a single api_server process, but that process has several threads.
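If you want to double-check outside of htop, a small script like this could help (just a rough sketch, assuming psutil is installed and that the server processes have http_api_server in their command line); it prints each matching process with its PID, thread count, and resident memory:

import psutil

# List every process whose command line mentions the BentoML API server,
# together with its thread count and resident memory (RSS) in GB.
for proc in psutil.process_iter(["pid", "cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "http_api_server" in cmdline:
        rss_gb = proc.memory_info().rss / 1024 ** 3
        print(proc.info["pid"], proc.num_threads(), f"{rss_gb:.1f} GB")

If it prints a single PID, there is only one api_server process, and the extra rows in htop are its threads, which all share that process's memory.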
a
So bentoml is opening multiple threads?
It still shows them, by the way, even after shift + h.
l
Could you share the output of ps aux | grep http_api_server ?
a
@Asaf Horovitz
l
That means only one api_server process is running. I think maybe your htop has different key bindings from mine.
The api_server using multithreading should not affect memory usage, because threads share memory.
From the htop screenshot I think the runner is using around 37 GB of memory and the api_server around 19 GB. Is that not what you expect?
a
The runner using around 37 GB is somewhat logical.
The api_server using 19 GB is very odd.
l
Can you share your service.py? I think maybe the api_server also loads the model?
a
import os
import time
import bentoml
import mlflow
import pandas as pd
from fastapi import FastAPI
from utils import CORRELATION_ID_HEADER, generate_correlation_id, generate_input_and_output_descriptors
from custom_bento_service import CustomBentoService


BENTO_REGISTRY_MODEL_NAME = os.environ['BENTO_REGISTRY_MODEL_NAME']
BENTO_REGISTRY_MODEL_VERSION = os.environ['BENTO_REGISTRY_MODEL_VERSION']
OPEN_API_PREFIX = os.environ.get('OPEN_API_PREFIX', '')

bento_model_path = f"{BENTO_REGISTRY_MODEL_NAME}:{BENTO_REGISTRY_MODEL_VERSION}"

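# Loading the full pyfunc model here pulls the model weights into the api_server process;
# it is only used below to read metadata and build the IO descriptors.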
pyfunc_model: mlflow.pyfunc.PyFuncModel = bentoml.mlflow.load_model(bento_model_path)
artifact_name = pyfunc_model.metadata.artifact_path
bento_model = bentoml.mlflow.get(bento_model_path)

input_descriptor, output_descriptor = generate_input_and_output_descriptors(bento_model_path, pyfunc_model)

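# Drop the pyfunc model; the runner created below loads its own copy in the runner process.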
del pyfunc_model

model_runner = bento_model.to_runner()
svc = CustomBentoService(BENTO_REGISTRY_MODEL_NAME, runners=[model_runner])



@svc.api(
    input=input_descriptor,
    output=output_descriptor,
)
def predict(input_df: pd.DataFrame, ctx: bentoml.Context) -> pd.DataFrame:
    start_time = time.time()
    # get request headers
    request_headers = ctx.request.headers
    x_cr_id = request_headers.get(CORRELATION_ID_HEADER)
    if x_cr_id is None:
        x_cr_id = generate_correlation_id(artifact_name)
    response = model_runner.run(input_df)
    process_time = time.time() - start_time
    ctx.response.headers.append(CORRELATION_ID_HEADER, x_cr_id)
    ctx.response.headers.append("x-process-time", str(process_time))
    return response


fastapi_app = FastAPI(
    openapi_url="/docs.json",
    root_path=OPEN_API_PREFIX,
)
svc.mount_asgi_app(fastapi_app)


@fastapi_app.get("/metadata")
def metadata():
    return {"name": bento_model.tag.name, "version": bento_model.tag.version}
l
Maybe del pyfunc_model won't free the memory (at least not immediately). Could you try adding an explicit gc call:
import gc
del pyfunc_model
gc.collect()
and see what happens?
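If gc doesn't help (CPython's allocator does not always return freed memory to the OS, so RSS can stay high even after gc.collect()), another option might be to avoid loading the pyfunc model in the api_server at all and read only the MLflow model metadata. This is just a sketch under a couple of assumptions: that your descriptors can be built from the saved signature alone, and that the MLflow model sits in an mlflow_model subdirectory of the bento model (you'd need to check the actual layout):

import bentoml
from mlflow.models import Model

bento_model = bentoml.mlflow.get(bento_model_path)  # reference only, nothing loaded yet

# Read the MLmodel file (metadata and signature) without loading the model weights.
# "mlflow_model" is an assumption about where bentoml.mlflow stores the MLflow model.
mlflow_meta = Model.load(bento_model.path_of("mlflow_model"))

artifact_name = mlflow_meta.artifact_path
input_schema = mlflow_meta.get_input_schema()
output_schema = mlflow_meta.get_output_schema()

Your generate_input_and_output_descriptors would then need to work from the schemas instead of the loaded pyfunc model, so the api_server process never holds the weights.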
a
Will try
What about bentoml.mlflow.get(bento_model_path), which I then pass to the runner?
Is it copied in the background to a different process?
And if so, does the reference hold a copy?
l
bentoml.mlflow.get(bento_model_path) only returns a reference to a model inside the BentoML model store. to_runner turns this reference into a runner that is lazily loaded (including the models the runner uses) inside the runner process.
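To illustrate (a minimal sketch; my_model:latest is just a placeholder tag):

import bentoml

bento_model = bentoml.mlflow.get("my_model:latest")  # store entry only, no weights in memory
runner = bento_model.to_runner()                      # still nothing loaded here

# runner.init_local() would load the model into *this* process (intended for debugging only).
# Under `bentoml serve`, the load happens lazily inside the separate runner process instead,
# which is why the runner's memory shows up under a different PID in htop.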
a
ok
Looks like adding gc.collect() didn't help.
Any other suggestions?
p
I am having a similar issue