# ask-for-help
Hi Jack.
Because of this, each time they use it, the service seems to have to restart and load all the runners, and it takes a long time before they can actually use the endpoint.
I don't think this is the default behavior of BentoML. It might be caused by a service.py that doesn't follow best practices.
Would you mind sharing your service.py? You can blur sensitive information before posting it.
Hi @Jiang, sure thing, it's possible I've made a mistake in the service.py. It's essentially this:
```python
# define IO descriptors
# ...
success_probability_input_descriptor = JSON(pydantic_model=SuccessProbabilityInputDescriptor)
success_probability_output_descriptor = JSON(pydantic_model=SuccessProbabilityOutputDescriptor)
# ...

# Load Bento Models from Google Cloud Storage; also convert them to Runners
bento_models, runners = load_bentos_from_gcs()

svc = bentoml.Service("probability-models-bento", runners=list(runners.values()))


def success_probability_pipeline(input_data):
    # do stuff with input_data
    ...


@svc.api(input=success_probability_input_descriptor,
         output=success_probability_output_descriptor,
         route="success/predict_proba")
def success_predict_proba(
        input_data: SuccessProbabilityInputDescriptor) -> SuccessProbabilityOutputDescriptor:
    validated_results = success_probability_pipeline(input_data)
    return validated_results

# other pipelines ...
```
> has to restart and load all the runners
Can you tell me more about the phenomenon behind this? And could you provide the source of `load_bentos_from_gcs`?
@Jiang Sure, this is `load_bentos_from_gcs`:
```python
def load_bentos_from_gcs(bento_framework_model_names, load_from_gcs=True):
    bento_models = {}
    runners = {}
    for model_tmp in bento_framework_model_names:
        model_name_tmp = model_tmp["model_name"]
        framework = model_tmp["framework"]
        if load_from_gcs:
            bentoml_logger.info(f"Importing {BUCKET}/{model_name_tmp}.bentomodel from GCS")
            try:
                bentoml.models.import_model(f"{BUCKET}/{model_name_tmp}.bentomodel")
            except (BentoMLException, ImportServiceError) as e:
                bentoml_logger.warning(f"The Bento model is already in the store. No harm done, though! "
                                       f"The {e.__class__.__name__} exception caught from Bento was:"
                                       f"\n\n\t'{e}'\n\nProceeding with the rest of the build.")

        bentoml_logger.info(f"Loading in the {model_name_tmp}:latest model using the {framework} framework.")
        if framework == "sklearn":
            bento_model = bentoml.sklearn.get(f"{model_name_tmp}:latest")
        elif framework == "picklable_model":
            bento_model = bentoml.picklable_model.get(f"{model_name_tmp}:latest")
        else:
            raise ValueError(f"Unsupported framework for {model_name_tmp}: {framework}")

        bentoml_logger.info("Success.")
        bentoml_logger.info("Converting the model to a runner.")
        runner = bento_model.to_runner()
        bento_models[model_name_tmp] = bento_model
        runners[model_name_tmp] = runner
    return bento_models, runners
```
With regard to the restarting and loading of all the runners: when I make a request after a fair amount of idle time, the endpoint hangs for a long time waiting for a response. Looking at my Cloud Run service logs, I can see the loading from GCS happening. Once all the models are loaded in on each of the 8 CPUs (roughly 5 minutes later), the request completes successfully. Thereafter, requests are fast as usual.
Sure, I understand now:
1. You are using Cloud Run's serverless mode, which automatically shuts down the server after a certain period of inactivity (about 15 minutes) to save costs.
2. Your Bento models are re-downloaded every time the server starts.
The combination of these two factors is causing the current unavailability issue.
There are two possible solutions to this issue:
1. Set Cloud Run to always-on CPU allocation: https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation
2. Keep the cost-saving behavior and tolerate a short request delay by modifying your Bento container: import the models during the image build using a setup.sh script, as described in https://docs.bentoml.org/en/latest/guides/containerization.html. This should reduce the cold-start wait to a few seconds (depending mostly on how long your models take to load from disk into memory).
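For the second option, here is a minimal sketch (an illustration, not the exact mechanism from the docs) of a Python helper that setup.sh could run during the image build, mirroring the `bentoml.models.import_model` call you already use in `load_bentos_from_gcs`. The bucket and model names below are placeholders:
```python
# import_models.py - hypothetical build-time helper invoked from setup.sh, so the
# .bentomodel files are already in the image's local model store when the
# container starts. BUCKET and MODEL_NAMES are placeholders; reuse the values
# from your service.
import bentoml
from bentoml.exceptions import BentoMLException

BUCKET = "gs://your-bucket"            # placeholder GCS bucket
MODEL_NAMES = ["success_probability"]  # placeholder model names

for name in MODEL_NAMES:
    try:
        bentoml.models.import_model(f"{BUCKET}/{name}.bentomodel")
    except BentoMLException:
        # Model already present in the local store; nothing to do.
        pass
```
setup.sh would then just invoke this script (e.g. `python import_models.py`), so the download happens once at build time instead of on every cold start.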
You could also consider our hosted BentoCloud, which supports both serverless and always-on deployments.
@Jiang, thank you very much for these options and for taking the time to explain! Yes, I think our immediate solution will be setting Cloud Run to always be on. We chose not to import the models during image build because we liked the idea of keeping the models separate from the service. Our reasoning was that if we retrained a model, we could simply upload the new model to GCS and restart the service; with the image-build approach, we would have to rebuild the image and redeploy. Maybe that doesn't actually save that much downtime, though. I will look into BentoCloud again. Is it an easy transition from Cloud Run to BentoCloud?
@Jiang, I was able to turn that option on using `gcloud beta run services update SERVICE-NAME --no-cpu-throttling`. This made a revision deployment. That's fine, but now when I try to deploy using bentoctl, it gives me the error:
```
Error 409: Revision named 'probability-models-bento-cloud-run-service-00023-pud' with different configuration already exists.
```
How can I either pull that configuration or update the bentoctl commands to include the `--no-cpu-throttling` flag?
It seems that we need a new revision name?
> Is it an easy transition from Cloud Run to BentoCloud?
Yeah, it's much easier, since all you need to use BentoCloud is your bento, and you already have one.
I am not the maintainer of bentoctl. My suggestion is to first delete the revision, modify the terraform file to include the `--no-cpu-throttling` setting, and then recreate it through terraform.