Slackbot (02/21/2023, 5:17 PM)

Chaoyu (02/21/2023, 11:15 PM):
`--reload` is designed and built for development purposes. We do not recommend using it for serving production traffic, and it is not an ideal solution for hot-loading models.

Chaoyu (02/21/2023, 11:16 PM)

Chaoyu (02/21/2023, 11:17 PM)

Jori Geysen (02/22/2023, 4:04 PM):
`--reload`.
I think I found a decent workaround though:
• I still have a prediction endpoint, but slightly updated it. It now updates the runner to reflect the changes in the `latest` file in the `/home/bentoml/bento/models/model_name/` directory:
```python
import bentoml
from bentoml.io import JSON

# `DataPoint` is the pydantic request schema, defined elsewhere in the service.
runner = bentoml.transformers.get("model_name:latest").to_runner()
svc = bentoml.Service("model_name", runners=[runner])

@svc.api(
    input=JSON(pydantic_model=DataPoint), output=JSON(), route="api/v1/ops/predict"
)
async def predict(data_point: DataPoint) -> list:
    # Swap the runner's model for whatever "latest" resolves to right now.
    runner.models.clear()
    latest_model = bentoml.models.get("model_name:latest")
    runner.models.append(latest_model)
    return await runner.async_run(data_point.dict(), truncation=True)
```
• I still have another endpoint which updates the model and the `latest` file in the `/home/bentoml/bento/models/model_name/` directory:
  ◦ Download a model into the `/home/bentoml/downloaded_models` directory in the container.
  ◦ Call the `bentoml.transformers.save_model` method with a `transformer_pipeline` pointing to the `/home/bentoml/downloaded_models` directory, which contains the downloaded model. This saves the newly downloaded model as a new directory, `/home/bentoml/bento/models/model_name/YYYY`, and updates `latest` to point to `YYYY`.
This way, the runner is always running the `latest` model.
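For illustration, the version-directory-plus-`latest`-pointer layout described above can be mimicked with plain stdlib code. This is a minimal sketch of the mechanic, not BentoML's actual model store implementation; the `save_version` helper and directory names here are hypothetical:

```python
import os
import tempfile
from datetime import datetime, timezone

def save_version(store_root: str, model_name: str, payload: bytes) -> str:
    """Save `payload` as a new version directory and repoint `latest`.

    Mimics the layout described above: each save creates
    <store_root>/<model_name>/<version>/ and rewrites a `latest` marker
    so that readers of "model_name:latest" pick up the new version.
    """
    version = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")
    model_dir = os.path.join(store_root, model_name, version)
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "model.bin"), "wb") as f:
        f.write(payload)
    # Update the `latest` marker via rename, so readers never see a
    # half-written pointer.
    latest_path = os.path.join(store_root, model_name, "latest")
    tmp_path = latest_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(version)
    os.replace(tmp_path, latest_path)
    return version

# Demo in a temporary store: two saves, `latest` tracks the newest one.
root = tempfile.mkdtemp()
v1 = save_version(root, "model_name", b"weights-v1")
v2 = save_version(root, "model_name", b"weights-v2")
with open(os.path.join(root, "model_name", "latest")) as f:
    current = f.read()
```

The rename-based pointer update is the reason the prediction endpoint can safely resolve `model_name:latest` on every request while another endpoint is writing a new version.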
Same questions here: are you seeing any red flags related to scaling of the runners, or other concerns? Thanks again in advance 🙂

Jori Geysen (02/22/2023, 4:10 PM)