Slackbot
01/09/2023, 9:20 PM

Xipeng Guan
01/10/2023, 2:09 AM
BentoRequest CR to trigger the image builder. Yes, now we can set the spec.imageBuilderExtraPodSpec.affinity of the BentoRequest CR to specify the node affinity for the image builder pod.

Ghawady Ehmaid
01/10/2023, 3:51 AM
1. bentoml build
2. bentoml push $bentoname
3. create BentoRequest CR and run kubectl apply
4. create BentoDeployment CR and run kubectl apply
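A minimal BentoRequest CR for step 3 might look like the sketch below. This is illustrative, not the exact manifest used here: the metadata names and bento tag are placeholders, and the imageBuilderExtraPodSpec.affinity block (the field mentioned earlier in the thread) is only needed when the image builder pod must target specific nodes.

```yaml
apiVersion: resources.yatai.ai/v1alpha1
kind: BentoRequest
metadata:
  name: sentiment-3grade-service      # placeholder name
  namespace: yatai
spec:
  bentoTag: sentiment-3grade-service:latest   # the tag pushed in step 2
  imageBuilderExtraPodSpec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-type      # hypothetical node label
                  operator: In
                  values:
                    - builder
```

Applied with `kubectl apply -f bentorequest.yaml`, after which yatai-image-builder builds and pushes the image.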
Is that correct?

Xipeng Guan
01/10/2023, 3:52 AM

Ghawady Ehmaid
01/11/2023, 5:46 PM

Ghawady Ehmaid
01/11/2023, 5:52 PM
service: "src.service:svc"
labels:
  owner: plaetos
  stage: dev
include:
  - "service.py"
  - "config.py"
python:
  requirements_txt: "./requirements.txt"
docker:
  python_version: "3.7"
  cuda_version: "11.2"
and the requirements.txt
-f https://download.pytorch.org/whl/cu113
torch==1.12.0
torchvision==0.13.0
torchaudio==0.12.0
pandas==1.1.4
transformers[sentencepiece]==4.23.1
jax==0.3.24
flax==0.3.3
minio
python-dotenv
Following is the snippet from service.py containing the runner instance configuration:
import bentoml
from bentoml.io import JSON
import torch
from .config import MODEL_NAME, MODEL_VERSION, SERVICE_NAME
import pandas as pd
# from bentoml.io import PandasDataFrame

class SentimentRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        self.model = bentoml.transformers.load_model(sentiment_model)
        print(f"self.model: {self.model}")

    @bentoml.Runnable.method(batchable=True)
    def __call__(self, input_text):
        return self.model(input_text)

sentiment_model = bentoml.transformers.get(f"{MODEL_NAME}:{MODEL_VERSION}")
sentiment_runner = bentoml.Runner(
    SentimentRunnable,
    name="sentimentrunner_v1",
    models=[sentiment_model],
)
Ghawady Ehmaid
01/11/2023, 6:31 PM
We used the BentoRequest CR to create the bento image, but didn't use the BentoDeployment CR as we are using knative to scale containers down to zero. Following is the knative service spec for reference to verify the args passed:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sentiment-3grade-service-1-bbd-12
  namespace: default
spec:
  template:
    spec:
      containers:
        - args:
            - --api-workers=1
            - --production
            - --port=5000
          env:
            - name: MODEL_NAME
              value: sentiment-3grade-model-v1
            - name: MODEL_VERSION
              value: latest
            - name: SERVICE_NAME
              value: sentiment-3grade-service
            - name: GPU_ENABLED
              value: 'true'
          image: <path to image created by yatai-image-builder>
          livenessProbe:
            httpGet:
              path: /healthz
            initialDelaySeconds: 3
            periodSeconds: 5
          ports:
            - containerPort: 5000
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
            initialDelaySeconds: 3
            periodSeconds: 5
            timeoutSeconds: 300
          resources:
            limits:
              nvidia.com/gpu: 1
      tolerations:
        - effect: NoSchedule
          key: gpu
          operator: Equal
          value: "true"
Jiang
01/12/2023, 2:09 PM

Jiang
01/12/2023, 2:10 PM
https://files.slack.com/files-pri/TK999HSJU-F04J9M1SCCE/image.png

Jiang
01/12/2023, 2:10 PM

Ghawady Ehmaid
01/12/2023, 6:40 PM

Ghawady Ehmaid
01/12/2023, 6:43 PM

Jiang
01/13/2023, 3:13 AM

Jiang
01/13/2023, 3:13 AM
import bentoml
from bentoml.io import JSON
import torch
from .config import MODEL_NAME, MODEL_VERSION, SERVICE_NAME
import pandas as pd
# from bentoml.io import PandasDataFrame

class SentimentRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        self.model = bentoml.transformers.load_model(sentiment_model)
        print(f"self.model: {self.model}")

    @bentoml.Runnable.method(batchable=True)
    def __call__(self, input_text):
        return self.model(input_text)

sentiment_model = bentoml.transformers.get(f"{MODEL_NAME}:{MODEL_VERSION}")
sentiment_runner = bentoml.Runner(
    SentimentRunnable,
    name="sentimentrunner_v1",
    models=[sentiment_model],
)
This custom runner doesn't support GPU inference.

Jiang
01/13/2023, 3:15 AM
class SentimentRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        if torch.cuda.is_available():  # one way to implement "<check if gpu exists>"
            self.model = bentoml.transformers.load_model(sentiment_model, device=0)
        else:
            self.model = bentoml.transformers.load_model(sentiment_model, device=-1)
        print(f"self.model: {self.model}")
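One concrete way to fill in the `<check if gpu exists>` placeholder, sketched as a standalone helper (this assumes torch may or may not be installed; the 0/-1 values follow the transformers pipeline device convention used in the snippet):

```python
def gpu_exists() -> bool:
    """Return True when torch is installed and a CUDA device is visible.

    transformers pipelines accept device=0 for the first GPU and
    device=-1 for CPU, so this check maps directly onto that choice.
    """
    try:
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        # No torch in this environment, so no GPU inference either.
        return False

# Map the check to the transformers device convention.
device = 0 if gpu_exists() else -1
```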
Jiang
01/13/2023, 3:15 AM

Jiang
01/13/2023, 3:16 AM
TransformersRunnable.

Ghawady Ehmaid
01/13/2023, 3:26 AM

Jiang
01/13/2023, 3:28 AM
bentoml.transformers runner?

Ghawady Ehmaid
01/13/2023, 3:38 AM

Jiang
01/13/2023, 3:42 AM

Ghawady Ehmaid
01/13/2023, 3:43 AM
TransformersRunnable class and don't see a reason for not using it. This is better than creating a custom runner that does the same thing.
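For reference, switching to the prebuilt transformers runner suggested above would look roughly like this in BentoML 1.x. This is a sketch, not the thread's actual code: the model tag is illustrative, and the try/except only guards environments where BentoML or the model tag is absent.

```python
runner = None
try:
    import bentoml

    # get() fetches the stored model; to_runner() returns a runner backed
    # by BentoML's built-in TransformersRunnable, which handles GPU
    # placement itself instead of requiring a hand-written device check.
    model = bentoml.transformers.get("sentiment-3grade-model-v1:latest")
    runner = model.to_runner()
except Exception:
    # BentoML not installed or the tag doesn't exist here; the two calls
    # above are the relevant API either way.
    pass
```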