Slackbot
06/06/2023, 2:11 PM
Lucas Wei
06/06/2023, 2:13 PM
Lucas Wei
06/06/2023, 2:18 PM
Chaoyu
06/06/2023, 2:49 PM
Lucas Wei
06/06/2023, 2:53 PM
import bentoml
from transformers import AutoTokenizer, SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
bentoml.transformers.save_model("speecht5_tts_processor", processor)
bentoml.transformers.save_model("speecht5_tts_model", model, signatures={"generate_speech": {"batchable": False}})
bentoml.transformers.save_model("speecht5_tts_vocoder", vocoder)
%%writefile service.py
import bentoml
import torch
from bentoml.io import Text, NumpyNdarray
from datasets import load_dataset
processor_runner = bentoml.transformers.get("speecht5_tts_processor").to_runner()
model_runner = bentoml.transformers.get("speecht5_tts_model").to_runner()
vocoder_runner = bentoml.transformers.get("speecht5_tts_vocoder").to_runner()
# speaker embedding (x-vector) that selects the voice
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
svc = bentoml.Service("text2speech", runners=[processor_runner, model_runner, vocoder_runner])
@svc.api(input=Text(), output=NumpyNdarray())
def generate_speech(inp: str):
    inputs = processor_runner.run(text=inp, return_tensors="pt")
    speech = model_runner.generate_speech.run(input_ids=inputs["input_ids"], speaker_embeddings=speaker_embeddings, vocoder=vocoder_runner.run)
    return speech.numpy()
Lucas Wei
06/06/2023, 2:56 PM
!pip install -r https://raw.githubusercontent.com/bentoml/BentoML/main/examples/quickstart/requirements.txt
!pip install git+https://github.com/huggingface/transformers.git
!pip install sentencepiece
!pip install torchaudio
!pip install datasets
!pip install cchardet
!pip install tensorflow
Lucas Wei
06/06/2023, 3:31 PM
!pip install git+https://github.com/huggingface/transformers.git
!pip install sentencepiece
!pip install soundfile
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech
# load the pre-trained processor and model
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
inputs = processor(text="Hello word.", return_tensors="pt")
# load the speaker embedding dataset
from datasets import load_dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
# pick one speaker embedding (x-vector) to set the voice
import torch
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
spectrogram = model.generate_speech(inputs["input_ids"], speaker_embeddings)
# load the vocoder
from transformers import SpeechT5HifiGan
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
# run the prediction
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
import soundfile as sf
sf.write("./tts_example.wav", speech.numpy(), samplerate=16000)
Chaoyu
06/07/2023, 3:52 AM
Lucas Wei
06/07/2023, 1:32 PM
larme (shenyang)
06/07/2023, 1:46 PM
service.py, then serve the model using bentoml serve service:svc, and it seems to work fine. Maybe you can try this routine first?
larme (shenyang)
06/07/2023, 1:49 PM
bentoml serve service:svc, I will dig deeper into this issue
Lucas Wei
06/07/2023, 2:22 PM
Lucas Wei
06/07/2023, 2:23 PM
Lucas Wei
06/07/2023, 2:25 PM
larme (shenyang)
06/07/2023, 2:44 PM
speech = model_runner.generate_speech.run(input_ids=inputs["input_ids"], speaker_embeddings=speaker_embeddings, vocoder=vocoder_runner.run)
where you are passing vocoder_runner.run to another runner call. vocoder_runner.run is not a simple function and is not picklable. (Note: every runner call's arguments are serialized and then sent to the runner server to be executed there; by default, pickle is used to serialize the arguments.)
larme (shenyang)
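The pickling constraint described in this message can be reproduced with the standard library alone. The sketch below is an illustration only, not BentoML code: `FakeRunner` is a hypothetical stand-in that holds a thread lock (an assumed example of the kind of live resource a runner handle might keep), and `can_pickle` is a helper introduced here for the demonstration.

```python
import pickle
import threading


class FakeRunner:
    """Hypothetical stand-in for a runner handle; not a BentoML class."""

    def __init__(self):
        # A live resource such as a lock cannot be pickled.
        self._lock = threading.Lock()

    def run(self, x):
        with self._lock:
            return x


def can_pickle(obj) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain data, like token ids and an embedding, serializes without trouble.
plain_args = {"input_ids": [[4, 8, 15]], "speaker_embeddings": [[0.1, 0.2]]}
runner = FakeRunner()

data_ok = can_pickle(plain_args)
# Pickling a bound method also pickles the instance it is bound to,
# so the lock inside FakeRunner makes this fail.
method_ok = can_pickle(runner.run)
print(data_ok, method_ok)
```

Under these assumptions the plain arguments should serialize while the bound `run` method should not, which mirrors why a runner method is a poor argument to another runner call.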
06/07/2023, 2:44 PM
speech = model_runner.generate_speech.run(input_ids=inputs["input_ids"], speaker_embeddings=speaker_embeddings)
then the code runs
larme (shenyang)
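Without the `vocoder` argument, the model call returns a spectrogram rather than audio (as in the plain transformers snippet earlier in the thread), so one possible way to still apply the vocoder is a second call that receives the spectrogram as data. A minimal sketch of that chaining pattern follows, under loud assumptions: `call_across_boundary`, `fake_acoustic_model`, and `fake_vocoder` are hypothetical stand-ins written for this illustration (they are not BentoML or transformers APIs), and the pickle round trip only imitates the default argument serialization mentioned in the thread.

```python
import pickle


def call_across_boundary(func, **kwargs):
    """Imitate a remote call: arguments and result are round-tripped through pickle.

    The function itself is called locally; only the data is serialized.
    """
    sent = pickle.loads(pickle.dumps(kwargs))  # fails if an argument is unpicklable
    result = func(**sent)
    return pickle.loads(pickle.dumps(result))


def fake_acoustic_model(input_ids, speaker_embeddings):
    # Stand-in for step 1: token ids -> "spectrogram" (here, a list of rows).
    return [[tok * speaker_embeddings[0]] for tok in input_ids]


def fake_vocoder(spectrogram):
    # Stand-in for step 2: "spectrogram" -> flat "waveform".
    return [row[0] for row in spectrogram]


# Step 1: only plain data goes in, and plain data comes back.
spectrogram = call_across_boundary(
    fake_acoustic_model, input_ids=[1, 2, 3], speaker_embeddings=[0.5]
)
# Step 2: the intermediate result is handed to the second stage as data,
# instead of handing the second stage's callable to the first.
waveform = call_across_boundary(fake_vocoder, spectrogram=spectrogram)
print(waveform)
```

The design point is that each stage exchanges only serializable values; whether the real vocoder runner accepts the spectrogram in exactly this form is not confirmed in the thread.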
06/07/2023, 2:44 PM
vocoder_runner.run?
larme (shenyang)
06/07/2023, 2:51 PM
Lucas Wei
06/07/2023, 3:23 PM
Lucas Wei
06/07/2023, 3:24 PM
larme (shenyang)
06/07/2023, 3:31 PM
Lucas Wei
06/07/2023, 3:33 PM
larme (shenyang)
06/07/2023, 4:34 PM
Lucas Wei
06/07/2023, 4:36 PM
Lucas Wei
06/09/2023, 7:45 AM
larme (shenyang)
06/09/2023, 7:47 AM
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
to
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings)
and the code should work.
Lucas Wei
06/09/2023, 7:51 AM
Lucas Wei
06/09/2023, 7:52 AM
larme (shenyang)
06/09/2023, 7:54 AM
Lucas Wei
06/09/2023, 7:56 AM