miniature-nail-35141
08/12/2025, 11:40 AM

tall-airline-16756
08/12/2025, 11:41 AM
1. Install the Azure Plugin: First, install the Azure plugin for LiveKit Agents:
pip install "livekit-agents[azure]~=1.0"
2. Initialize the Azure TTS Client: Next, set up the Azure TTS client with your Azure Speech credentials. You can also tweak speech rate, volume, and pitch using the ProsodyConfig class to make it sound just right.
from livekit.plugins.azure import TTS, ProsodyConfig

azure_tts = TTS(
    speech_key="your_speech_key",
    speech_region="your_speech_region",
    voice="en-US-JennyNeural",
    prosody=ProsodyConfig(
        rate=1.5,       # Speech rate adjustment
        volume=50,      # Volume percentage
        pitch="medium", # Pitch level
    ),
)
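For a sense of what happens under the hood, those prosody settings map roughly to Azure SSML like this (illustrative only; the exact markup the plugin emits may differ):
# Illustrative only: approximately the SSML the ProsodyConfig above turns into.
ssml = """\
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <prosody rate="1.5" volume="50" pitch="medium">Your text here</prosody>
  </voice>
</speak>
"""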
3. Synthesize Speech: Now you can use the synthesize method to create speech. The settings you chose in ProsodyConfig are automatically turned into SSML and applied to your audio.
async def synthesize_text(text):
    # synthesize() returns an async stream of audio chunks, so iterate with async for
    async for chunk in azure_tts.synthesize(text):
        # Handle the audio chunks
        pass
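To make that loop concrete, here's a sketch that writes the synthesized audio to a WAV file. It assumes each chunk is a SynthesizedAudio event whose frame field carries raw 16-bit PCM (the usual shape of livekit-agents TTS streams), so treat it as a starting point:
import wave

async def synthesize_to_wav(text: str, path: str) -> None:
    params_set = False
    with wave.open(path, "wb") as wav:
        async for audio in azure_tts.synthesize(text):
            frame = audio.frame  # assumed: an rtc.AudioFrame of 16-bit PCM samples
            if not params_set:
                wav.setnchannels(frame.num_channels)
                wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
                wav.setframerate(frame.sample_rate)
                params_set = True
            wav.writeframes(frame.data)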
If you want to dive deeper into SSML or customize things further, you can build SSML manually or check out the Azure SSML documentation. Hope this helps you get your speech output sounding just the way you want!
Sources: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.p...

miniature-nail-35141
08/12/2025, 11:42 AM

tall-airline-16756
08/12/2025, 11:43 AM
2. Stream Processing: Parse the LLM's structured output incrementally as it streams in, fire a callback on each partial parse, and yield only the new response text:
from typing import AsyncIterable, Callable, Optional

from pydantic_core import from_json

async def process_structured_output(
    text: AsyncIterable[str],
    callback: Optional[Callable[[ResponseEmotion], None]] = None,
) -> AsyncIterable[str]:
    last_response = ""
    acc_text = ""
    async for chunk in text:
        acc_text += chunk
        try:
            # Parse the accumulated text as (possibly still incomplete) JSON
            resp: ResponseEmotion = from_json(acc_text, allow_partial="trailing-strings")
        except ValueError:
            continue

        if callback:
            callback(resp)

        if not resp.get("response"):
            continue

        # Yield only the part of the response that hasn't been emitted yet
        new_delta = resp["response"][len(last_response):]
        if new_delta:
            yield new_delta
        last_response = resp["response"]
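For reference, the snippets here assume a ResponseEmotion schema along these lines (a sketch; your exact fields may vary, but the code reads the voice_instructions and response keys):
from typing import TypedDict

class ResponseEmotion(TypedDict):
    voice_instructions: str  # how the TTS should deliver the text (tone, emotion, pacing)
    response: str            # the actual text to speak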
3. Agent Configuration: Set up your agent to handle the LLM's structured output, pull out the voice instructions, and pass them over to the TTS engine.
from typing import cast

from livekit.agents import NOT_GIVEN, Agent, ChatContext, FunctionTool, ModelSettings
from livekit.plugins import openai

class MyAgent(Agent):
    async def llm_node(
        self, chat_ctx: ChatContext, tools: list[FunctionTool], model_settings: ModelSettings
    ):
        llm = cast(openai.LLM, self.llm)
        tool_choice = model_settings.tool_choice if model_settings else NOT_GIVEN
        async with llm.chat(
            chat_ctx=chat_ctx,
            tools=tools,
            tool_choice=tool_choice,
            response_format=ResponseEmotion,
        ) as stream:
            async for chunk in stream:
                yield chunk

    async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
        instruction_updated = False

        def output_processed(resp: ResponseEmotion):
            nonlocal instruction_updated
            # Apply the voice instructions to the TTS as soon as both fields have arrived
            if resp.get("voice_instructions") and resp.get("response") and not instruction_updated:
                instruction_updated = True  # only update the instructions once per turn
                tts = cast(openai.TTS, self.tts)
                tts.update_options(instructions=resp["voice_instructions"])

        return Agent.default.tts_node(
            self, process_structured_output(text, callback=output_processed), model_settings
        )
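If it helps, here's a rough sketch of wiring this agent into a session. The entrypoint pattern is standard livekit-agents 1.x; the model names are just examples (gpt-4o-mini-tts is one TTS model that accepts free-form voice instructions):
from livekit.agents import AgentSession, JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    session = AgentSession(
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(model="gpt-4o-mini-tts"),
    )
    await session.start(agent=MyAgent(instructions="You are a helpful assistant."), room=ctx.room)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))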
So, this setup takes the LLM's structured output, pulls out the voice instructions, and smoothly applies them to the TTS engine. If you have any questions or want to chat more about it, just let me know!