# ask-ai
m
how to use Azure SSML for real speech output
t
Thanks for reaching out, Rajadurai. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to use Azure SSML for real speech output in LiveKit, here's a simple way to get started:

1. Install the LiveKit Azure plugin: This plugin gives you access to Azure's Text-to-Speech (TTS) features with SSML support. Just run:

```bash
pip install "livekit-agents[azure]~=1.0"
```
2. Initialize the Azure TTS client: Next, set up the Azure TTS client with your Azure Speech credentials. You can also tweak things like speech rate, volume, and pitch using the `ProsodyConfig` class to make it sound just right.

```python
from livekit.plugins.azure import TTS, ProsodyConfig

azure_tts = TTS(
    speech_key="your_speech_key",
    speech_region="your_speech_region",
    voice="en-US-JennyNeural",
    prosody=ProsodyConfig(
        rate=1.5,  # Speech rate adjustment
        volume=50,  # Volume percentage
        pitch="medium"  # Pitch level
    )
)
```

3. Synthesize speech: Now you can use the `synthesize` method to create speech. The settings you chose in `ProsodyConfig` are automatically turned into SSML and applied to your audio.

```python
async def synthesize_text(text):
    # synthesize() returns a stream of audio chunks; iterate it asynchronously
    stream = azure_tts.synthesize(text)
    async for chunk in stream:
        # Handle each audio chunk (e.g. buffer it or publish it to a track)
        pass
```

If you want to dive deeper into SSML or customize things further, you can build the SSML manually (see the sketch below) or check out the Azure SSML documentation. Hope this helps you get your speech output sounding just the way you want!

Sources: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.p...
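For reference, here's roughly what that hand-built SSML could look like if you drive the Azure Speech SDK directly instead of going through the LiveKit plugin. This is a minimal sketch, assuming the `azure-cognitiveservices-speech` package and the same credentials as above; the prosody values mirror the `ProsodyConfig` settings from step 2:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="your_speech_key", region="your_speech_region"
)
# With no audio config given, output goes to the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Hand-written SSML equivalent to the ProsodyConfig settings used earlier
ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <prosody rate="1.5" volume="50" pitch="medium">
      Hello! This speech is shaped by hand-written SSML.
    </prosody>
  </voice>
</speak>
"""

result = synthesizer.speak_ssml_async(ssml).get()
```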
m
how to generate SSML text from an LLM
t
Hey there! If you want to generate SSML text from an LLM in LiveKit, here's a simple way to do it:

1. LLM output handling: Make sure your LLM outputs text with SSML tags or voice instructions included. You can guide the LLM to add specific voice instructions or other SSML details in its response (there's a usage sketch after the agent code below).
2. Modify the TTS pipeline: Before the text reaches the TTS engine, catch it and tweak it as needed. You can use a callback function to handle the text and apply any SSML tweaks:

```python
from typing import AsyncIterable, Callable, Optional, TypedDict

from pydantic_core import from_json


class ResponseEmotion(TypedDict):
    # Structured output schema the LLM is asked to follow
    voice_instructions: str
    response: str


async def process_structured_output(
    text: AsyncIterable[str],
    callback: Optional[Callable[[ResponseEmotion], None]] = None,
) -> AsyncIterable[str]:
    last_response = ""
    acc_text = ""
    async for chunk in text:
        acc_text += chunk
        try:
            # Parse the (possibly still incomplete) JSON accumulated so far
            resp: ResponseEmotion = from_json(acc_text, allow_partial="trailing-strings")
        except ValueError:
            continue

        if callback:
            callback(resp)

        if not resp.get("response"):
            continue

        # Yield only the newly generated slice of the spoken response
        new_delta = resp["response"][len(last_response):]
        if new_delta:
            yield new_delta
        last_response = resp["response"]
```
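If you want to see how this streaming parser behaves on its own, here's a tiny self-contained check; `fake_llm_stream` and its JSON fragments are made up purely for illustration:

```python
import asyncio

async def fake_llm_stream():
    # Simulates an LLM emitting the ResponseEmotion JSON in small fragments
    for part in ['{"voice_instructions": "calm, warm", ', '"response": "Hello', ' there!"}']:
        yield part

async def main():
    async for delta in process_structured_output(fake_llm_stream(), callback=print):
        print("TTS receives:", delta)

asyncio.run(main())
```

Each printed delta is a new slice of the `response` field, which is why the TTS engine can start speaking before the JSON is even complete.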
3. Agent configuration: Set up your agent so it can handle the LLM's output, pull out those SSML instructions, and pass them over to the TTS engine:

```python
from typing import cast

# Import paths follow the livekit-agents 1.x layout; adjust to your version
from livekit.agents import NOT_GIVEN, Agent, ModelSettings
from livekit.agents.llm import ChatContext, FunctionTool
from livekit.plugins import openai


class MyAgent(Agent):
    async def llm_node(
        self, chat_ctx: ChatContext, tools: list[FunctionTool], model_settings: ModelSettings
    ):
        # Cast to the OpenAI LLM so we can request structured (JSON) output
        llm = cast(openai.LLM, self.llm)
        tool_choice = model_settings.tool_choice if model_settings else NOT_GIVEN
        async with llm.chat(
            chat_ctx=chat_ctx,
            tools=tools,
            tool_choice=tool_choice,
            response_format=ResponseEmotion,
        ) as stream:
            async for chunk in stream:
                yield chunk

    async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
        instruction_updated = False

        def output_processed(resp: ResponseEmotion):
            nonlocal instruction_updated
            if resp.get("voice_instructions") and resp.get("response") and not instruction_updated:
                # Once the response text starts streaming, the voice instructions
                # are complete, so apply them to the TTS engine exactly once
                instruction_updated = True
                tts = cast(openai.TTS, self.tts)
                tts.update_options(instructions=resp["voice_instructions"])

        return Agent.default.tts_node(
            self, process_structured_output(text, callback=output_processed), model_settings
        )
```
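To connect this back to point 1, here's a rough sketch of how you might wire the agent into a session and prompt the LLM to emit the `ResponseEmotion` schema. The instruction text and model names are illustrative placeholders, not a prescribed configuration:

```python
from livekit.agents import AgentSession

session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=openai.TTS(model="gpt-4o-mini-tts"),  # a TTS model that accepts instructions
)

agent = MyAgent(
    instructions=(
        "Always reply as a JSON object with two keys: 'voice_instructions' "
        "(a short description of tone, pacing, and emotion for the TTS engine) "
        "and 'response' (the text to speak aloud)."
    ),
)

# Inside your entrypoint, once connected to the room:
# await session.start(agent=agent, room=ctx.room)
```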
So this setup takes the LLM's output, pulls out the voice instructions, and smoothly applies them to the TTS engine. If you have any questions or want to chat more about it, just let me know!