<@U084APK40NS> how to use before_tts_cb for adding...
# ask-ai
@millions-hair-24956 how to use before_tts_cb for adding SSML on Google Voice ?
To use `before_tts_cb` for adding SSML on Google Voice, you can define a callback function that modifies the text input to include SSML tags before it is sent to the TTS engine. The callback runs just before the text is synthesized into speech, allowing you to customize pronunciation or add SSML for better control over speech synthesis. Here's a basic example of how you might implement this in Python:
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, google, openai, silero

def before_tts_cb(assistant, text):
    # Wrap the text in SSML tags before it reaches the TTS engine
    ssml_text = f"<speak>{text}</speak>"
    return ssml_text

# Initialize the VoicePipelineAgent with the before_tts_cb
assistant = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=google.TTS(),
    before_tts_cb=before_tts_cb,
)
This example wraps the text in `<speak>` tags, which is a basic SSML structure. You can further customize the SSML to include pauses, emphasis, or other speech effects as needed. Would you like more details on SSML customization or the setup process?
Sources:
- VoicePipelineAgent
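As a further illustration, here is a sketch of a richer callback. This is a hedged example, not from the thread: it assumes (as in some livekit-agents versions) that `before_tts_cb` may receive either a plain string or an async stream of text chunks, and it adds a prosody wrapper plus short pauses between sentences.

from typing import AsyncIterable, Union

def before_tts_cb(assistant, text: Union[str, AsyncIterable[str]]):
    # Hypothetical helper: wrap one utterance in SSML with slower speech and pauses
    def wrap(t: str) -> str:
        return (
            '<speak><prosody rate="95%">'
            + t.replace(". ", '. <break time="300ms"/> ')
            + "</prosody></speak>"
        )

    if isinstance(text, str):
        return wrap(text)

    # Streamed LLM output: collect the chunks, then wrap the full utterance once
    async def wrap_stream() -> AsyncIterable[str]:
        chunks = [chunk async for chunk in text]
        yield wrap("".join(chunks))

    return wrap_stream()

Note that buffering the stream like this trades a bit of latency for a single well-formed SSML document.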
@millions-hair-24956 when I do this, the agent read <speak> etc
It seems likely that the TTS engine is not recognizing the SSML tags and is instead reading them as plain text. This can happen if the TTS engine is not configured to process SSML or if the SSML is not correctly formatted. To address this, ensure that the TTS engine you are using supports SSML and is configured to interpret it. For Google TTS, you might need to specify that the input type is SSML. Here's an example of how you might adjust the setup:
def before_tts_cb(assistant, text):
    # Add SSML tags to the text
    ssml_text = f"<speak>{text}</speak>"
    return ssml_text

# Initialize the VoicePipelineAgent with the before_tts_cb
assistant = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=google.TTS(input_type='ssml'),  # Ensure the TTS is set to interpret SSML
    before_tts_cb=before_tts_cb
)
Make sure that the `google.TTS` instance is correctly set up to handle SSML input. If the TTS engine still reads the tags, double-check the documentation for the specific TTS service to ensure it supports SSML and that you are using the correct configuration. Would you like more detailed guidance on configuring Google TTS for SSML?
Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1733426789.731689
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1727280293.627769
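One way to see what the engine itself expects, independent of LiveKit: Google Cloud TTS selects SSML through the request field (`SynthesisInput(ssml=...)`), not by inspecting the text for `<speak>` tags. This is a minimal standalone sketch using the `google-cloud-texttospeech` client, assuming credentials are already configured in the environment:

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    # The ssml= field (rather than text=) is what makes Google parse the markup
    input=texttospeech.SynthesisInput(
        ssml='<speak>Hello <break time="500ms"/> world</speak>'
    ),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)
with open("ssml_check.mp3", "wb") as f:
    f.write(response.audio_content)

If this renders the pause correctly but the agent still speaks the tags aloud, the issue is in how the plugin builds its request rather than in the SSML itself.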
@millions-hair-24956 there is no such parameter in the class:

class TTS(tts.TTS):
    def __init__(
        self,
        *,
        language: SpeechLanguages | str = "en-US",
        gender: Gender | str = "neutral",
        voice_name: str = "",  # Not required
        encoding: AudioEncoding | str = "linear16",
        sample_rate: int = 24000,
        pitch: int = 0,
        effects_profile_id: str = "",
        speaking_rate: float = 1.0,
        credentials_info: dict | None = None,
        credentials_file: str | None = None,
    ) -> None:
        """
        super().__init__(
            capabilities=tts.TTSCapabilities(
                streaming=False,
            ),
            sample_rate=sample_rate,
            num_channels=1,
        )
        self._client: texttospeech.TextToSpeechAsyncClient | None = None
        self._credentials_info = credentials_info
        self._credentials_file = credentials_file

        voice = texttospeech.VoiceSelectionParams(
            name=voice_name,
            language_code=language,
            ssml_gender=_gender_from_str(gender),
        )

        if encoding == "linear16" or encoding == "wav":
            _audio_encoding = texttospeech.AudioEncoding.LINEAR16
        elif encoding == "mp3":
            _audio_encoding = texttospeech.AudioEncoding.MP3
        else:
            raise NotImplementedError(f"audio encoding {encoding} is not supported")

        self._opts = _TTSOptions(
            voice=voice,
            audio_config=texttospeech.AudioConfig(
                audio_encoding=_audio_encoding,
                sample_rate_hertz=sample_rate,
                pitch=pitch,
                effects_profile_id=effects_profile_id,
                speaking_rate=speaking_rate,
            ),
        )

    def update_options(
        self,
        *,
        language: SpeechLanguages | str = "en-US",
        gender: Gender | str = "neutral",
        voice_name: str = "",  # Not required
        speaking_rate: float = 1.0,
    ) -> None:
        """
        self._opts.voice = texttospeech.VoiceSelectionParams(
            name=voice_name,
            language_code=language,
            ssml_gender=_gender_from_str(gender),
        )
        self._opts.audio_config.speaking_rate = speaking_rate

    def _ensure_client(self) -> texttospeech.TextToSpeechAsyncClient:
        if self._client is None:
            if self._credentials_info:
                self._client = (
                    texttospeech.TextToSpeechAsyncClient.from_service_account_info(
                        self._credentials_info
                    )
                )
            elif self._credentials_file:
                self._client = (
                    texttospeech.TextToSpeechAsyncClient.from_service_account_file(
                        self._credentials_file
                    )
                )
            else:
                self._client = texttospeech.TextToSpeechAsyncClient()
        assert self._client is not None
        return self._client

    def synthesize(
        self,
        text: str,
        *,
        conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
    ) -> "ChunkedStream":
        return ChunkedStream(
            tts=self,
            input_text=text,
            conn_options=conn_options,
            opts=self._opts,
            client=self._ensure_client(),
        )
The `TTS` class you provided does not explicitly include a parameter for specifying SSML input. However, you can still use SSML by ensuring that the text you pass to the `synthesize` method is properly formatted as SSML. The key is to ensure that the TTS engine is configured to interpret the input as SSML, which is typically done by wrapping the text in `<speak>` tags and ensuring the TTS service supports SSML. Here's how you can modify the `synthesize` method to handle SSML:
def synthesize(
    self,
    text: str,
    *,
    conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
) -> "ChunkedStream":
    # Ensure the text is wrapped in SSML tags
    ssml_text = f"<speak>{text}</speak>"
    return ChunkedStream(
        tts=self,
        input_text=ssml_text,  # Pass the SSML text
        conn_options=conn_options,
        opts=self._opts,
        client=self._ensure_client(),
    )
Make sure that the TTS service you are using supports SSML and is configured to process it. If the TTS service requires any specific configuration to enable SSML, you may need to adjust the initialization or options accordingly. Would you like more detailed guidance on configuring the TTS service for SSML?
Sources:
- https://docs.livekit.io/python/livekit/plugins/elevenlabs/index.html#args
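One more formatting pitfall worth a hedged aside: LLM output can contain characters that are not valid inside an SSML document ("&", "<", ">"). Escaping the text before wrapping keeps the `<speak>` document well-formed; this sketch uses the standard library for that:

from xml.sax.saxutils import escape

def to_ssml(text: str) -> str:
    # Escape XML special characters so the SSML stays well-formed,
    # then wrap the utterance in a <speak> root element
    return f"<speak>{escape(text)}</speak>"

print(to_ssml("Tom & Jerry < 2"))  # <speak>Tom &amp; Jerry &lt; 2</speak>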
@millions-hair-24956 do I change the synthesize method directly in my agent code? If so, what do I put in DEFAULT_API_CONNECT_OPTIONS?
It seems likely that you can modify the `synthesize` method in your agent code to handle SSML input directly. `DEFAULT_API_CONNECT_OPTIONS` is a configuration object (an `APIConnectOptions`) that specifies connection parameters for the API, such as timeouts and retries. You shouldn't need to define it yourself: the name used in the class you pasted should be importable from `livekit.agents` and can be passed through unchanged. Here's a basic example of how you might use `DEFAULT_API_CONNECT_OPTIONS`:
# Reuse the library's default connection options instead of defining your own
from livekit.agents import APIConnectOptions, DEFAULT_API_CONNECT_OPTIONS

def synthesize(
    self,
    text: str,
    *,
    conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
) -> "ChunkedStream":
    # Ensure the text is wrapped in SSML tags
    ssml_text = f"<speak>{text}</speak>"
    return ChunkedStream(
        tts=self,
        input_text=ssml_text,  # Pass the SSML text
        conn_options=conn_options,
        opts=self._opts,
        client=self._ensure_client(),
    )
In this example, `DEFAULT_API_CONNECT_OPTIONS` is simply the library's default `APIConnectOptions` (covering timeout and retry behavior); construct and pass your own `APIConnectOptions` only if your TTS service needs different settings. Would you like more detailed guidance on setting up these options?
Sources:
- https://docs.livekit.io/python/livekit/agents/pipeline/pipeline_agent.html#args
- https://docs.livekit.io/python/livekit/agents/voice_assistant/index.html#args
Note: I wasn't able to find highly relevant data sources, but above are a few potentially relevant links.
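For completeness, here is a hedged sketch of doing this without editing the plugin source: subclass `google.TTS` in your agent code and override `synthesize`. It assumes your livekit-agents version exports `APIConnectOptions` and `DEFAULT_API_CONNECT_OPTIONS` from `livekit.agents` (the pasted class already uses both names). As discussed above, this only changes the string handed to the plugin, so whether Google actually treats it as SSML still depends on how the plugin builds its request.

from livekit.agents import APIConnectOptions, DEFAULT_API_CONNECT_OPTIONS
from livekit.plugins import google

class SSMLWrappingTTS(google.TTS):
    """Hypothetical subclass that wraps plain text in <speak> tags before synthesis."""

    def synthesize(
        self,
        text: str,
        *,
        conn_options: APIConnectOptions = DEFAULT_API_CONNECT_OPTIONS,
    ):
        if not text.lstrip().startswith("<speak>"):
            text = f"<speak>{text}</speak>"
        # Delegate to the plugin's own synthesize with the (possibly wrapped) text
        return super().synthesize(text, conn_options=conn_options)

# Use it in place of google.TTS() when constructing the agent:
# assistant = VoicePipelineAgent(..., tts=SSMLWrappingTTS())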