jolly-vr-35720
08/05/2025, 5:55 AM
tall-airline-16756
08/05/2025, 5:56 AM
1. Detect the Interruption: Use the speech handle's interrupted flag (speech_handle.interrupted) to identify when the user interrupts the TTS output. Cancel ongoing tasks and clear the audio buffer when an interruption occurs.
2. Retrieve the Synchronized Transcript: After detecting an interruption, retrieve the synchronized transcript of what was actually spoken. If it is unavailable, fall back to the last known text output.
3. Update the LLM Context: Update the LLM's chat context with the text the user actually heard, and mark the message as interrupted so the context stays accurate.
Here’s an example of how this can be implemented:
if speech_handle.interrupted:
    # Stop in-flight generation tasks and flush any audio not yet played
    await utils.aio.cancel_and_wait(*tasks)
    if audio_output is not None:
        audio_output.clear_buffer()
        playback_ev = await audio_output.wait_for_playout()

if add_to_chat_ctx:
    # Start from the full generated text...
    forwarded_text = text_out.text if text_out else ""
    if speech_handle.interrupted and audio_output is not None:
        playback_ev = await audio_output.wait_for_playout()
        # ...but prefer the synchronized transcript, which contains only
        # the portion that was actually played back to the user
        if playback_ev.synchronized_transcript is not None:
            forwarded_text = playback_ev.synchronized_transcript
    if forwarded_text:
        # Record the (possibly truncated) assistant turn, marked as interrupted
        new_message = self._agent._chat_ctx.add_message(
            role="assistant",
            content=forwarded_text,
            interrupted=speech_handle.interrupted,
        )
        speech_handle._chat_items.append(new_message)
        self._session._conversation_item_added(new_message)
This ensures that the LLM context remains aligned with what the user actually heard, even in cases of interruption.
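If it helps, here's how this surfaces in application code. This is a minimal sketch, assuming an AgentSession from livekit-agents 1.x; session.say() returning a SpeechHandle and the wait_for_playout() method are my assumptions about the current API and may differ by version:
handle = session.say("Here is a long answer the user might cut off.")
# Resolves once playback finishes or is interrupted (method name may vary by version)
await handle.wait_for_playout()
if handle.interrupted:
    # The chat context already holds only the portion the user actually heard,
    # stored with interrupted=True by the logic shown above
    print("User interrupted the agent mid-utterance")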
jolly-vr-35720
08/05/2025, 11:58 AM
tall-airline-16756
08/05/2025, 11:58 AM
1. Use the Text/Audio Synchronizer: The Node.js agents framework provides a synchronizer (TextAudioSynchronizer) that coordinates text and audio playback. It emits events such as textUpdated, so you can easily see which part of the text has been spoken.
2. Track Playback State: Monitor playback completion using RPC signals or APIs such as SpeechHandle, so that you update the LLM context only after the audio has actually finished playing.
3. Implementation Example (a minimal sketch; the import path below is an assumption and may differ by SDK version):
// Assumed import from the Node agents SDK; adjust to your version
import { TextAudioSynchronizer, defaultTextSyncOptions } from '@livekit/agents';

// Initialize the synchronizer with the default text/audio sync options
const synchronizer = new TextAudioSynchronizer(defaultTextSyncOptions);

// As LLM text and TTS audio flow through, textUpdated fires with the
// portion of the text that has been spoken so far
synchronizer.on('textUpdated', (text) => {
  console.log('Spoken text:', text.text);
});
This method helps you keep the LLM context in sync with what’s really being spoken, even if the TTS takes a bit longer to get through the content. Hope this makes things clearer and easier for you!
Sources: agents/src/pipeline/pipeline_agent.ts | Bringing AI avatars to voice agents | livekit-agents/livekit/agents/voice/agent_activity.py