steep-balloon-41261
08/16/2025, 1:27 PM
tall-airline-16756
08/16/2025, 1:28 PM
tall-airline-16756
08/16/2025, 1:28 PM
The `TranscriptSynchronizer` class in the LiveKit Agents framework is your friend. Here’s a simple way to do it:
• Turn on TTS-Aligned Transcription: Just set `use_tts_aligned_transcript=True` in your `AgentSession` settings to get word-level sync.
• Handle the `transcription_node` Method: This deals with `TimedString` objects, which come with `start_time` and `end_time` for each word.
• Make Use of the `TranscriptSynchronizer` Class: It helps sync your text with the audio playback perfectly. Just feed it timed text and audio frames using `push_text` and `push_audio`.
• Wrap Up Playback: Call `mark_playback_finished` to make sure your synchronized transcript is all set once playback ends.
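Here’s a minimal sketch of the `transcription_node` step. It runs without LiveKit installed, so treat it as an illustration only: the `TimedString` class is a hypothetical stand-in for the framework’s own, `fake_tts_stream` and `collect_transcript` are made-up helpers, and the node is written as a plain function rather than an agent method.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, List, Tuple, Union


class TimedString(str):
    """Hypothetical stand-in: a string carrying start/end times in seconds."""

    def __new__(cls, text: str, start_time: float, end_time: float) -> "TimedString":
        obj = super().__new__(cls, text)
        obj.start_time = start_time
        obj.end_time = end_time
        return obj


async def transcription_node(
    text: AsyncIterable[Union[str, TimedString]],
    timings: List[Tuple[str, float, float]],
) -> AsyncIterator[Union[str, TimedString]]:
    """Pass every chunk through unchanged, recording timing for timed chunks."""
    async for chunk in text:
        if isinstance(chunk, TimedString):
            timings.append((str(chunk), chunk.start_time, chunk.end_time))
        yield chunk


async def fake_tts_stream() -> AsyncIterator[Union[str, TimedString]]:
    """Made-up input: two timed words followed by one plain, untimed chunk."""
    yield TimedString("Hello", 0.0, 0.4)
    yield TimedString(" world", 0.4, 0.9)
    yield "!"


async def collect_transcript() -> Tuple[str, List[Tuple[str, float, float]]]:
    """Drain the node and return the full text plus the recorded timings."""
    timings: List[Tuple[str, float, float]] = []
    parts = [chunk async for chunk in transcription_node(fake_tts_stream(), timings)]
    return "".join(parts), timings


if __name__ == "__main__":
    transcript, timings = asyncio.run(collect_transcript())
    print(transcript)
    print(timings)
```

The node yields each chunk as it arrives, so downstream consumers still see the full text. Plain `str` chunks carry no timing and are simply passed along.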
By the way, word-level timestamps work great with ElevenLabs and Cartesia TTS providers. If you want me to walk you through more details or share some examples, just give me a shout!
Sources: examples/other/text-to-speech/sync_tts_transcription.py | ElevenLabs TTS integration guide | LiveKit Docs | Cartesia TTS integration guide | LiveKit Docs
tall-airline-16756
08/16/2025, 1:29 PM
The `start_time` and `end_time` in `TimedString` are tied to the audio playback. They’re measured in seconds from the beginning of the audio segment and show when each piece of text should be spoken. If anything’s unclear or you want me to explain more, just let me know. I’m here to help!
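To make the “seconds from the start of the segment” idea concrete, here’s a small sketch that works out which words should be visible at a given playback position. `TimedWord` and `words_spoken_so_far` are hypothetical names used only for illustration, and the timings are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedWord:
    """Hypothetical record: text plus times in seconds from the segment start."""

    text: str
    start_time: float
    end_time: float


def words_spoken_so_far(words: List[TimedWord], playback_position: float) -> str:
    """Join every word whose start_time is at or before the playback position."""
    return "".join(w.text for w in words if w.start_time <= playback_position)


# Made-up timings for one short audio segment.
segment = [
    TimedWord("Hello", 0.0, 0.4),
    TimedWord(" world", 0.4, 0.9),
    TimedWord("!", 0.9, 1.0),
]

if __name__ == "__main__":
    print(words_spoken_so_far(segment, 0.5))  # prints "Hello world"
```

Because the times are relative to the segment, the playback position passed in has to be measured from the same starting point.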
Sources: livekit-agents/livekit/agents/tts/tts.py | Text and transcriptions | LiveKit Docs
tall-airline-16756
08/16/2025, 1:30 PM