How do I get word-level timestamp transcription? | LiveKit Community #ask-ai


acceptable-dream-85180

08/07/2025, 3:06 PM

How do I get word-level timestamp transcription? I'm currently using use_tts_aligned_transcript=True and tts=cartesia.TTS(model="sonic-2", voice="f786b574-daa5-4673-aa0c-cbe3e8534c02", language="en"), but the transcript does not show word-level timestamps.

tall-airline-16756

08/07/2025, 3:06 PM

Thanks for reaching out, Anne. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

08/07/2025, 3:06 PM

To enable word-level timestamp transcription, follow these steps:

1. Enable TTS-aligned transcription: Set use_tts_aligned_transcript=True in your configuration. This synchronizes transcription output with speech timing for supported TTS providers.
2. Use a supported TTS provider: Currently, only Cartesia and ElevenLabs support word-level transcription timing. For other providers, alignment is applied at the sentence level.
3. Access timing information: Implement a transcription_node method in your agent. The text iterator passed to it yields TimedString objects, which include start_time and end_time for each word.

Here is an example implementation for accessing timing information:


async def transcription_node(
    self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings
) -> AsyncGenerator[str | TimedString, None]:
    async for chunk in text:
        if isinstance(chunk, TimedString):
            logger.info(f"TimedString: '{chunk}' ({chunk.start_time} - {chunk.end_time})")
        yield chunk

Ensure that your TTS provider explicitly supports this feature. For unsupported providers, alignment will only be at the sentence level. Let me know if you need further assistance!

Sources: Text and transcriptions (LiveKit Docs); ElevenLabs TTS integration guide (LiveKit Docs); Cartesia TTS integration guide (LiveKit Docs); livekit-agents/livekit/agents/tts/tts.py
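
If you want to keep the timings rather than just log them, here is a minimal sketch of collecting them into a list. It runs without LiveKit installed, so TimedString below is a stand-in that assumes the real class behaves like a str carrying start_time and end_time in seconds. WordTiming, collect_word_timings, and demo_stream are hypothetical names used only for illustration. In a real agent you would import TimedString from LiveKit and do the same isinstance check inside transcription_node.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from dataclasses import dataclass


class TimedString(str):
    """Stand-in for LiveKit's TimedString (assumed shape, not the real class):
    a str that also carries start_time and end_time in seconds."""

    start_time: float
    end_time: float

    def __new__(cls, text: str, start_time: float, end_time: float) -> "TimedString":
        obj = super().__new__(cls, text)
        obj.start_time = start_time
        obj.end_time = end_time
        return obj


@dataclass
class WordTiming:
    """Hypothetical record holding one word and its timing."""

    word: str
    start: float
    end: float


async def collect_word_timings(text: AsyncIterable[str]) -> list[WordTiming]:
    """Keep only the chunks that carry timing information."""
    timings: list[WordTiming] = []
    async for chunk in text:
        # Plain str chunks have no timing, so they are skipped here.
        if isinstance(chunk, TimedString):
            timings.append(WordTiming(str(chunk), chunk.start_time, chunk.end_time))
    return timings


async def demo_stream() -> AsyncIterator[str]:
    """Fake transcript stream mixing timed and untimed chunks."""
    yield TimedString("Hello", 0.0, 0.4)
    yield " "
    yield TimedString("world", 0.45, 0.9)


timings = asyncio.run(collect_word_timings(demo_stream()))
for t in timings:
    print(f"{t.word}: {t.start} - {t.end}")
```

The untimed " " chunk is dropped, so the result has one entry per timed word. Whether the real stream interleaves plain strings like this depends on the TTS provider, so treat the demo stream as illustrative only.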

acceptable-dream-85180

08/07/2025, 3:09 PM

Where is TimedString imported from?

tall-airline-16756

08/07/2025, 3:09 PM

Hey there! Just a quick heads-up: the TimedString class comes from the livekit.agents.voice.io module. You can grab it with this import statement:


from livekit.agents.voice.io import TimedString

If you have any questions or need a hand with anything else, just let me know. I'm here to help!

acceptable-dream-85180

08/07/2025, 3:16 PM

Amazing, this works for the agent words, what about the other participants?

tall-airline-16756

08/07/2025, 3:17 PM

I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code.

If you find the answer, please post it here to help others!
