How do I get word level timestamp transcription? c...
# ask-ai
a
How do I get word-level timestamp transcription? I'm currently using use_tts_aligned_transcript=True and tts=cartesia.TTS(model="sonic-2", voice="f786b574-daa5-4673-aa0c-cbe3e8534c02", language="en"), but the transcript does not show word-level timestamps.
t
Thanks for reaching out, Anne. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To enable word-level timestamp transcription, follow these steps:
1. Enable TTS-aligned transcription: set use_tts_aligned_transcript=True when creating your AgentSession. This synchronizes transcription output with speech timing for supported TTS providers.
2. Use a supported TTS provider: currently, only Cartesia and ElevenLabs support word-level transcription timing. For other providers, alignment is applied at the sentence level.
3. Access timing information: implement a transcription_node method in your agent. The text iterator it receives yields TimedString objects, which include start_time and end_time for each word.
Here is an example implementation for accessing timing information:
async def transcription_node(
    self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings
) -> AsyncGenerator[str | TimedString, None]:
    async for chunk in text:
        if isinstance(chunk, TimedString):
            logger.info(f"TimedString: '{chunk}' ({chunk.start_time} - {chunk.end_time})")
        yield chunk
Ensure that your TTS provider explicitly supports this feature. For unsupported providers, alignment will only be at the sentence level. Let me know if you need further assistance!
Sources: Text and transcriptions (LiveKit Docs); ElevenLabs TTS integration guide (LiveKit Docs); Cartesia TTS integration guide (LiveKit Docs); livekit-agents/livekit/agents/tts/tts.py
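To try the filtering logic without a live session, here is a minimal sketch that runs on the standard library alone. It assumes TimedString behaves like a str carrying start_time and end_time, as described above. The TimedString class below is a hypothetical stand-in, not the LiveKit class, and collect_word_timings and fake_tts_stream are illustrative names that are not part of any LiveKit API.

```python
import asyncio
from typing import AsyncGenerator, AsyncIterable, List, Optional, Tuple


class TimedString(str):
    """Hypothetical stand-in for the real TimedString: a str with timing attributes."""

    start_time: Optional[float]
    end_time: Optional[float]

    def __new__(cls, text: str, start_time: Optional[float] = None,
                end_time: Optional[float] = None) -> "TimedString":
        obj = super().__new__(cls, text)
        obj.start_time = start_time
        obj.end_time = end_time
        return obj


WordTiming = Tuple[str, Optional[float], Optional[float]]


async def collect_word_timings(
    text: AsyncIterable[str], sink: List[WordTiming]
) -> AsyncGenerator[str, None]:
    """Record (word, start, end) for timed chunks and pass every chunk through unchanged."""
    async for chunk in text:
        if isinstance(chunk, TimedString):
            sink.append((str(chunk), chunk.start_time, chunk.end_time))
        yield chunk


async def fake_tts_stream() -> AsyncGenerator[str, None]:
    """Illustrative input: two timed words and one plain, untimed chunk."""
    yield TimedString("Hello", 0.0, 0.4)
    yield "plain"
    yield TimedString("world", 0.5, 0.9)


async def demo() -> Tuple[List[WordTiming], List[str]]:
    timings: List[WordTiming] = []
    passed = [c async for c in collect_word_timings(fake_tts_stream(), timings)]
    return timings, passed


timings, passed_through = asyncio.run(demo())
for word, start, end in timings:
    print(f"{word}: {start} - {end}")
```

The same isinstance check is what the transcription_node example relies on: plain str chunks carry no timing, so only TimedString chunks are recorded, and every chunk is still yielded so the transcript itself is unchanged.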
a
Where is TimedString imported from?
t
Hey there! Just a quick heads-up: the TimedString class comes from the livekit.agents.voice.io module. You can grab it with this import statement:
from livekit.agents.voice.io import TimedString
If you have any questions or need a hand with anything else, just let me know—I’m here to help!
a
Amazing, this works for the agent words, what about the other participants?
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code
If you find the answer, please post it here to help others!