How should I override the stt node to extract the word level LiveKit Community #ask-ai

# ask-ai

acceptable-dream-85180

08/09/2025, 11:28 AM

How should I override the stt_node to extract the word level timestamp from my stt (deepgram)

tall-airline-16756

08/09/2025, 11:28 AM

Thanks for reaching out, Anne. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

08/09/2025, 11:29 AM

Hey there! If you want to override the `stt_node` and grab word-level timestamps from Deepgram STT in LiveKit, here’s a simple way to do it:

1. Turn on TTS-aligned transcripts:

session = AgentSession(
    # ... other configurations
    use_tts_aligned_transcript=True,
)

2. Set up the `transcription_node` method:

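from typing import AsyncGenerator, AsyncIterable

from livekit.agents import Agent, ModelSettings
from livekit.agents.voice.io import TimedString

class MyAgent(Agent):
    async def transcription_node(
        self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings
    ) -> AsyncGenerator[str | TimedString, None]:
        async for chunk in text:
            if isinstance(chunk, TimedString):
                # TimedString is a str subclass, so the chunk itself is the text;
                # start_time and end_time carry the timing
                print(f"Word '{chunk}' starts at {chunk.start_time} and ends at {chunk.end_time}")
            yield chunk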

3. Optionally, Override the `stt_node`:

from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt

# Define this as a method on your Agent subclass (e.g. MyAgent).
async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> AsyncIterable[stt.SpeechEvent]:
    # Add any custom audio processing here, before STT runs
    events = Agent.default.stt_node(self, audio, model_settings)

    # Inspect each event, then pass it along unchanged
    async for event in events:
        if event.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            for alternative in event.alternatives:
                # alternative.text is the whole transcript segment, so
                # split() yields the words but no per-word timing
                for word in alternative.text.split():
                    # Handle each word as you like
                    pass
        yield event

Just a heads up: make sure `use_tts_aligned_transcript` is enabled for the most accurate timestamp data, and double-check that Deepgram STT is set up correctly in your project. If you hit any bumps, I’m here to help!
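
The shape of the `stt_node` override above (wrap a stream, look at each event, always re-yield it) can be tried without LiveKit installed. This is only a sketch of that pattern: `FakeEvent`, `fake_stt`, and `inspect_and_forward` are made-up stand-ins, not LiveKit types or functions.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterable, AsyncIterator


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a speech event; NOT a LiveKit type."""
    is_final: bool
    text: str


async def fake_stt() -> AsyncIterator[FakeEvent]:
    """Hypothetical stand-in for the default node's event stream."""
    yield FakeEvent(is_final=False, text="hello")
    yield FakeEvent(is_final=True, text="hello world")


async def inspect_and_forward(
    events: AsyncIterable[FakeEvent], seen: list[str]
) -> AsyncIterator[FakeEvent]:
    async for event in events:
        if event.is_final:
            # Side effect only; the event itself is left untouched
            seen.append(event.text)
        # Always forward the event, or downstream stages receive nothing
        yield event


async def main() -> tuple[list[str], list[FakeEvent]]:
    seen: list[str] = []
    forwarded = [e async for e in inspect_and_forward(fake_stt(), seen)]
    return seen, forwarded


seen_finals, forwarded_events = asyncio.run(main())
```

The point to carry over is that the wrapper yields every event, not just the final ones, so the rest of the pipeline sees the same stream it would have seen without the override.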

acceptable-dream-85180

08/11/2025, 11:41 AM

That works! Except that in this implementation, each alternative's text in event.alternatives is a few words instead of a single word. I can see in the Deepgram playground that Deepgram CAN return single words. Is it a setting?

tall-airline-16756

08/11/2025, 11:42 AM

Hey there! Just a quick heads-up: Deepgram STT doesn’t have a built-in setting to return single words instead of grouped ones in alternatives. You can still get individual words by splitting the transcript after you receive it. For example:

ev.alternatives[0].text.split(' ')

This will break the text into separate words based on spaces. Just keep in mind, punctuation or contractions might need a little extra care. Also, if you’re looking for really precise word-level details, Deepgram’s response actually includes a `words` array with each word and its timing info. That’s super handy if you want to dive deeper!

Sources: plugins/deepgram/src/stt.ts
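
Here is a minimal sketch of reading that `words` array, assuming you already have a raw Deepgram streaming result as a Python dict. The field names (`channel`, `alternatives`, `words`, `word`, `start`, `end`, `punctuated_word`) follow the shape of Deepgram's streaming response; the helper name `extract_word_timings` and the sample payload are made up for illustration, and getting the raw payload out of the LiveKit plugin is not covered here.

```python
from typing import Any


def extract_word_timings(result: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) for the top alternative."""
    alternatives = result.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return []
    return [
        (w["word"], float(w["start"]), float(w["end"]))
        for w in alternatives[0].get("words", [])
    ]


# Hypothetical payload, trimmed to the fields used above.
sample_result = {
    "channel": {
        "alternatives": [
            {
                "transcript": "Hello, world!",
                "words": [
                    {"word": "hello", "start": 0.08, "end": 0.4, "punctuated_word": "Hello,"},
                    {"word": "world", "start": 0.48, "end": 0.9, "punctuated_word": "world!"},
                ],
            }
        ]
    }
}

# Per-word timing from the words array
word_timings = extract_word_timings(sample_result)

# For comparison: splitting the transcript keeps punctuation and has no timing
naive_words = sample_result["channel"]["alternatives"][0]["transcript"].split(" ")
```

Compared with splitting on spaces, the array gives one entry per word with its own start and end, and the punctuation stays in a separate field instead of being glued to the word.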
