How should I override the stt_node to extract the ...
# ask-ai
a
How should I override the stt_node to extract the word level timestamp from my stt (deepgram)
t
Thanks for reaching out, Anne. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to override the `stt_node` and grab word-level timestamps from Deepgram STT in LiveKit, here’s a simple way to do it:
1. Turn on TTS-aligned transcripts:
session = AgentSession(
    # ... other configurations
    use_tts_aligned_transcript=True,
)
2. Set up the `transcription_node` method:
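# Import paths are my best understanding and may differ by livekit-agents version
from typing import AsyncGenerator, AsyncIterable

from livekit.agents import Agent, ModelSettings
from livekit.agents.voice.io import TimedString

class MyAgent(Agent):
    async def transcription_node(
        self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings
    ) -> AsyncGenerator[str | TimedString, None]:
        async for chunk in text:
            if isinstance(chunk, TimedString):
                # Here’s where you can access the timestamps. As far as I know,
                # TimedString subclasses str, so the chunk itself is the text.
                print(f"'{chunk}' starts at {chunk.start_time} and ends at {chunk.end_time}")
            yield chunk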
3. Optionally, override the `stt_node`:
from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt

class MyAgent(Agent):
    async def stt_node(
        self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
    ) -> AsyncIterable[stt.SpeechEvent]:
        # Feel free to add your custom processing before STT
        events = Agent.default.stt_node(self, audio, model_settings)

        # Then, process each event as it arrives
        async for event in events:
            if event.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
                for alternative in event.alternatives:
                    # alternative.text is the whole transcript segment, so
                    # splitting it yields the words but no per-word timing
                    for word in alternative.text.split():
                        # Handle each word as you like
                        pass
            yield event
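Just a heads up: as far as I can tell, `use_tts_aligned_transcript` controls the timed chunks that `transcription_node` receives for the agent's synthesized speech, so enable it if you rely on step 2. It should not change what the STT events in `stt_node` contain. Also double-check that Deepgram STT is set up correctly in your project. If you hit any bumps, I’m here to help!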
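The word loop in step 3 splits on whitespace, which leaves punctuation attached to words ("Hello," rather than "Hello"). Here is a minimal sketch of a more forgiving splitter, using only the Python standard library. The `split_words` helper is hypothetical (it is not part of LiveKit or Deepgram), and the rule it applies, keeping internal apostrophes and hyphens while dropping surrounding punctuation, is just one reasonable assumption about what counts as a word.

```python
import re

# A word is a run of letters/digits, optionally joined by internal
# apostrophes or hyphens (e.g. "don't", "word-level").
# Surrounding punctuation such as commas and periods is dropped.
_WORD_RE = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


def split_words(transcript: str) -> list[str]:
    """Return the words in a transcript, without surrounding punctuation."""
    return _WORD_RE.findall(transcript)


if __name__ == "__main__":
    print(split_words("Hello, world! Don't split word-level tokens."))
```

You could call `split_words(alternative.text)` in place of `alternative.text.split()`. Note that this still gives you words only, not their timing.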
a
That works! Except that in this implementation, each alternative's text in event.alternatives is a few words instead of a single word. I can see in the Deepgram playground that Deepgram CAN return single words. Is it a setting?
t
Hey there! Just a quick heads-up: Deepgram STT doesn’t have a built-in setting to return single words instead of grouped ones in alternatives. But no worries—you can easily grab individual words by splitting the transcript after you get it. For example:
ev.alternatives[0].text.split(' ')
This will break the text into separate words based on spaces. Just keep in mind, punctuation or contractions might need a little extra care. Also, if you’re looking for really precise word-level details, Deepgram’s response actually includes a `words` array with each word and its timing info. That’s super handy if you want to dive deeper!
Sources: plugins/deepgram/src/stt.ts
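To make the `words` array concrete, here is a small sketch that pulls per-word timings out of a Deepgram-style streaming result. It assumes you already have the raw result as a Python dict. How to get that payload out of the LiveKit Deepgram plugin is not covered here, and `WordTiming`, `extract_word_timings`, and the hand-written `SAMPLE_RESULT` are illustrative names and data, not part of either library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class WordTiming:
    word: str
    start: float  # seconds from the start of the audio stream
    end: float


def extract_word_timings(result: dict[str, Any]) -> list[WordTiming]:
    """Read the `words` array of the first alternative, if present."""
    alternatives = result.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return []
    return [
        WordTiming(w["word"], float(w["start"]), float(w["end"]))
        for w in alternatives[0].get("words", [])
    ]


# Hand-written payload shaped like a Deepgram streaming result (illustrative only).
SAMPLE_RESULT = {
    "type": "Results",
    "is_final": True,
    "channel": {
        "alternatives": [
            {
                "transcript": "hello world",
                "confidence": 0.98,
                "words": [
                    {"word": "hello", "start": 0.08, "end": 0.40, "confidence": 0.99},
                    {"word": "world", "start": 0.48, "end": 0.90, "confidence": 0.97},
                ],
            }
        ]
    },
}

if __name__ == "__main__":
    for timing in extract_word_timings(SAMPLE_RESULT):
        print(f"{timing.word}: {timing.start:.2f}s to {timing.end:.2f}s")
```

Unlike splitting the transcript on spaces, this keeps each word paired with its start and end time.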