# ask-ai
I'm using LiveKit AgentSession and need to access interim transcripts from Deepgram STT for custom interruption logic. I have configured Deepgram with `interim_results=True` and can see final transcripts in conversation events, but I cannot access interim transcripts.

I've tried:

1. `session.on_interim_transcript = handler` - the handler never gets called
2. `session.stt.stream()` - `session.stt` is not accessible
3. Accessing the agent STT directly - `agent.stt` is not available from the session context

My setup:

- AgentSession with Deepgram STT (`interim_results=True`, `smart_format=True`)
- Turn detection: `'stt'` mode
- Goal: interrupt agent speech based on interim transcript word detection

What is the correct architecture/API to access interim transcripts within an AgentSession? Do I need to modify the Agent class itself, use a different session configuration, or is there a specific event/hook I'm missing?
Thanks for reaching out, Clovis. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To access interim transcripts from Deepgram STT in your LiveKit AgentSession and implement custom interruption logic, follow these steps:

**1. Enable Interim Results**

Ensure that interim results are enabled in your Deepgram STT configuration:
```python
from livekit.plugins.deepgram import stt

deepgram_stt = stt.STT(
    model="nova-2-general",
    interim_results=True,  # Enable interim transcripts
    smart_format=True,
    punctuate=True,
    language="en-US",
)
```
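For context, this is the STT instance you would hand to the session. A minimal wiring sketch, assuming the standard `AgentSession` constructor and the `'stt'` turn-detection mode described in the question (the `llm`/`tts` arguments are omitted for brevity):

```python
from livekit.agents import AgentSession

# Sketch only: pass the configured STT and use STT-based turn detection;
# llm/tts/vad arguments are elided here.
session = AgentSession(
    stt=deepgram_stt,
    turn_detection="stt",
)
```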
**2. Process STT Events for Custom Interruption Logic**

Access interim transcripts by processing the STT event stream. The Deepgram implementation emits `INTERIM_TRANSCRIPT` events when `is_final_transcript` is False:
```python
import asyncio
from typing import AsyncIterable

from livekit import agents, rtc
from livekit.agents.stt import SpeechEvent, SpeechEventType
from livekit.plugins import deepgram

async def process_track(ctx: agents.JobContext, track: rtc.Track):
    stt = deepgram.STT(interim_results=True)
    stt_stream = stt.stream()
    audio_stream = rtc.AudioStream(track)

    # Consume STT events concurrently while audio frames are pushed below
    asyncio.create_task(process_text_from_speech(stt_stream))

    async for audio_event in audio_stream:
        stt_stream.push_frame(audio_event.frame)

    stt_stream.end_input()

async def process_text_from_speech(stream: AsyncIterable[SpeechEvent]):
    async for event in stream:
        if event.type == SpeechEventType.INTERIM_TRANSCRIPT:
            interim_text = event.alternatives[0].text
            # Implement your custom interruption logic here;
            # should_interrupt / trigger_interruption are placeholders
            if should_interrupt(interim_text):
                trigger_interruption()
        elif event.type == SpeechEventType.FINAL_TRANSCRIPT:
            final_text = event.alternatives[0].text
            # Handle the final transcript
```
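For illustration, `should_interrupt` could be as simple as a hot-word check; the word list below is purely a placeholder:

```python
# Purely illustrative placeholder for should_interrupt:
# interrupt as soon as an interim transcript contains a hot word.
HOT_WORDS = {"stop", "wait", "hold on"}

def should_interrupt(interim_text: str) -> bool:
    lowered = interim_text.lower()
    return any(word in lowered for word in HOT_WORDS)
```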
**3. Use Agent Activity Handlers**

The agent activity system provides access to the current transcript, including interim results. Note that the underscore-prefixed attributes below (`_current_speech`) are private internals of the framework and may change between versions:
```python
# In your agent activity handler
def on_interim_transcript(self, ev: stt.SpeechEvent) -> None:
    # Access the interim transcript text
    interim_text = ev.alternatives[0].text

    # Implement custom interruption logic;
    # should_interrupt_based_on_interim is a method you define yourself
    if self.should_interrupt_based_on_interim(interim_text):
        # Only interrupt speech that is active and allows interruptions
        if (
            self._current_speech is not None
            and not self._current_speech.interrupted
            and self._current_speech.allow_interruptions
        ):
            self._current_speech.interrupt()
```
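If you prefer not to depend on those private attributes, a public alternative (assuming livekit-agents 1.x) is to override the agent's `stt_node` pipeline hook, observe interim events as they pass through, and call `session.interrupt()`. A minimal sketch; the hot-word trigger is a placeholder:

```python
from typing import AsyncIterable, Optional

from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt

class InterruptingAgent(Agent):
    async def stt_node(
        self,
        audio: AsyncIterable[rtc.AudioFrame],
        model_settings: ModelSettings,
    ) -> Optional[AsyncIterable[stt.SpeechEvent]]:
        # Delegate to the default STT pipeline and observe events in transit
        async for event in Agent.default.stt_node(self, audio, model_settings):
            if event.type == stt.SpeechEventType.INTERIM_TRANSCRIPT:
                text = event.alternatives[0].text
                if "stop" in text.lower():  # placeholder trigger word
                    self.session.interrupt()  # cut off the agent's current speech
            yield event
```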
**4. Access the Current Transcript Property**

The audio recognition system maintains a `current_transcript` property that combines final and interim transcripts:
```python
# Access the current transcript, including interim results
current_text = self._audio_recognition.current_transcript

# Use this for interruption decisions
if self.should_interrupt(current_text):
    # Implement interruption logic
    pass
```
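Separately, if you are on livekit-agents 1.x, it may be simpler to listen for the session's `user_input_transcribed` event, which fires for both interim and final transcripts. A sketch, assuming that event API and import path:

```python
from livekit.agents.voice.events import UserInputTranscribedEvent

@session.on("user_input_transcribed")
def on_user_input_transcribed(ev: UserInputTranscribedEvent):
    # is_final is False for interim transcripts
    if not ev.is_final and "stop" in ev.transcript.lower():  # placeholder check
        session.interrupt()  # stop the agent's current speech
```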
These steps should let you access and use interim transcripts for your custom interruption logic.

Sources: livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py | Deepgram integration guide | LiveKit Docs