# ask-ai
c
is it possible to get the interim transcripts from deepgram when set as an STT provider, and not only the final transcript, inside livekit?
t
Thanks for reaching out, Clovis. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Yes, it is possible to retrieve interim transcripts from Deepgram when it is set as the STT provider in LiveKit. The feature is enabled by default through the interimResults parameter (interim_results in Python). Here's how you can configure it:
• JavaScript:
import { STT } from '@livekit/agents-plugin-deepgram';

const deepgramSTT = new STT({
  interimResults: true, // Enable interim transcripts (default: true)
  // other options...
});
• Python:
from livekit.plugins import deepgram

deepgram_stt = deepgram.STT(
    interim_results=True,  # Enable interim transcripts (default: True)
    # other options...
)
LiveKit processes Deepgram's responses and emits two types of speech events:
• INTERIM_TRANSCRIPT events for preliminary, still-changing results
• FINAL_TRANSCRIPT events for finalized transcriptions
You can handle these events in your application as follows:
from livekit.agents.stt import SpeechEventType

async for event in stt_stream:
    if event.type == SpeechEventType.INTERIM_TRANSCRIPT:
        # Preliminary hypothesis; may still change as more audio arrives
        interim_text = event.alternatives[0].text
    elif event.type == SpeechEventType.FINAL_TRANSCRIPT:
        # Finalized transcription for this segment of speech
        final_text = event.alternatives[0].text
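For context, here is a minimal sketch of how the stt_stream above could be created and fed audio. It assumes the livekit-agents Python API (deepgram_stt.stream() and push_frame()) and a track you have already subscribed to; treat it as an outline rather than a drop-in implementation:
from livekit import rtc

# Open a streaming recognition session on the configured Deepgram STT
stt_stream = deepgram_stt.stream()

async def forward_audio(track: rtc.Track):
    # Read audio frames from the subscribed track and push them to Deepgram
    audio_stream = rtc.AudioStream(track)
    async for frame_event in audio_stream:
        stt_stream.push_frame(frame_event.frame)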
This setup allows you to access real-time transcription data effectively. Sources: agents/src/stt/stt.ts | livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py | Working with plugins | LiveKit Docs | Deepgram integration guide | LiveKit Docs | SpeechEventType | LiveKit Agents
c
okay, and if I would like to use the first interim result (representing user speech) to interrupt the agent's speech, how should I do it?
t
Hey there! Let me walk you through how to interrupt agent speech using Deepgram's interim results in LiveKit. It's pretty straightforward, so I'll break it down step by step.
1. Enable Interim Results
First, make sure your Deepgram STT is configured to emit preliminary results:
from livekit.plugins.deepgram import stt

deepgram_stt = stt.STT(
    model="nova-2-general",
    interim_results=True,  # Enable preliminary results
    # other parameters...
)
2. Interruption Logic
LiveKit already has built-in interruption handling. Internally, the agents framework (TypeScript shown here) checks the accumulated interim transcript and interrupts the playing speech once enough words have been heard:
#interruptIfPossible() {
  // Nothing to interrupt if no speech is playing, interruptions are
  // disabled for it, or it has already been interrupted
  if (
    !this.#playingSpeech ||
    !this.#playingSpeech.allowInterruptions ||
    this.#playingSpeech.interrupted
  ) {
    return;
  }

  // Require a minimum number of interim words before interrupting
  if (this.#opts.interruptMinWords !== 0) {
    const interimWords = this.#opts.transcription.wordTokenizer.tokenize(
      this.#transcribedInterimText,
    );
    if (interimWords.length < this.#opts.interruptMinWords) {
      return;
    }
  }
  this.#playingSpeech.interrupt();
}
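You normally don't call that method yourself; you just tune the thresholds when creating the session. Here is a rough Python sketch, assuming option names (allow_interruptions, min_interruption_words) from recent livekit-agents releases, which may differ slightly in your version:
from livekit.agents import AgentSession

# A sketch, not a drop-in config: check the option names against your installed version.
session = AgentSession(
    stt=deepgram_stt,           # the Deepgram STT configured in step 1
    allow_interruptions=True,   # let user speech interrupt the agent
    min_interruption_words=1,   # require at least one transcribed word before interrupting
    # llm=..., tts=..., vad=... and the rest of your pipeline go here
)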
3. Custom Handling
If you want more control, you can implement your own logic on top of the interim transcripts, for example:
@session.on("user_input_transcribed")
def on_transcript(transcript):
    nonlocal transcript_buffer

    if transcript.is_final:
        logger.info(f"Received final transcript: {transcript.transcript}")
        return

    transcript_buffer += " " + transcript.transcript
    transcript_buffer = transcript_buffer.strip()

    # Check conditions for interruption based on interim results
    sentence_count = count_sentences(transcript_buffer)

    if sentence_count >= max_sentences:
        logger.info("Interrupting agent speech...")
        # Trigger interruption logic
        asyncio.create_task(handle_interruption(interruption_ctx))
        transcript_buffer = ""
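If all you need is "interrupt the agent as soon as the first interim result arrives", a more direct sketch could look like the following. It assumes AgentSession.interrupt() and the user_input_transcribed event from recent livekit-agents versions; note that with allow_interruptions enabled, the built-in VAD/word-count logic shown above usually covers this already:
@session.on("user_input_transcribed")
def on_user_transcript(ev):
    # ev.is_final is False for interim (preliminary) results
    if not ev.is_final and ev.transcript.strip():
        # Stop whatever the agent is currently saying
        session.interrupt()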
4. Key Configuration Options
• interruptMinWords: how many interim words must be transcribed before the agent can be interrupted (0 disables the word-count check)
• allowInterruptions: whether the currently playing agent speech may be interrupted at all
With interim results enabled and these options tuned, agent speech is interrupted as soon as meaningful user speech is detected. Sources: Deepgram integration guide | LiveKit Docs | plugins/deepgram/src/stt.ts | pipeline-tts/short_replies_only.py | livekit-agents/livekit/agents/voice/agent_activity.py
c
and if i have multiple agents that can hand off between each other, does it work the same way as for a single one?
t
Yes, it works the same way. Interruption and interim-transcript handling carry over when multiple agents hand off between each other. Here's what happens under the hood:
1. Agent Handoff Coordination
• LiveKit pauses the current agent's work and tracks any in-flight tasks so the switch doesn't cut anything off mid-stream.
• The run_state._agent_handoff() method transfers the conversation context, so nothing is lost in the transition.
• The new agent picks up exactly where the previous one left off.
2. Interim Transcript Processing During Handoffs
• Interim transcripts keep being processed during the switch through the same event handlers.
• Handlers such as on_interim_transcript() and on_final_transcript() keep firing regardless of which agent currently owns the conversation.
3. Interruption Handling During Transitions
• The same checks (word count, speech duration) apply while a handoff is in progress.
• If an interruption lands mid-transition, partial transcripts are preserved in the chat context.
4. Real-time State Management
• Activity locks and scheduling keep handoffs thread-safe.
• Blocked tasks are cleaned up so the transition stays clean.
Bottom line: multi-agent handoffs don't change how interim transcripts or interruptions behave, so the setup from the previous answers keeps working. A rough sketch of a handoff is shown below. Sources: livekit-agents/livekit/agents/voice/run_result.py | agents/src/pipeline/pipeline_agent.ts
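For concreteness, here is a rough sketch of what a handoff can look like in the Python agents framework. The IntakeAgent and SurveyAgent names are made up for illustration; the pattern of returning the next Agent from a function tool to hand off the session follows LiveKit's multi-agent workflow docs, and the interruption settings discussed earlier apply to the session as a whole:
from livekit.agents import Agent, function_tool

class SurveyAgent(Agent):
    def __init__(self):
        super().__init__(instructions="Ask the survey questions one at a time.")

class IntakeAgent(Agent):
    def __init__(self):
        super().__init__(instructions="Greet the user and confirm they want to take the survey.")

    @function_tool()
    async def start_survey(self):
        """Called when the user agrees to start the survey."""
        # Returning the next agent hands the session off; interim transcripts
        # and interruption handling keep working exactly as before.
        return SurveyAgent(), "Handing you over to the survey now."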