# ask-ai
m
I’m interested in customizing interruption behavior such that the model still finishes the responses from a previous message if certain conditions are met (like if the transcription was generated but the voice just started within a second or so). I want to still generate a response to the original interrupted input and also generate a response to the new input that was the interruption, and queue both up in TTS. Is this possible?
t
Thanks for reaching out, Alexander. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! Just a heads-up—LiveKit doesn’t natively support queuing both the original interrupted response and the new response in Text-to-Speech (TTS). So, when an interruption happens, the current speech stops, and the interrupted message gets cut off without a built-in way to finish or queue it up. But don’t worry! You can create your own workaround by: • Turning off interruptions for certain messages by setting
allow_interruptions=False
on the
SpeechHandle
you get from
session.say()
or
session.generate_reply()
. • Using a
SpeechHandle
to spot interruptions and grab the cut-off text. From there, you can re-synthesize and queue up that truncated bit along with the new response by managing the interrupted messages’ state and text yourself. If you’d like, I’m happy to walk you through the details or help with implementation—just let me know! Sources: agents/src/pipeline/agent_output.ts | Agent speech and audio | LiveKit Docs | livekit-agents/livekit/agents/voice/agent_activity.py
m
Yes please do! Thanks Mr. Or Ms. AI
t
Hey there! If you want to set up custom interruption behavior that queues both the original interrupted response and the new one in TTS, here’s a friendly step-by-step to guide you through:
1. Disable Interruptions for Specific Messages: When you’re generating a response, just set `allow_interruptions=False` on the `SpeechHandle` you get from `session.say()` or `session.generate_reply()`. This way, the message will finish without getting cut off.
2. Capture Interrupted Text: Keep an eye on the `SpeechHandle` to spot when an interruption happens. You can grab the truncated text from the `forwarded_text` or `synchronized_transcript` properties.
3. Re-synthesize and Queue Responses: Once you have the interrupted text, create a new `SpeechHandle` to re-synthesize that part. Then, generate a response for the new input and queue both responses in the order you want. You’ll handle this queuing logic in your app’s higher-level code.
4. Manage State and Playback: Set up a way to track interrupted messages so both the original and new responses play back one after the other. This might mean extending the priority queue of speech tasks (`self._speech_q`) inside the `AgentActivity` class.
If any of this feels tricky or you want me to walk you through any part, just let me know—I’m here to help!
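To make steps 3 and 4 concrete, here is a minimal sketch of the queuing idea. The `remaining_text` helper is plain Python; the `session.say()` / `SpeechHandle` calls in the comments follow the LiveKit Agents API described above, and the `interrupted` / `forwarded_text` attributes are the ones mentioned in this thread, so verify them against your installed version.

```python
# Sketch only: the helper below is plain Python; the commented session
# calls assume the LiveKit Agents API discussed in this thread.

def remaining_text(full_text: str, spoken_text: str) -> str:
    """Return the portion of full_text cut off by an interruption.

    Assumes spoken_text is a prefix of full_text, which holds when the
    synchronized transcript tracks playback. Falls back to replaying the
    whole message if the transcripts diverge.
    """
    if full_text.startswith(spoken_text):
        return full_text[len(spoken_text):].lstrip()
    return full_text


# Hypothetical usage inside an agent (needs a running AgentSession):
#
# handle = session.say(original_text)
# await handle.wait_for_playout()
# if handle.interrupted:
#     # 1. queue the unfinished remainder of the original response
#     session.say("As I was saying: " + remaining_text(original_text, handle.forwarded_text))
#     # 2. then queue the reply to the interrupting input
#     session.generate_reply()
```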
m
Can you walk me through how I handle voice / speechHandle inputs or perhaps point me to the docs? Again I basically want to customize the interruption handling such that I can control if the agent completes a previous response or not.
t
Handling voice inputs and customizing how interruptions work can feel a bit tricky, but don’t worry—I’m here to help! Let’s walk through it together:
1. Creating a SpeechHandle: Start by using `session.say()` or `session.generate_reply()` to create a `SpeechHandle`. This gives you handy control over speech generation and how interruptions behave.
2. Customizing Interruptions: If you want to prevent interruptions for certain messages, just set `allow_interruptions=False` on your `SpeechHandle`. You can also tweak session-level settings like `min_interruption_duration` and `min_interruption_words` to decide when interruptions should be allowed.
3. Handling Interrupted Speech: The `SpeechHandle` helps you spot interruptions and grab any cut-off text using `forwarded_text` or `synchronized_transcript`. From there, you can re-synthesize and queue up the part that got interrupted.
4. Managing Playback: Want to make sure the speech finishes playing before moving on? Use `wait_for_playout()`. Plus, you can add callbacks with `add_done_callback()` to handle what happens after completion or if an interruption occurs.
5. Advanced Controls: For more fine-tuned speech control, try `RunContext.disallow_interruptions()` to block interruptions during specific steps. Also, turning on `preemptive_generation=True` in `AgentSession` can help speed things up by generating responses ahead of time.
If you want me to dive deeper or show some examples on how to set this up, just let me know—I’m happy to help!
Sources: livekit-agents/livekit/agents/voice/agent_activity.py | Agent speech and audio | LiveKit Docs | VoicePipelineAgent | LiveKit Docs
m
Examples plz
t
Here are some examples to help you implement custom interruption behavior:

1. Preventing Interruptions for Specific Messages

```python
speech_handle = session.say("This message cannot be interrupted.", allow_interruptions=False)
await speech_handle.wait_for_playout()
```

This makes sure your message finishes without any interruptions.

2. Capturing Interrupted Text

```python
speech_handle = session.say("This message might be interrupted.")
await speech_handle.wait_for_playout()
if speech_handle.interrupted:
    truncated_text = speech_handle.forwarded_text
    print(f"Interrupted text: {truncated_text}")
```

Note that `wait_for_playout()` simply returns once playback ends, whether or not it was interrupted, so check the handle’s `interrupted` flag rather than catching an exception. Here, you can catch the part of the message that got cut off and use it again if you want.

3. Re-synthesizing and Queuing Responses

```python
# Capture interrupted text
speech_handle = session.say("This is the original message.")
await speech_handle.wait_for_playout()
if speech_handle.interrupted:
    truncated_text = speech_handle.forwarded_text
    # Re-synthesize the interrupted portion
    await session.say(f"Resuming: {truncated_text}")

# Generate and queue a response for the new input
await session.say("This is the response to the new input.")
```

This example shows how you can smoothly handle both the interrupted message and the new response that comes in.

4. Adjusting Interruption Sensitivity

```python
session = AgentSession(
    min_interruption_duration=0.5,  # minimum 0.5 s of user speech to interrupt
    min_interruption_words=3,       # minimum 3 words to trigger an interruption
)
```

These settings help make sure interruptions only happen when it really makes sense. If you’d like more examples or have any questions, just let me know—I’m here to help!
m
So do I need to replace the STT node and process audio frames myself? how does it work with your implementation?
t
Hey there! LiveKit’s setup includes a built-in Speech-to-Text (STT) node that handles audio frames to detect voice activity and transcriptions. The good news is, you don’t have to replace this STT node unless your project calls for a totally different way of transcribing or processing audio. If you’re looking to tweak how interruptions are handled, you can actually stick with the current STT node and just fine-tune some session-level settings like `min_interruption_duration` and `min_interruption_words`. These help decide when interruptions get triggered. But if you want to get hands-on and process audio frames yourself, you can create your own custom STT node or connect with an external STT service. This means capturing audio frames, turning them into transcriptions, and then feeding that info back into LiveKit. If you’d like, I’m happy to walk you through setting up a custom STT node or share more about working with what’s already there—just let me know!
m
So am I listening for interruptions basically and customizing the behavior?
t
Absolutely, you’re basically tuning in for interruptions and tailoring how things work to fit what you need. Here’s a quick rundown of how it all comes together:
1. Detecting Interruptions: The LiveKit framework jumps in when certain conditions happen, like when someone talks for a set amount of time or says a certain number of words. You can tweak these settings with options like `min_interruption_duration` and `min_interruption_words` to make it just right for your session.
2. Handling Interrupted Speech: Once an interruption pops up, you can grab the cut-off text from the `SpeechHandle` (think `forwarded_text` or `synchronized_transcript`). This gives you the flexibility to decide what to do next with that interrupted bit.
3. Customizing Behavior: From here, you might choose to re-synthesize the interrupted part, line it up for playback, or even skip it—whatever fits your flow. For example, if the interruption was quick, you might finish the original response, or if the new input’s more important, you could focus on that instead.
If you want me to walk you through it more or share some examples, just let me know—I’m here to help!
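The “was it quick?” decision in step 3 is ordinary application logic. Here is a sketch of one way to encode it; the one-second grace period matches the scenario from the first message in this thread, and the inputs are values you would read off your own transcript and interruption events.

```python
# Sketch of the decision from step 3: finish the original response when
# the interruption came very early or carried almost no content. The
# thresholds mirror the ~1 second scenario from the question above.

def should_finish_original(
    seconds_into_speech: float,
    interrupter_word_count: int,
    grace_period_s: float = 1.0,
    min_words: int = 3,
) -> bool:
    """True if the agent should still complete the interrupted response."""
    barely_started = seconds_into_speech < grace_period_s
    too_little_said = interrupter_word_count < min_words
    return barely_started or too_little_said


# Example: the user spoke 0.4 s into the agent's reply and said one word,
# so you would finish the original response, then answer the new input.
```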
m
how do I listen for the interruption event?
t
Hey there! If you want to catch interruption events in LiveKit, here are some handy ways to do it:

1. Playback Finished Event: Whenever an interruption happens, the `PlaybackFinishedEvent` fires with an `interrupted` flag set to `True`. Here’s a quick example:

```python
def on_playback_finished(event):
    if event.interrupted:
        print("Playback was interrupted by the user.")

session.output.audio.on("playback_finished", on_playback_finished)
```

2. Agent State Changes: You can keep an eye on state changes (like switching from `speaking` to `listening`) by listening to the `agent_state_changed` event:

```python
session.on("agent_state_changed", lambda ev: print(f"Agent state: {ev.new_state}"))
```

3. False Interruption Handling: Sometimes an interruption might happen but no new input follows — in that case, the `agent_false_interruption` event gets triggered. Here’s how you can handle it:

```python
@session.on("agent_false_interruption")
def on_false_interrupt(ev):
    print("False interruption detected. Resuming original message.")
    session.generate_reply(instructions=ev.extra_instructions)
```

4. Manual Interruptions: And if you want to stop the agent’s current speech yourself, just call `session.interrupt()`.

If anything’s unclear or you’d like some more examples, just let me know — I’m happy to help!
Sources: livekit-agents/livekit/agents/voice/agent_activity.py | Turn detection and interruptions | LiveKit Docs | Bringing AI avatars to voice agents