How can I stop processing audio already generated in...
# ask-ai
s
How can I stop processing audio already generated in the data stream if the user starts speaking?
t
Thanks for reaching out, Ankit. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I’m glad you asked about stopping audio processing when a user starts speaking! With LiveKit, you have some great options to manage interruptions smoothly. Here are the main approaches you can try:
• Manual Control: You can use `session.interrupt()` to stop the current agent speech and `session.input.set_audio_enabled(True/False)` to toggle listening on or off. This works really well if you’re aiming for push-to-talk style interactions.
• Automatic Interruption: By enabling Voice Activity Detection (VAD) and setting `allow_interruptions` to true, the agent will automatically stop its speech whenever it detects the user speaking. It’s a handy hands-free option! There’s a small configuration sketch right below this list.
• Server-Side Interruption: For realtime models, interruptions are managed on the server side, which kicks in as soon as the server recognizes that the user has started talking.
If you’d like, I’m happy to walk you through the detailed implementation steps for any of these methods, just let me know!
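To make the automatic option more concrete, here’s a minimal sketch, assuming the Python livekit-agents 1.x API with the Silero VAD plugin; the STT/LLM/TTS plugins shown are just placeholder choices, so swap in whatever you actually use:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import deepgram, openai, silero  # placeholder plugins, installed separately


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        vad=silero.VAD.load(),     # VAD detects when the user starts speaking
        stt=deepgram.STT(),        # placeholder STT plugin
        llm=openai.LLM(),          # placeholder LLM plugin
        tts=openai.TTS(),          # placeholder TTS plugin
        allow_interruptions=True,  # stop agent speech as soon as user speech is detected
    )

    await session.start(
        agent=Agent(instructions="You are a helpful voice assistant."),
        room=ctx.room,
    )

    # Manual control (push-to-talk style), if you prefer to drive interruptions yourself:
    # session.interrupt()                     # stop the current agent speech
    # session.input.set_audio_enabled(False)  # stop listening to the user


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

With `allow_interruptions=True`, you can still call `session.interrupt()` manually on top of the automatic behavior.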
s
This is for the avatar worker, and I am using `DataStreamAudioOutput` to publish audio from the avatar to the backend worker. The problem is that we don’t know when the audio stream should stop passing audio if the user starts speaking.
t
To gracefully stop `DataStreamAudioOutput` in an avatar worker when a user starts speaking, here’s a helpful approach you can try:
• Use `session.interrupt()`: This will pause any ongoing agent speech and stop the audio stream being sent through `DataStreamAudioOutput`. It also clears the output buffer, which keeps things tidy.
• Detect User Speech: You can use Voice Activity Detection (VAD) or events like `user_started_speaking` to know exactly when the user begins talking and then trigger `session.interrupt()`.
• Monitor Interruption: Keep an eye on the `playback_finished` event reported by the avatar worker to confirm the stream was successfully interrupted. This event includes an `interrupted` flag for your reference. There’s a sketch of the avatar-worker (receiving) side below.
Just a quick note: `DataStreamAudioOutput` doesn’t support manual pause/resume, so using `session.interrupt()` is the best way to handle this. If you’d like, I’m here to help with detailed implementation steps or any other questions you might have!
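Since your setup is avatar-worker specific, here’s also a minimal sketch of what the receiving side might look like, assuming your worker uses the `DataStreamAudioReceiver` counterpart from the livekit-agents avatar module. The import path, the `clear_buffer` event name, and the `notify_playback_finished()` call are assumptions based on recent SDK versions, so please double-check them against the version you have installed:

```python
import inspect

from livekit import rtc
# Assumption: import path per recent livekit-agents releases; adjust for your version.
from livekit.agents.voice.avatar import AudioSegmentEnd, DataStreamAudioReceiver


async def run_avatar_audio(room: rtc.Room) -> None:
    # Receives the audio frames that DataStreamAudioOutput publishes from the agent side.
    audio_receiver = DataStreamAudioReceiver(room)
    await audio_receiver.start()

    interrupted = False

    @audio_receiver.on("clear_buffer")
    def _on_clear_buffer() -> None:
        # Fired when session.interrupt() clears the output buffer on the agent side;
        # this is the signal to stop playing audio the worker has already received.
        nonlocal interrupted
        interrupted = True

    playback_position = 0.0
    async for frame in audio_receiver:
        if isinstance(frame, AudioSegmentEnd):
            # End of one agent utterance: report how much audio was actually played and
            # whether playback was cut short, so the agent side can emit playback_finished.
            result = audio_receiver.notify_playback_finished(
                playback_position=playback_position, interrupted=interrupted
            )
            if inspect.isawaitable(result):
                await result  # some versions implement this as a coroutine
            playback_position = 0.0
            interrupted = False
            continue
        if interrupted:
            continue  # drop the remainder of the interrupted segment
        playback_position += frame.samples_per_channel / frame.sample_rate
        # ...feed `frame` to your avatar renderer / playout here (hypothetical)...
```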
s
Yes, please give me the implementation.
t
Here’s a helpful example to guide you through stopping `DataStreamAudioOutput` in an avatar worker when a user begins speaking:

1. Initialize the Audio Output to the Avatar Worker

```python
# Note: the import path below is an assumption for recent livekit-agents versions;
# adjust it to match your installed SDK.
from livekit.agents.voice.avatar import DataStreamAudioOutput

# Initialize DataStreamAudioOutput
avatar_audio_output = DataStreamAudioOutput(
    room=room,  # your LiveKit room instance
    destination_identity="avatar_worker",  # identity of the avatar worker participant
    sample_rate=16000,  # sample rate of the published audio
)

# Assign the audio output to the session
session.output.audio = avatar_audio_output
```
2. Detect User Speech

Here, you can use Voice Activity Detection (VAD) or an event listener to catch when the user starts speaking:

```python
@agent.on("user_started_speaking")
def handle_user_speaking():
    session.interrupt()  # Stop the audio stream immediately
    session.input.set_audio_enabled(True)  # Enable user input
```
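One caveat: depending on your livekit-agents version, `user_started_speaking` / `user_stopped_speaking` may not be exposed as events on the agent object. In the agents 1.x API, to the best of my knowledge, the equivalent signal is the `user_state_changed` event emitted by `AgentSession`, so a fallback sketch (reusing the `session` from step 1) could look like this:

```python
@session.on("user_state_changed")
def on_user_state_changed(ev):
    # ev.new_state is expected to be "speaking", "listening", or "away" in agents 1.x
    if ev.new_state == "speaking":
        # The user started talking: stop the agent's speech, which also clears
        # the DataStreamAudioOutput buffer feeding the avatar worker.
        session.interrupt()
```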
3. Handle Playback Interruption

It's good to listen for the `playback_finished` event to confirm the stream was indeed interrupted:

```python
@session.output.audio.on("playback_finished")
def on_playback_finished(event):
    if event.interrupted:
        print("Audio stream was interrupted by user speech.")
```
4. Stop and Resume the Audio Stream

Once the user stops speaking, you can disable input and process the audio they provided:

```python
@agent.on("user_stopped_speaking")
def handle_user_stopped():
    session.input.set_audio_enabled(False)  # Stop listening to the user
    session.commit_user_turn()  # Process the user’s input
```
This approach helps ensure the audio stream to the avatar worker pauses right when the user starts talking and then smoothly resumes once they're done. If anything isn’t clear or you’d like more examples, just let me know—I’m here to help!