How can I save the audio clip from the turn of the...
# ask-ai
b
How can I save the audio clip from the turn of the user to disk by overriding stt_node?
t
Thanks for reaching out, Petter. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To save audio clips from user turns to disk by overriding `stt_node` in LiveKit, here's a simple way to do it:
• Override the `stt_node` Method: In your custom Agent class, tweak `stt_node` to collect and save audio frames to disk. Think of it like gathering audio snippets in a buffer as they come in.
• Implement Audio Saving Logic: Add a handy method that takes those buffered audio frames and saves them to a file. For example, you could save the audio as a WAV file using Python's `wave` module.
• Trigger Saving on User Turn Completion: Use the `on_user_turn_completed` hook to save the audio right after the user finishes speaking. Don't forget to clear the buffer afterward so it's ready for the next turn!
This way, you can easily capture and save audio clips while handling user input. If you want me to walk you through some code examples or need a hand with anything else, just give me a shout!
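As a rough illustration of the buffering and saving steps above, here is a minimal sketch that uses only Python's standard library and does not touch LiveKit at all. `TurnRecorder`, `push`, and `flush` are hypothetical names made up for this example, and it assumes the audio arrives as raw 16-bit PCM bytes with a known sample rate:

```python
import wave
from pathlib import Path


class TurnRecorder:
    """Hypothetical helper: buffers raw 16-bit PCM chunks for one user turn."""

    def __init__(self, sample_rate: int, num_channels: int = 1) -> None:
        self.sample_rate = sample_rate
        self.num_channels = num_channels
        self._chunks: list[bytes] = []

    def push(self, pcm: bytes) -> None:
        # Would be called once per incoming audio frame (e.g. from stt_node).
        self._chunks.append(pcm)

    def flush(self, path: Path) -> bool:
        # Would be called when the turn ends; writes the WAV and resets the buffer.
        if not self._chunks:
            return False
        with wave.open(str(path), "wb") as wav_file:
            wav_file.setnchannels(self.num_channels)
            wav_file.setsampwidth(2)  # 16-bit samples are 2 bytes wide
            wav_file.setframerate(self.sample_rate)
            wav_file.writeframes(b"".join(self._chunks))
        self._chunks.clear()
        return True
```

In an agent, `push` would be fed from the frames seen in `stt_node` and `flush` would be called from `on_user_turn_completed`. The sample rate and channel count should come from the frames you actually receive rather than from fixed values.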
b
please give me feature complete code
t
Here is a feature-complete example of how to save audio clips from user turns to disk by overriding the stt_node method in LiveKit:
import wave
from datetime import datetime
from typing import AsyncIterable, Optional
from livekit import rtc
from livekit.agents import Agent, ChatContext, ChatMessage, ModelSettings, stt

class AudioSavingAgent(Agent):
    def __init__(self):
        super().__init__(instructions="Your agent instructions here")
        self.audio_buffer = []
        self.buffer_size = 0

    async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings) -> Optional[AsyncIterable[stt.SpeechEvent]]:
        async def audio_with_buffering():
            async for frame in audio:
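                self.audio_buffer.append(frame)
                self.buffer_size += len(frame.data)
                # Keep the buffer size manageable: drop the oldest frames and
                # keep buffer_size in sync (long turns lose their earliest audio)
                while self.buffer_size > 100000 and len(self.audio_buffer) > 1:
                    dropped = self.audio_buffer.pop(0)
                    self.buffer_size -= len(dropped.data)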
                yield frame

        events = super().stt_node(audio_with_buffering(), model_settings)
        if events is not None:
            async for event in events:
                yield event

    async def on_user_turn_completed(self, turn_ctx: ChatContext, new_message: ChatMessage) -> None:
        # Save audio after the user speaks
        await self.save_audio_to_disk()
        # Clear the buffer and reset the running size for the next turn
        self.audio_buffer.clear()
        self.buffer_size = 0
        # Continue with default processing
        await super().on_user_turn_completed(turn_ctx, new_message)

    async def save_audio_to_disk(self):
        if not self.audio_buffer:
            return

        # Generate a unique filename
        filename = f"audio_recording_{datetime.now().strftime('%Y%m%d_%H%M%S')}.wav"

        # Create a WAV file, taking the format from the buffered frames
        # instead of assuming mono 48 kHz
        first_frame = self.audio_buffer[0]
        with wave.open(filename, 'wb') as wav_file:
            wav_file.setnchannels(first_frame.num_channels)
            wav_file.setsampwidth(2)  # 16-bit PCM: 2 bytes per sample
            wav_file.setframerate(first_frame.sample_rate)

            for frame in self.audio_buffer:
                wav_file.writeframes(frame.data.tobytes())

        print(f"Audio saved to {filename}")

### Key Features:
• Audio Buffering: Collects audio frames during user turns.
• WAV File Saving: Saves audio in WAV format with a unique filename.
• Buffer Management: Ensures the buffer size remains manageable.
• Integration: Hooks into the `on_user_turn_completed` method to save audio after each user turn.
Feel free to tweak the filename format, buffer size, or audio settings to fit your needs. And if you have any questions or want a hand with anything else, just let me know. I'm here to help!
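If you do tweak the buffer size, one option is to move the size bookkeeping into a small helper so the running total always matches what is actually stored. The sketch below is only an illustration under assumptions: `BoundedAudioBuffer`, `append`, and `drain` are hypothetical names, and it works on raw bytes rather than LiveKit frame objects.

```python
from collections import deque


class BoundedAudioBuffer:
    """Hypothetical helper: keeps at most max_bytes of the most recent audio."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._chunks: deque[bytes] = deque()

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self.total_bytes += len(chunk)
        # Evict the oldest chunks until under the cap, always keeping the newest.
        while self.total_bytes > self.max_bytes and len(self._chunks) > 1:
            self.total_bytes -= len(self._chunks.popleft())

    def drain(self) -> bytes:
        # Return everything buffered so far and reset for the next turn.
        data = b"".join(self._chunks)
        self._chunks.clear()
        self.total_bytes = 0
        return data
```

Keep in mind that any cap like this discards the start of a turn once the limit is reached, so pick `max_bytes` based on the longest turn you want to keep.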