How can I preprocess audio frames before STT in AI...
# ask-ai
How can I preprocess audio frames before STT in AI agent telephony applications using AudioProcessingModule? Show me some code examples.
Thanks for reaching out, Muthana. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
You can preprocess audio frames before speech-to-text (STT) in AI agent telephony applications by integrating the AudioProcessingModule into your agent's STT node. The module provides WebRTC audio processing features such as echo cancellation, noise suppression, high-pass filtering, and automatic gain control. Audio frames must be exactly 10 ms in duration for proper processing. Basic implementation example:
```python
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from livekit.rtc import AudioProcessingModule
from typing import AsyncIterable, Optional

class AudioPreprocessingAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="Your agent instructions here",
            # your STT, LLM, TTS configuration
        )
        # Initialize AudioProcessingModule with desired features
        self.apm = AudioProcessingModule(
            echo_cancellation=True,
            noise_suppression=True,
            high_pass_filter=True,
            auto_gain_control=True,
        )
    
    async def stt_node(
        self, 
        audio: AsyncIterable[rtc.AudioFrame], 
        model_settings: ModelSettings
    ) -> Optional[AsyncIterable[stt.SpeechEvent]]:
        async def preprocessed_audio():
            async for frame in audio:
                # Process the audio frame with APM
                self.apm.process_stream(frame)
                yield frame
        # Pass preprocessed audio to default STT node
        return Agent.default.stt_node(self, preprocessed_audio(), model_settings)
```
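Note that telephony sources often deliver frames longer than 10 ms, so you may need to rechunk the audio before calling process_stream. Below is a minimal sketch of one way to do that, assuming 16-bit PCM audio; the split_into_10ms_frames helper is hypothetical and simply drops any trailing partial chunk instead of buffering it:
```python
import numpy as np
from livekit import rtc

def split_into_10ms_frames(frame: rtc.AudioFrame):
    """Hypothetical helper: split an AudioFrame into exact 10 ms chunks.

    Assumes 16-bit PCM samples. A trailing partial chunk is dropped here;
    in production you would buffer it and prepend it to the next frame.
    """
    samples_per_chunk = frame.sample_rate // 100  # 10 ms of samples per channel
    samples = np.frombuffer(frame.data, dtype=np.int16).reshape(-1, frame.num_channels)
    for start in range(0, len(samples) - samples_per_chunk + 1, samples_per_chunk):
        chunk = samples[start : start + samples_per_chunk]
        yield rtc.AudioFrame(
            data=chunk.tobytes(),
            sample_rate=frame.sample_rate,
            num_channels=frame.num_channels,
            samples_per_channel=samples_per_chunk,
        )
```
Inside preprocessed_audio() you would then call self.apm.process_stream(chunk) on each 10 ms chunk and yield the chunks instead of the original frame.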
Advanced echo cancellation example:
```python
    async def process_far_end_audio(self, far_end_audio: AsyncIterable[rtc.AudioFrame]):
        async for frame in far_end_audio:
            # Process the rendered (far-end) audio for echo cancellation
            self.apm.process_reverse_stream(frame)
```
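To see how the capture and render sides fit together, here is a hedged sketch that drives both streams and sets the stream delay; the run_apm function and the 100 ms delay value are assumptions you would adapt to your own audio plumbing:
```python
import asyncio
from typing import AsyncIterable

from livekit import rtc
from livekit.rtc import AudioProcessingModule

async def run_apm(
    apm: AudioProcessingModule,
    near_end: AsyncIterable[rtc.AudioFrame],  # captured audio headed to STT
    far_end: AsyncIterable[rtc.AudioFrame],   # rendered audio played to the caller
) -> None:
    # Assumed value: estimated delay between rendering far-end audio and
    # capturing its echo on the near-end stream; tune this for your setup.
    apm.set_stream_delay_ms(100)

    async def capture_loop():
        async for frame in near_end:
            apm.process_stream(frame)          # clean up the captured frame

    async def render_loop():
        async for frame in far_end:
            apm.process_reverse_stream(frame)  # reference signal for echo cancellation

    # Run both loops concurrently so the reverse stream stays in step
    # with the capture stream, which the echo canceller requires.
    await asyncio.gather(capture_loop(), render_loop())
```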
• Ensure you set the stream delay for echo cancellation using self.apm.set_stream_delay_ms().
• The STT node can be overridden to inject custom preprocessing logic before handing audio to the speech-to-text system.
• Frames must be exactly 10 ms in duration.
Integrating the module this way lets you leverage robust audio processing to maximize transcription quality in your telephony agents.
Sources: livekit-rtc/tests/test_apm.py | Voice pipeline nodes | LiveKit Docs
What about before the VAD?
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code
If you find the answer, please post it here to help others!
Getting this error when the session closes after implementing the changes you gave us:
```
thread '<unnamed>' panicked at libwebrtc\src\native\apm.rs:50:9:
assertion `left == right` failed: slice must have 10ms worth of samples
  left: 12000
 right: 240
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
FFI Panic: 213058 livekit ERROR livekit_ffi:cabi83livekit ffi:cabi - panic while handling request: Any { .. }
2025-07-14 213058,375 - ERROR livekit - livekit_ffi:cabi83livekit ffi:cabi - panic while handling request: Any { .. } {"session_id": "-"}
invalid request: panic while handling request
Process inference_proc:
Traceback (most recent call last):
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\asyncio\windows_events.py", line 462, in finish_socket_func
    return ov.getresult()
           ^^^^^^^^^^^^^^
OSError: [WinError 64] The specified network name is no longer available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\nabdu\OneDrive\Desktop\Bytes\BytesAI\venv\Lib\site-packages\livekit\agents\utils\aio\duplex_unix.py", line 57, in aclose
    await self._writer.wait_closed()
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\asyncio\streams.py", line 364, in wait_closed
    await self._protocol._get_close_waiter(self)
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\asyncio\proactor_events.py", line 286, in _loop_reading
    length = fut.result()
             ^^^^^^^^^^^^
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\asyncio\windows_events.py", line 803, in _poll
    value = callback(transferred, key, ov)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\asyncio\windows_events.py", line 466, in finish_socket_func
    raise ConnectionResetError(*exc.args)
ConnectionResetError: [WinError 64] The specified network name is no longer available

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\nabdu\OneDrive\Desktop\Bytes\BytesAI\venv\Lib\site-packages\livekit\agents\ipc\inference_proc_lazy_main.py", line 63, in proc_main
    client.run()
  File "C:\Users\nabdu\OneDrive\Desktop\Bytes\BytesAI\venv\Lib\site-packages\livekit\agents\ipc\proc_client.py", line 86, in run
    loop.run_until_complete(self._task)
  File "C:\Users\nabdu\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\nabdu\OneDrive\Desktop\Bytes\BytesAI\venv\Lib\site-packages\livekit\agents\ipc\proc_client.py", line 162, in _monitor_task
    await self._acch.aclose()
  File "C:\Users\nabdu\OneDrive\Desktop\Bytes\BytesAI\venv\Lib\site-packages\livekit\agents\utils\aio\duplex_unix.py", line 60, in aclose
    raise DuplexClosed() from e
livekit.agents.utils.aio.duplex_unix.DuplexClosed
```