big-controller-60015
07/14/2025, 10:49 AM

tall-airline-16756
07/14/2025, 10:50 AM

from livekit.rtc import AudioProcessingModule

apm = AudioProcessingModule(
    echo_cancellation=True,
    noise_suppression=True,  # Enable noise cancellation
    high_pass_filter=True,
    auto_gain_control=True,
)
Audio frames should be processed in 10 ms chunks using process_stream for the main input. For echo cancellation scenarios, use process_reverse_stream for the far-end audio and configure the stream delay via set_stream_delay_ms().
Swift/iOS Example:
let captureOptions = AudioCaptureOptions(
    echoCancellation: true,
    autoGainControl: true,
    noiseSuppression: true, // Enable noise cancellation
    highpassFilter: true
)
This enables noise suppression at the audio capture or processing level. LiveKit also supports advanced AI-powered noise cancellation, but APM's built-in option is a strong baseline and—unlike AI models—requires no external integration.
Audio frames must be exactly 10 ms in duration and will be modified in-place. For further customization or implementation in other programming languages, refer to the SDK-specific initialization for noise suppression or let me know which environment you're working with.
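As a quick sanity check on what a "10 ms frame" means in samples and bytes, here is the arithmetic on its own (pure Python, no SDK needed; 16-bit PCM is assumed):

```python
# Frame sizing for 10 ms chunks of 16-bit PCM audio. Standalone arithmetic;
# the actual processing calls are the livekit.rtc APM methods shown above.

def samples_per_10ms(sample_rate_hz: int) -> int:
    # 10 ms is 1/100 of a second.
    return sample_rate_hz // 100

def bytes_per_10ms(sample_rate_hz: int, num_channels: int = 1) -> int:
    # 2 bytes per sample for 16-bit PCM.
    return samples_per_10ms(sample_rate_hz) * num_channels * 2

print(samples_per_10ms(48000))  # 480 samples per frame at 48 kHz
print(bytes_per_10ms(48000))    # 960 bytes per frame, mono
```

So at 48 kHz mono, every frame you hand to the APM should carry exactly 480 samples (960 bytes).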
Cited sources:
Sources: livekit-rtc/livekit/rtc/_proto/audio_frame_pb2.pyi | livekit-ffi/src/livekit.proto.rs | Sources/LiveKit/Protocols/AudioCustomProcessingDelegate.swift | AI noise cancellation | LiveKit Docs

big-controller-60015
07/14/2025, 10:53 AM

tall-airline-16756
07/14/2025, 10:54 AM

Let me walk you through AudioProcessingModule in a super friendly way. Think of it like giving your audio a little makeover before it goes out into the world. 😊
Here's how it works in LiveKit Agents:
1. First, set up your apm when your agent gets ready to roll.
2. When audio comes in (those tiny 10 ms chunks), create an AudioFrame and give it a quick pass through apm.process_stream(audio_frame).
3. If you want to tackle echo too, call apm.process_reverse_stream(render_frame) on your output audio, and don't forget to set the stream delay with apm.set_stream_delay_ms().
4. Then just send that polished audio frame on its way, whether that's to a LiveKit track, a file, or wherever it needs to go!
Just think of apm as a helpful audio stylist, making sure every 10 ms frame looks (or sounds!) its best before it goes out. Pro tip: stick to those 10 ms frames for everything to work smoothly!
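Here's the whole four-step loop as a tiny runnable sketch. The Frame class and StandInAPM below are illustrative stand-ins so you can see the shape of the pipeline without the SDK installed; in real code the calls are apm.process_stream / apm.process_reverse_stream / apm.set_stream_delay_ms on rtc.AudioFrame objects:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class Frame:
    samples: list  # stand-in for rtc.AudioFrame: 10 ms of samples

class StandInAPM:
    """Stand-in with the same method names as AudioProcessingModule."""

    def __init__(self) -> None:
        self.delay_ms = 0

    def process_stream(self, frame: Frame) -> None:
        # The real apm.process_stream() rewrites the frame's samples in place.
        pass

    def process_reverse_stream(self, frame: Frame) -> None:
        # Far-end (playback) audio, used for echo cancellation.
        pass

    def set_stream_delay_ms(self, delay_ms: int) -> None:
        self.delay_ms = delay_ms

def pipeline(apm: StandInAPM, frames: Iterable[Frame]) -> Iterator[Frame]:
    for frame in frames:           # step 2: each incoming 10 ms frame
        apm.process_stream(frame)  # step 2: clean it up
        yield frame                # step 4: send it on its way

apm = StandInAPM()                 # step 1: set up the apm
apm.set_stream_delay_ms(80)        # step 3: the 80 ms value is illustrative
out = list(pipeline(apm, [Frame([0] * 480) for _ in range(5)]))
print(len(out))  # 5 frames out, same as in
```

Same number of frames in and out, each one polished in place before it moves on: that's the whole trick.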
Cited sources:
Sources: livekit-rtc/livekit/rtc/_proto/ffi_pb2.pyi | livekit-agents/livekit/agents/voice/chat_cli.py | stream_audio.py

big-controller-60015
07/14/2025, 10:55 AM

tall-airline-16756
07/14/2025, 10:56 AM

The AudioProcessingModule (apm) needs those frames to be exactly 10 ms. No worries, though—I've got a solution for you!
LiveKit's got your back with the super handy AudioByteStream helper class. Think of it like a friendly buffer that takes your audio and reshapes it into perfect 10 ms chunks. For 48 kHz mono audio, that means collecting 480 samples per frame before sending it to apm.process_stream(). Pretty neat, right?
Don't stress if your audio isn't coming in nice and neat. Just use AudioByteStream.push() with your incoming data, and it'll hand you back those fixed-size 10 ms frames. Got some leftover bits? No problem—AudioByteStream.flush() will take care of those too!
The bottom line: Your audio can arrive in all sorts of sizes, but for noise cancellation and other cool effects to work their magic, it needs to be chunked into those precise 10ms frames. Consider it audio tailoring—making sure everything fits just right!
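If you want to see the rebuffering idea in plain Python, here's a tiny stand-in for what AudioByteStream does. This is illustrative, not LiveKit's actual implementation (for example, whether the real flush() zero-pads the leftover is an assumption here; check the SDK):

```python
class TenMsChunker:
    """Simplified stand-in for AudioByteStream: rebuffers arbitrary-sized
    byte input into fixed 10 ms chunks of 16-bit PCM."""

    def __init__(self, sample_rate: int = 48000, num_channels: int = 1):
        # 10 ms of 16-bit PCM: (sample_rate / 100) samples * channels * 2 bytes.
        self.chunk_size = (sample_rate // 100) * num_channels * 2
        self._buf = bytearray()

    def push(self, data: bytes) -> list:
        # Like AudioByteStream.push(): buffer input, return every full chunk.
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_size:
            chunks.append(bytes(self._buf[: self.chunk_size]))
            del self._buf[: self.chunk_size]
        return chunks

    def flush(self) -> list:
        # Like AudioByteStream.flush(): emit the leftover as one final chunk
        # (zero-padded here; an assumption about the real flush behavior).
        if not self._buf:
            return []
        padded = bytes(self._buf) + b"\x00" * (self.chunk_size - len(self._buf))
        self._buf.clear()
        return [padded]

chunker = TenMsChunker()                 # 48 kHz mono -> 960-byte chunks
chunks = chunker.push(b"\x00" * 1500)    # odd-sized input, no problem
chunks += chunker.flush()                # leftover 540 bytes become a chunk
print([len(c) for c in chunks])  # [960, 960]
```

Every chunk that comes out is exactly one 10 ms frame, ready for apm.process_stream().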
Cited sources:
Sources: livekit-rtc/tests/test_apm.py | livekit-agents/livekit/agents/voice/avatar/_datastream_io.py | Receiving and publishing tracks | LiveKit Docs | stream_audio.py

big-controller-60015
07/14/2025, 11:00 AM

tall-airline-16756
07/14/2025, 11:01 AM

from livekit.rtc import AudioProcessingModule

apm = AudioProcessingModule(
    echo_cancellation=True,
    noise_suppression=True,
    high_pass_filter=True,
    auto_gain_control=True,
)
from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent

async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings=None):
    async def processed_audio():
        async for frame in audio:
            apm.process_stream(frame)  # modifies the frame in place
            yield frame
    return await Agent.default.stt_node(self, processed_audio(), model_settings)
The cool thing is, you don't need to mess with session.input.audio directly. Instead, just put all your preprocessing magic inside the overridden stt_node. Trust me, this approach works smoothly and keeps your agent running like a charm within the LiveKit Agents platform.
Oh, and quick pro tip! If you're using LiveKit Cloud and want some easy noise cancellation, you can also pass noise cancellation options with room_input_options. But for those custom, hands-on preprocessing needs, the node override method is your best friend.
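For the LiveKit Cloud route, the setup looks roughly like this. Treat it as a sketch: the noise_cancellation plugin, the BVC() model, and the exact AgentSession/RoomInputOptions signatures are assumptions based on LiveKit Agents examples, so verify them against your installed versions:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import noise_cancellation  # livekit-plugins-noise-cancellation

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession()  # configure stt/llm/tts as usual for your agent
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="..."),
        room_input_options=RoomInputOptions(
            # BVC is LiveKit Cloud's background voice cancellation model
            # (name is an assumption; check the plugin docs).
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )
```

With that in place, incoming audio is denoised before it ever reaches your pipeline, and no per-frame APM code is needed on your side.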
Cited sources:
Sources: basics/echo_transcriber_agent.py | livekit-rtc/livekit/rtc/apm.py | Audio and video | LiveKit Docs | Noise & echo cancellation | LiveKit Docs | Voice pipeline nodes | LiveKit Docs