# ask-ai
b
how to use the apm, audio processing module and use noise cancellation
t
Thanks for reaching out, Mohammed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
You can enable noise cancellation in LiveKit by using the Audio Processing Module (APM), which provides noise suppression, echo cancellation, automatic gain control, and high-pass filtering. To enable noise cancellation, initialize the APM with noise suppression enabled. Python SDK example:
from livekit.rtc import AudioProcessingModule

apm = AudioProcessingModule(
    echo_cancellation=True,
    noise_suppression=True,  # Enable noise cancellation
    high_pass_filter=True,
    auto_gain_control=True,
)
Audio frames should be processed in 10 ms chunks: pass the main (near-end) input through `process_stream`. For echo cancellation scenarios, also pass the far-end (playback) audio through `process_reverse_stream` and configure the delay between the two streams via `set_stream_delay_ms()`.
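As a rough sketch of that flow, continuing with the `apm` created above (the `mic_frame` and `playback_frame` arguments are placeholders for 10 ms frames from your own capture and playback paths, and the 20 ms delay is just an example value to tune):

```python
from livekit import rtc

# Rough estimate of the capture-to-playback latency; tune this for your device.
apm.set_stream_delay_ms(20)

def process_pair(mic_frame: rtc.AudioFrame, playback_frame: rtc.AudioFrame) -> rtc.AudioFrame:
    # Feed the far-end (speaker) audio first so the echo canceller has a reference.
    apm.process_reverse_stream(playback_frame)
    # Then clean up the near-end (microphone) audio; the frame is modified in place.
    apm.process_stream(mic_frame)
    return mic_frame  # both frames must be exactly 10 ms long
```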
Swift/iOS example:
let captureOptions = AudioCaptureOptions(
    echoCancellation: true,
    autoGainControl: true,
    noiseSuppression: true,  // Enable noise cancellation
    highpassFilter: true
)
This enables noise suppression at the audio capture or processing level. LiveKit also supports advanced AI-powered noise cancellation, but APM's built-in suppression is a strong baseline and, unlike the AI models, requires no external integration. Note that audio frames must be exactly 10 ms in duration and are modified in place. For further customization or other programming languages, refer to that SDK's noise suppression options, or let me know which environment you're working with.
Sources: livekit-rtc/livekit/rtc/_proto/audio_frame_pb2.pyi | livekit-ffi/src/livekit.proto.rs | Sources/LiveKit/Protocols/AudioCustomProcessingDelegate.swift | AI noise cancellation | LiveKit Docs
b
from livekit.rtc import AudioProcessingModule

apm = AudioProcessingModule(
    echo_cancellation=True,
    noise_suppression=True,  # Enable noise cancellation
    high_pass_filter=True,
    auto_gain_control=True,
)

In LiveKit Agents, after I initialize the apm variable here, where do I use it?
t
Hey there! Let me walk you through how to work with the `AudioProcessingModule` in LiveKit Agents. Think of it as giving your audio a quick clean-up before it goes out into the world. 😊 Here's how it works:
1. Set up your `apm` when your agent starts.
2. When audio comes in (in 10 ms chunks), wrap each chunk in an `AudioFrame` and run it through `apm.process_stream(audio_frame)`.
3. To tackle echo as well, call `apm.process_reverse_stream(render_frame)` on your output (playback) audio, and don't forget to set the stream delay with `apm.set_stream_delay_ms()`.
4. Then send the processed frame on its way, whether that's to a LiveKit track, a file, or wherever it needs to go (see the sketch below).
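Here's a rough sketch of steps 2 and 4 together, using the `apm` from step 1: building a 10 ms `rtc.AudioFrame` from raw PCM and handing it to a LiveKit audio source. The `pcm_10ms` bytes are a placeholder for whatever your capture pipeline produces; double-check the `AudioFrame`/`AudioSource` calls against your livekit-rtc version.

```python
from livekit import rtc

SAMPLE_RATE = 48000
NUM_CHANNELS = 1
SAMPLES_PER_10MS = SAMPLE_RATE // 100  # 480 samples = 10 ms at 48 kHz

source = rtc.AudioSource(SAMPLE_RATE, NUM_CHANNELS)
track = rtc.LocalAudioTrack.create_audio_track("processed-mic", source)

async def publish_processed(pcm_10ms: bytes) -> None:
    # pcm_10ms: exactly 10 ms of int16 mono PCM (placeholder for your capture source)
    frame = rtc.AudioFrame(
        data=pcm_10ms,
        sample_rate=SAMPLE_RATE,
        num_channels=NUM_CHANNELS,
        samples_per_channel=SAMPLES_PER_10MS,
    )
    apm.process_stream(frame)          # noise suppression etc., applied in place
    await source.capture_frame(frame)  # step 4: hand it to the LiveKit track
```

You'd still publish `track` to the room once (via the local participant) before starting the loop, as usual.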
Think of `apm` as a helpful audio stylist, making sure every 10 ms frame sounds its best before it goes out. Pro tip: stick to 10 ms frames and everything works smoothly!
Sources: livekit-rtc/livekit/rtc/_proto/ffi_pb2.pyi | livekit-agents/livekit/agents/voice/chat_cli.py | stream_audio.py
b
You mentioned "when audio comes in (those tiny 10ms chunks)". Will they always be exactly 10 ms, or can they be more or less?
t
Hey there! Let me break down something important about audio frames. Incoming audio frames aren't always exactly 10 ms long, and that's totally okay. The catch is that the `AudioProcessingModule` (`apm`) needs frames that are exactly 10 ms. LiveKit has a handy helper for this: the `AudioByteStream` class, which acts as a buffer that reshapes your audio into fixed 10 ms chunks. For 48 kHz mono audio, that means collecting 480 samples per frame before sending it to `apm.process_stream()`. So don't stress if your audio arrives in odd-sized pieces: feed it to `AudioByteStream.push()` as it comes in, consume the fixed-size 10 ms frames it hands back, and drain any leftover samples at the end with `AudioByteStream.flush()` (see the sketch below).
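Here's a rough sketch of that buffering. The import path for `AudioByteStream` is the one used in the agents package and may differ in your version, and `handle_processed_frame` is just a placeholder for whatever you do next:

```python
from livekit.agents.utils.audio import AudioByteStream  # path assumed; check your version

SAMPLE_RATE = 48000
NUM_CHANNELS = 1
SAMPLES_PER_10MS = SAMPLE_RATE // 100  # 480 samples = 10 ms at 48 kHz

# Re-chunks arbitrarily sized audio into exact 10 ms frames.
bstream = AudioByteStream(
    sample_rate=SAMPLE_RATE,
    num_channels=NUM_CHANNELS,
    samples_per_channel=SAMPLES_PER_10MS,
)

def on_incoming_audio(data: bytes) -> None:
    # data can be any length; push() only hands back complete 10 ms frames.
    for frame in bstream.push(data):
        apm.process_stream(frame)
        handle_processed_frame(frame)  # placeholder: send onward, write to file, etc.

def on_stream_end() -> None:
    # flush() drains whatever is still buffered; the final chunk may be shorter than 10 ms.
    for frame in bstream.flush():
        handle_processed_frame(frame)
```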
The bottom line: your audio can arrive in chunks of any size, but for noise cancellation and the other effects to work, it needs to be re-chunked into precise 10 ms frames before it reaches the APM. Consider it audio tailoring, making sure everything fits just right!
Sources: livekit-rtc/tests/test_apm.py | livekit-agents/livekit/agents/voice/avatar/_datastream_io.py | Receiving and publishing tracks | LiveKit Docs | stream_audio.py
b
How can I use all this in LiveKit Agents? Do I override the stt_node, or do I set session.input.audio to something else? How do I go about it?
t
Hey there! Here's a handy way to add audio preprocessing in LiveKit Agents: think of it as giving your audio a quick clean-up before it hits the speech-to-text system. 😊 The pattern is to override the agent's `stt_node` and wrap the incoming frames:
from collections.abc import AsyncIterable

from livekit import rtc
from livekit.agents import Agent
from livekit.rtc import AudioProcessingModule

apm = AudioProcessingModule(
    echo_cancellation=True,
    noise_suppression=True,
    high_pass_filter=True,
    auto_gain_control=True,
)

class PreprocessedAgent(Agent):
    async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings=None):
        async def processed_audio():
            async for frame in audio:
                # each frame must be exactly 10 ms; process_stream modifies it in place
                apm.process_stream(frame)
                yield frame
        # hand the cleaned-up stream to the default STT pipeline
        return await Agent.default.stt_node(self, processed_audio(), model_settings)
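To answer the "where do I use it" part concretely, here's a rough sketch of wiring that agent into a session, following the usual agents entrypoint shape. The instructions string and the commented-out provider slots are placeholders; fill in whatever STT/LLM/TTS you already use:

```python
from livekit.agents import AgentSession, JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    session = AgentSession(
        # stt=..., llm=..., tts=...  (your usual providers go here)
    )

    await session.start(
        agent=PreprocessedAgent(instructions="You are a helpful voice assistant."),
        room=ctx.room,
    )

    await ctx.connect()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```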
The nice part is that you don't need to touch `session.input.audio` directly. Instead, put all your preprocessing inside the overridden `stt_node`; this approach works smoothly and keeps your agent running within the LiveKit Agents pipeline. Quick pro tip: if you're using LiveKit Cloud and want easy noise cancellation, you can also pass noise cancellation options via `room_input_options` (sketch below).
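As a sketch of that Cloud option (assuming the separate noise-cancellation plugin is installed and available on your plan; it's the same `session.start` call as in the sketch above, just with the extra option):

```python
from livekit.agents import AgentSession, JobContext, RoomInputOptions
from livekit.plugins import noise_cancellation  # pip install livekit-plugins-noise-cancellation

async def entrypoint(ctx: JobContext):
    session = AgentSession(
        # stt=..., llm=..., tts=...  (your usual providers go here)
    )
    await session.start(
        agent=PreprocessedAgent(instructions="You are a helpful voice assistant."),
        room=ctx.room,
        # LiveKit Cloud applies enhanced noise cancellation to the agent's incoming audio.
        room_input_options=RoomInputOptions(noise_cancellation=noise_cancellation.BVC()),
    )
```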
But for custom, hands-on preprocessing, overriding the node is your best friend.
Sources: basics/echo_transcriber_agent.py | livekit-rtc/livekit/rtc/apm.py | Audio and video | LiveKit Docs | Noise & echo cancellation | LiveKit Docs | Voice pipeline nodes | LiveKit Docs