big-controller-60015
07/13/2025, 11:00 PM

tall-airline-16756
07/13/2025, 11:01 PM
To preprocess audio before it reaches VAD or STT, you can override the stt_node method in your Agent class and apply your processing logic there. Here's a streamlined example:
from typing import AsyncIterable, Optional
from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt

class PreprocessAgent(Agent):
    async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings) -> Optional[AsyncIterable[stt.SpeechEvent]]:
        async def filtered_audio():
            async for frame in audio:
                processed_frame = frame  # apply your custom preprocessing here
                yield processed_frame
        return await super().stt_node(filtered_audio(), model_settings)
Within filtered_audio, you can use audio processing libraries like numpy or librosa to implement filtering, noise reduction, or other transformations. Just ensure the output maintains the sample rate and buffer configuration required by the VAD component (typically 16 kHz for Silero VAD).
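To make that concrete, here's a minimal sketch of per-frame processing with numpy, assuming 16-bit PCM frames; apply_gain is a hypothetical helper (a simple gain reduction standing in for real filtering) that you would call from filtered_audio:

import numpy as np
from livekit import rtc

def apply_gain(frame: rtc.AudioFrame, gain: float = 0.8) -> rtc.AudioFrame:
    # View the frame's 16-bit PCM buffer as a numpy array
    samples = np.frombuffer(frame.data, dtype=np.int16)
    # Scale in float to avoid integer overflow, then clip back into int16 range
    processed = np.clip(samples.astype(np.float32) * gain, -32768, 32767).astype(np.int16)
    # Rebuild an AudioFrame, preserving the original rate and channel layout
    return rtc.AudioFrame(
        data=processed.tobytes(),
        sample_rate=frame.sample_rate,
        num_channels=frame.num_channels,
        samples_per_channel=frame.samples_per_channel,
    )

Inside filtered_audio, you would then yield apply_gain(frame) instead of the raw frame.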
If you want to integrate built-in preprocessing options (like noise cancellation), you can use RoomInputOptions with the relevant configuration.
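As a sketch of what that might look like, assuming the livekit-plugins-noise-cancellation package is installed (BVC is one of the models it ships with):

from livekit import agents
from livekit.agents import AgentSession, RoomInputOptions
from livekit.plugins import noise_cancellation

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession()  # configure your STT/LLM/TTS here as usual
    await session.start(
        room=ctx.room,
        agent=PreprocessAgent(),  # the subclass sketched above
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )
    await ctx.connect()

This applies noise cancellation to the room's incoming audio before it reaches the pipeline, so it composes with the stt_node override above.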
This method ensures that your preprocessing occurs on all audio frames before they're handed off to the VAD or STT pipeline.
Sources: livekit-agents/livekit/agents/voice/audio_recognition.py | Voice pipeline nodes | LiveKit Docs | realtime/openai-realtime-pitch-shift.py | agent.py

big-controller-60015
07/14/2025, 8:23 AM

tall-airline-16756
07/14/2025, 8:24 AM
Yes, the interception point is stt_node. Here's the cool part: when you step in at this point, you're catching the audio stream right at the beginning, before it even reaches VAD or STT. This means your preprocessing will shape the audio before it hits those components. Both the docs and source code back this up, confirming that processing in stt_node happens "before it reaches both VAD and STT," so you can apply your custom logic exactly where you want. Pretty neat, right?
Sources: basics/echo_transcriber_agent.py | livekit-agents/livekit/agents/stt/stream_adapter.py | Voice pipeline nodes | LiveKit Docs