few-soccer-7112
08/16/2025, 12:50 AM
Is the audio iterable passed into stt_node(self, audio, model_settings) already post-BVCTelephony and pre-AudioProcessingModule? If not, show the canonical way to access frames at that point (post-BVC/pre-APM) in Python agents.

tall-airline-16756
08/16/2025, 12:51 AM
async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings) -> AsyncIterable[stt.SpeechEvent | str]:
    # This is where you can access audio frames before STT processing
    async def inspected_audio():
        async for frame in audio:
            # Inspect or modify each frame as needed
            yield frame
    # Hand the (possibly modified) stream back to the default STT node
    return await super().stt_node(inspected_audio(), model_settings)
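The pattern above is just one async iterable wrapped in another: an inner async generator consumes the incoming stream, transforms each item, and re-yields it. A minimal, LiveKit-free sketch of that wrapping pattern (the names source and passthrough are illustrative, not LiveKit APIs):

```python
import asyncio
from typing import AsyncIterable

async def source() -> AsyncIterable[int]:
    # Stand-in for the incoming audio frame stream
    for i in range(3):
        yield i

async def passthrough(stream: AsyncIterable[int]) -> AsyncIterable[int]:
    # Per-item processing hook: here we just double each value
    async for item in stream:
        yield item * 2

async def main() -> list[int]:
    return [x async for x in passthrough(source())]

result = asyncio.run(main())  # [0, 2, 4]
```

Because the wrapper is itself an async generator, frames flow through lazily, one at a time, with no buffering of the whole stream.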
If you want to add your own preprocessing, you can simply override the stt_node method in your agent class. Check out this example:
class CustomAgent(Agent):
    async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings) -> AsyncIterable[stt.SpeechEvent]:
        async def processed_audio():
            async for frame in audio:
                # Apply custom processing to each audio frame
                processed_frame = self._process_frame(frame)
                yield processed_frame
        return await super().stt_node(processed_audio(), model_settings)

    def _process_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
        # Put your custom frame processing logic here
        # Return either the original or the modified frame
        return frame
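As a concrete illustration of what _process_frame might do, here is a hedged sketch of a gain-scaling helper operating on raw signed 16-bit PCM bytes (the sample format LiveKit audio frames typically carry). scale_pcm16 is a hypothetical helper, not part of the LiveKit API:

```python
import array

def scale_pcm16(data: bytes, gain: float) -> bytes:
    # Hypothetical helper: interpret the buffer as signed 16-bit samples,
    # scale each by `gain`, and clamp back into the int16 range.
    samples = array.array("h", data)
    for i, s in enumerate(samples):
        samples[i] = max(-32768, min(32767, int(s * gain)))
    return samples.tobytes()
```

Inside _process_frame you would apply this to the frame's byte buffer and rebuild a new rtc.AudioFrame with the same sample rate and channel count; the exact constructor arguments depend on your LiveKit SDK version, so check the rtc.AudioFrame API before relying on this sketch.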
This way, you can add any preprocessing you need while still using LiveKit's built-in STT pipeline. If you have any questions or want to chat more about it, I'm here to help! Sources: Voice pipeline nodes | LiveKit Docs | examples/voice_agents/speedup_output_audio.py