few-soccer-7112
08/16/2025, 12:50 AM
Is the audio iterable passed into stt_node(self, audio, model_settings) already post-BVCTelephony and pre-AudioProcessingModule? If not, show the canonical way to access frames at that point (post-BVC/pre-APM) in Python agents.

tall-airline-16756
08/16/2025, 12:51 AM
async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings) -> AsyncIterable[stt.SpeechEvent | str]:
    # This is where you can access audio frames before STT processing
    async def inspected_audio():
        async for frame in audio:
            # Inspect or modify each frame as needed
            yield frame
    # Hand the (possibly modified) stream back to the default STT node
    return await super().stt_node(inspected_audio(), model_settings)
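The pattern above is just one async iterable wrapped in another: an inner async generator consumes the incoming stream, transforms each item, and re-yields it. A minimal, LiveKit-free sketch of that wrapping pattern (the names source and passthrough are illustrative, not LiveKit APIs):

```python
import asyncio
from typing import AsyncIterable

async def source() -> AsyncIterable[int]:
    # Stand-in for the incoming audio frame stream
    for i in range(3):
        yield i

async def passthrough(stream: AsyncIterable[int]) -> AsyncIterable[int]:
    # Per-item processing hook: here we just double each value
    async for item in stream:
        yield item * 2

async def main() -> list[int]:
    return [x async for x in passthrough(source())]

result = asyncio.run(main())  # [0, 2, 4]
```

Because the wrapper is itself an async generator, frames flow through lazily, one at a time, with no buffering of the whole stream.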
If you want to add your own preprocessing, you can simply override the stt_node method in your agent class. Check out this example:
class CustomAgent(Agent):
    async def stt_node(self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings) -> AsyncIterable[stt.SpeechEvent]:
        async def processed_audio():
            async for frame in audio:
                # Apply custom processing to each audio frame
                processed_frame = self._process_frame(frame)
                yield processed_frame
        return await super().stt_node(processed_audio(), model_settings)

    def _process_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
        # Put your custom frame processing logic here
        # Return either the original or the modified frame
        return frame
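As a concrete illustration of what _process_frame might do, here is a hedged sketch of a gain-scaling helper operating on raw signed 16-bit PCM bytes (the sample format LiveKit audio frames typically carry). scale_pcm16 is a hypothetical helper, not part of the LiveKit API:

```python
import array

def scale_pcm16(data: bytes, gain: float) -> bytes:
    # Hypothetical helper: interpret the buffer as signed 16-bit samples,
    # scale each by `gain`, and clamp back into the int16 range.
    samples = array.array("h", data)
    for i, s in enumerate(samples):
        samples[i] = max(-32768, min(32767, int(s * gain)))
    return samples.tobytes()
```

Inside _process_frame you would apply this to the frame's byte buffer and rebuild a new rtc.AudioFrame with the same sample rate and channel count; the exact constructor arguments depend on your LiveKit SDK version, so check the rtc.AudioFrame API before relying on this sketch.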
This way, you can add any preprocessing you need while still using LiveKit's built-in STT pipeline. If you have any questions or want to chat more about it, I'm here to help! Sources: Voice pipeline nodes | LiveKit Docs | examples/voice_agents/speedup_output_audio.py