few-soccer-7112 — 06/26/2025, 2:10 AM

tall-airline-16756 — 06/26/2025, 2:10 AM
You can override the stt_node in your agent. Here's an example of how you can implement custom audio preprocessing:
```python
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def filtered_audio():
        async for frame in audio:
            # insert custom audio preprocessing here
            # (this is where you would boost the audio)
            yield frame

    async for event in Agent.default.stt_node(self, filtered_audio(), model_settings):
        # insert custom text postprocessing here
        yield event
```
In the filtered_audio() function, you can modify each audio frame to boost its volume before yielding it to the STT node. This allows you to process the audio frames while maintaining the streaming nature of the pipeline. Let me know if you need further assistance!
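As a concrete sketch of what that preprocessing step could look like, here is one way to boost a frame's samples with NumPy. Note that boost_frame_data and the gain value are illustrative names, not part of the LiveKit API:

```python
import numpy as np

def boost_frame_data(data: bytes, gain: float) -> bytes:
    """Scale 16-bit PCM samples by `gain`, clipping to the int16 range."""
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float32)
    boosted = np.clip(samples * gain, np.iinfo(np.int16).min, np.iinfo(np.int16).max)
    return boosted.astype(np.int16).tobytes()
```

Inside filtered_audio() you would apply this to frame.data and rebuild the frame before yielding it. The clipping step matters: a boosted int16 sample would otherwise wrap around and produce loud distortion.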
Sources: Voice pipeline nodes | LiveKit Docs

few-soccer-7112 — 06/26/2025, 2:12 AM

tall-airline-16756 — 06/26/2025, 2:13 AM
```python
import numpy as np

from livekit import rtc
from livekit.agents import Agent, utils
from typing import AsyncIterable

class Assistant(Agent):
    def __init__(self) -> None:
        self.volume: int = 50  # Set your desired volume level (0-100%)
        super().__init__(
            instructions=f"You are a helpful voice AI assistant. Your starting volume level is {self.volume}."
        )

    async def _adjust_volume_in_stream(
        self, audio: AsyncIterable[rtc.AudioFrame]
    ) -> AsyncIterable[rtc.AudioFrame]:
        stream: utils.audio.AudioByteStream | None = None
        async for frame in audio:
            if stream is None:
                stream = utils.audio.AudioByteStream(
                    sample_rate=frame.sample_rate,
                    num_channels=frame.num_channels,
                    samples_per_channel=frame.sample_rate // 10,  # 100ms chunks
                )
            for f in stream.push(frame.data):
                yield self._adjust_volume_in_frame(f)
        if stream is not None:
            for f in stream.flush():
                yield self._adjust_volume_in_frame(f)

    def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
        # Interpret the raw bytes as 16-bit PCM, scale in float, convert back
        audio_data = np.frombuffer(frame.data, dtype=np.int16)
        audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
        audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
        processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)
        return rtc.AudioFrame(
            data=processed.tobytes(),
            sample_rate=frame.sample_rate,
            num_channels=frame.num_channels,
            samples_per_channel=len(processed) // frame.num_channels,
        )
```
Hey there! Let me walk you through this audio processing code. It's actually pretty neat:

1. Collects audio frames in a buffered stream
2. Converts the audio data to a format we can easily work with
3. Applies the volume scaling (like turning the volume knob on your favorite song)
4. Converts the audio back to its original format
5. Returns the modified audio frame

The best part? You can totally customize the volume by tweaking the self.volume parameter. Want things a bit louder or more subtle? Just adjust that number between 0 and 100%. Have any questions about how this works? I'm always happy to chat and help you understand the details! Feel free to ask, and I'll break it down in a way that makes sense to you.

Sources: Agent speech and audio | LiveKit Docs
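For intuition on the buffering step, here is a minimal stand-in for the push/flush pattern that the stream helper follows. ByteChunker is a hypothetical class written just for this sketch, not a LiveKit type: bytes accumulate until a full fixed-size chunk is available, and flush() drains whatever remains at the end:

```python
class ByteChunker:
    """Toy illustration of push/flush re-chunking: buffer incoming
    bytes and emit fixed-size chunks (hypothetical, not a LiveKit class)."""

    def __init__(self, chunk_size: int) -> None:
        self._buf = bytearray()
        self._chunk_size = chunk_size

    def push(self, data: bytes) -> list[bytes]:
        # Emit as many complete chunks as the buffer now contains
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self._chunk_size:
            chunks.append(bytes(self._buf[: self._chunk_size]))
            del self._buf[: self._chunk_size]
        return chunks

    def flush(self) -> list[bytes]:
        # Emit the remainder, even if shorter than a full chunk
        remaining = [bytes(self._buf)] if self._buf else []
        self._buf.clear()
        return remaining
```

In the agent code, samples_per_channel=frame.sample_rate // 10 is what makes each emitted chunk 100ms long, so the volume adjustment always operates on uniform blocks regardless of how the incoming frames are sized.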