few-soccer-7112 — 06/26/2025, 2:10 AM

tall-airline-16756 — 06/26/2025, 2:10 AM
You can override the stt_node in your agent. Here's an example of how you can implement custom audio preprocessing:
```python
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def filtered_audio():
        async for frame in audio:
            # insert custom audio preprocessing here
            # (this is where you would boost the audio)
            yield frame

    async for event in Agent.default.stt_node(self, filtered_audio(), model_settings):
        # insert custom text postprocessing here
        yield event
```
In the filtered_audio() function, you can modify each audio frame to boost its volume before yielding it to the STT node. This allows you to process the audio frames while maintaining the streaming nature of the pipeline. Let me know if you need further assistance!
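As a concrete sketch of what that preprocessing step could look like, here is one way to boost a frame's samples with NumPy. Note that boost_frame_data and the gain value are illustrative names, not part of the LiveKit API:

```python
import numpy as np

def boost_frame_data(data: bytes, gain: float) -> bytes:
    """Scale 16-bit PCM samples by `gain`, clipping to the int16 range."""
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float32)
    boosted = np.clip(samples * gain, np.iinfo(np.int16).min, np.iinfo(np.int16).max)
    return boosted.astype(np.int16).tobytes()
```

Inside filtered_audio() you would apply this to frame.data and rebuild the frame before yielding it. The clipping step matters: a boosted int16 sample would otherwise wrap around and produce loud distortion.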
Sources: Voice pipeline nodes | LiveKit Docs

few-soccer-7112 — 06/26/2025, 2:12 AM

tall-airline-16756 — 06/26/2025, 2:13 AM
```python
import numpy as np

from livekit import rtc
from livekit.agents import Agent, utils
from typing import AsyncIterable

class Assistant(Agent):
    def __init__(self) -> None:
        self.volume: int = 50  # Set your desired volume level (0-100%)
        super().__init__(
            instructions=f"You are a helpful voice AI assistant. Your starting volume level is {self.volume}."
        )

    async def _adjust_volume_in_stream(
        self, audio: AsyncIterable[rtc.AudioFrame]
    ) -> AsyncIterable[rtc.AudioFrame]:
        stream: utils.audio.AudioByteStream | None = None
        async for frame in audio:
            if stream is None:
                stream = utils.audio.AudioByteStream(
                    sample_rate=frame.sample_rate,
                    num_channels=frame.num_channels,
                    samples_per_channel=frame.sample_rate // 10,  # 100ms chunks
                )
            for f in stream.push(frame.data):
                yield self._adjust_volume_in_frame(f)
        if stream is not None:
            for f in stream.flush():
                yield self._adjust_volume_in_frame(f)

    def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
        # Interpret the raw bytes as 16-bit PCM, scale in float, convert back
        audio_data = np.frombuffer(frame.data, dtype=np.int16)
        audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
        audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
        processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)
        return rtc.AudioFrame(
            data=processed.tobytes(),
            sample_rate=frame.sample_rate,
            num_channels=frame.num_channels,
            samples_per_channel=len(processed) // frame.num_channels,
        )
```
Hey there! Let me walk you through this audio processing code. It's actually pretty neat:

1. Collects audio frames in a buffered stream
2. Converts the audio data to a format we can easily work with
3. Applies the volume scaling (like turning the volume knob on your favorite song)
4. Converts the audio back to its original format
5. Returns the modified audio frame

The best part? You can totally customize the volume by tweaking the self.volume parameter. Want things a bit louder or more subtle? Just adjust that number between 0 and 100%. Have any questions about how this works? I'm always happy to chat and help you understand the details! Feel free to ask, and I'll break it down in a way that makes sense to you.

Sources: Agent speech and audio | LiveKit Docs
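For intuition on the buffering step, here is a minimal stand-in for the push/flush pattern that the stream helper follows. ByteChunker is a hypothetical class written just for this sketch, not a LiveKit type: bytes accumulate until a full fixed-size chunk is available, and flush() drains whatever remains at the end:

```python
class ByteChunker:
    """Toy illustration of push/flush re-chunking: buffer incoming
    bytes and emit fixed-size chunks (hypothetical, not a LiveKit class)."""

    def __init__(self, chunk_size: int) -> None:
        self._buf = bytearray()
        self._chunk_size = chunk_size

    def push(self, data: bytes) -> list[bytes]:
        # Emit as many complete chunks as the buffer now contains
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self._chunk_size:
            chunks.append(bytes(self._buf[: self._chunk_size]))
            del self._buf[: self._chunk_size]
        return chunks

    def flush(self) -> list[bytes]:
        # Emit the remainder, even if shorter than a full chunk
        remaining = [bytes(self._buf)] if self._buf else []
        self._buf.clear()
        return remaining
```

In the agent code, samples_per_channel=frame.sample_rate // 10 is what makes each emitted chunk 100ms long, so the volume adjustment always operates on uniform blocks regardless of how the incoming frames are sized.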