Audio from older phones is too quiet coming in fro...
# ask-ai
f
Audio from older phones coming in from Twilio is too quiet. Is there a way to boost audio frames before sending them to STT? I'm hosting a VoicePipeline agent.
t
Thanks for reaching out, Muthana. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To handle quiet audio from older phones in your VoicePipeline agent, you can boost the audio frames before they reach the speech-to-text (STT) service by overriding the stt_node method in your agent. Here's an example of how you can implement custom audio preprocessing:
from typing import AsyncIterable, Optional

from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt


class MyAgent(Agent):  # placeholder name for your own Agent subclass
    async def stt_node(
        self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
    ) -> Optional[AsyncIterable[stt.SpeechEvent]]:
        async def filtered_audio():
            async for frame in audio:
                # insert custom audio preprocessing here,
                # e.g. boost the volume of each frame
                yield frame

        # Hand the preprocessed frames to the default STT node
        async for event in Agent.default.stt_node(self, filtered_audio(), model_settings):
            # insert custom text postprocessing here
            yield event
In the filtered_audio() function, you can modify each audio frame to boost its volume before yielding it to the STT node. Because frames are processed one at a time as they arrive, the pipeline stays fully streaming. Let me know if you need further assistance! Sources: Voice pipeline nodes | LiveKit Docs
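As a rough illustration of the per-frame step, the sketch below boosts raw PCM bytes by a fixed gain using only the Python standard library. The function name boost_pcm16 and its parameters are hypothetical and not part of the LiveKit API, and the sketch assumes signed 16-bit samples in native byte order. Samples are clamped to the 16-bit range so that loud peaks saturate instead of wrapping around.

```python
from array import array


def boost_pcm16(data: bytes, gain: float) -> bytes:
    """Scale signed 16-bit PCM samples by `gain`, clamping to the int16 range.

    Hypothetical helper for illustration; assumes native-endian int16 samples.
    """
    samples = array("h")
    samples.frombytes(data)
    boosted = array(
        "h",
        # Clamp after scaling so values outside int16 saturate instead of overflowing
        (max(-32768, min(32767, round(s * gain))) for s in samples),
    )
    return boosted.tobytes()


if __name__ == "__main__":
    quiet = array("h", [1000, -1000, 20000, -20000]).tobytes()
    loud = array("h")
    loud.frombytes(boost_pcm16(quiet, 2.0))
    print(list(loud))  # [2000, -2000, 32767, -32768]
```

A function like this could be called on the bytes of each frame inside filtered_audio(), with the result wrapped in a new frame that keeps the original sample rate and channel count.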
f
Show me an example of boosting audio by a set amount
t
Here is an example of how you can boost audio frames by a set amount in your VoicePipeline agent:
from typing import AsyncIterable, Optional

import numpy as np
from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt, utils


class Assistant(Agent):
    def __init__(self) -> None:
        # Gain as a percentage of the original level: 100 = unchanged, 200 = twice the amplitude
        self.volume: int = 200
        super().__init__(instructions="You are a helpful voice AI assistant.")

    async def stt_node(
        self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
    ) -> Optional[AsyncIterable[stt.SpeechEvent]]:
        # Boost the incoming audio before it reaches the default STT node
        boosted = self._adjust_volume_in_stream(audio)
        async for event in Agent.default.stt_node(self, boosted, model_settings):
            yield event

    async def _adjust_volume_in_stream(
        self, audio: AsyncIterable[rtc.AudioFrame]
    ) -> AsyncIterable[rtc.AudioFrame]:
        # Re-chunk the incoming audio into 100 ms frames and boost each one
        stream: utils.audio.AudioByteStream | None = None
        async for frame in audio:
            if stream is None:
                stream = utils.audio.AudioByteStream(
                    sample_rate=frame.sample_rate,
                    num_channels=frame.num_channels,
                    samples_per_channel=frame.sample_rate // 10,  # 100ms
                )
            for f in stream.push(frame.data):
                yield self._adjust_volume_in_frame(f)

        if stream is not None:
            for f in stream.flush():
                yield self._adjust_volume_in_frame(f)

    def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
        # Scale in float, then clip so loud peaks saturate instead of wrapping around
        samples = np.frombuffer(frame.data, dtype=np.int16).astype(np.float32)
        samples *= max(0, self.volume) / 100.0
        info = np.iinfo(np.int16)
        processed = np.clip(samples, info.min, info.max).astype(np.int16)

        return rtc.AudioFrame(
            data=processed.tobytes(),
            sample_rate=frame.sample_rate,
            num_channels=frame.num_channels,
            samples_per_channel=len(processed) // frame.num_channels,
        )
Here is what the code does:
1. Overrides stt_node so that every incoming frame passes through _adjust_volume_in_stream before it reaches STT.
2. Re-chunks the incoming audio into 100 ms frames.
3. Converts each frame's 16-bit samples to floating point.
4. Multiplies the samples by the gain (self.volume / 100).
5. Clips the result to the 16-bit range and converts it back to 16-bit samples, so loud peaks saturate instead of wrapping around.
6. Returns a new audio frame containing the modified samples.
You can tune the boost with the self.volume parameter, which is a percentage of the original level: 100 leaves the audio unchanged, 200 doubles the amplitude (about +6 dB), and values below 100 make it quieter. Very high values will clip loud passages and distort the audio, so raise it gradually. Let me know if you have any questions about how this works. Sources: Agent speech and audio | LiveKit Docs
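If you would rather think of the boost in decibels, or want to check how much headroom a recording has before it starts to clip, the following standard-library sketch may help. Both helper names are hypothetical and not part of LiveKit; the code assumes signed 16-bit PCM in native byte order and uses the usual amplitude relation gain = 10^(dB / 20).

```python
import math
from array import array


def db_to_gain(db: float) -> float:
    """Convert a boost in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def max_gain_without_clipping(data: bytes) -> float:
    """Return the largest linear gain that keeps every sample within int16.

    Hypothetical helper; returns math.inf for empty or silent input.
    """
    samples = array("h")
    samples.frombytes(data)
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return math.inf
    return 32767 / peak


if __name__ == "__main__":
    pcm = array("h", [1000, -2000, 500]).tobytes()
    print(round(db_to_gain(6.0), 3))
    print(round(max_gain_without_clipping(pcm), 3))
```

Measuring the headroom on a few sample recordings from the quiet phones is one way to choose a fixed boost that is loud enough without clipping normal speech.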