How to normalize the agent s speech to provide always the sa LiveKit Community #ask-ai

# ask-ai

boundless-painting-47811

07/05/2025, 1:20 PM

How do I normalize the agent's speech so that it always has the same gain?

tall-airline-16756

07/05/2025, 1:20 PM

Thanks for reaching out, Mustafa. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

07/05/2025, 1:20 PM

To normalize an agent's speech and keep its gain consistent in LiveKit, you can use one of the following approaches:

1. Enable loudness normalization: use the loudness_normalization parameter with Speechify TTS. This aligns the audio output to specific standards (e.g., integrated loudness: -14 LUFS, true peak: -2 dBTP, loudness range: 7 LU). Example:

session = AgentSession(
    tts=speechify.TTS(
        model="simba-english",
        voice_id="jack",
        loudness_normalization=True
    )
)

2. Custom volume adjustment processor: implement a custom volume processor to adjust the agent's speech volume programmatically. Example:

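import numpy as np
from livekit import rtc

# Method on your Agent subclass; self.volume is a percentage from 0 to 100.
# It applies a fixed scale factor to every frame.
def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
    # Interpret the raw PCM buffer as signed 16-bit samples.
    audio_data = np.frombuffer(frame.data, dtype=np.int16)
    # Convert to floats in roughly [-1.0, 1.0], apply the clamped volume, convert back.
    audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
    audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
    processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)

    # Wrap the scaled samples in a new frame with the same format.
    return rtc.AudioFrame(
        data=processed.tobytes(),
        sample_rate=frame.sample_rate,
        num_channels=frame.num_channels,
        samples_per_channel=len(processed) // frame.num_channels,
    )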

This can be applied to both the TTS node and the real-time audio output node.

Sources: Speechify TTS integration guide | LiveKit Docs; Agent speech and audio | LiveKit Docs
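Note that scaling by a fixed volume keeps the relative level differences between utterances, so on its own it does not equalize gain. As a rough sketch of what per-frame gain normalization could look like, assuming 16-bit PCM like the frames above: the names normalize_pcm16, rms_dbfs, target_dbfs and max_gain_db are hypothetical helpers for illustration, not part of the LiveKit API, and this simple RMS approach is not the LUFS-based normalization that Speechify applies.

```python
import array
import math

INT16_MAX = 32767


def rms_dbfs(samples: array.array) -> float:
    """Return the RMS level of int16 samples in dBFS (-inf for silence)."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / INT16_MAX)


def normalize_pcm16(
    data: bytes, target_dbfs: float = -20.0, max_gain_db: float = 20.0
) -> bytes:
    """Scale 16-bit PCM so its RMS level approaches target_dbfs.

    Hypothetical helper: the gain is capped at max_gain_db so that
    near-silent frames are not amplified into audible noise.
    """
    samples = array.array("h")  # signed 16-bit, native byte order
    samples.frombytes(data)
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return data  # silence: nothing to scale
    gain_db = min(target_dbfs - level, max_gain_db)
    gain = 10 ** (gain_db / 20)  # convert dB to a linear factor
    # Clamp to the int16 range so loud peaks clip instead of wrapping around.
    scaled = (max(-32768, min(32767, round(s * gain))) for s in samples)
    return array.array("h", scaled).tobytes()
```

To try it on a frame, one option is to pass bytes(frame.data) to normalize_pcm16 and build the new rtc.AudioFrame from the result, as in the method above. Normalizing each short frame independently can cause audible level pumping, so in practice you would likely want to smooth the gain across frames.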

boundless-painting-47811

07/05/2025, 1:21 PM

Where should I implement the _adjust_volume_in_frame function?

tall-airline-16756

07/05/2025, 1:21 PM

Hey there! 👋 The _adjust_volume_in_frame function is a method you define on your own Agent class. It takes an audio frame and scales its volume according to the self.volume setting (0 to 100). Here's a peek at how it fits into your Agent class:

# Defined inside your Agent subclass; requires numpy (np) and livekit.rtc.
def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
    # Decode int16 PCM, scale by the clamped self.volume percentage, re-encode.
    audio_data = np.frombuffer(frame.data, dtype=np.int16)
    audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
    audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
    processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)

    return rtc.AudioFrame(
        data=processed.tobytes(),
        sample_rate=frame.sample_rate,
        num_channels=frame.num_channels,
        samples_per_channel=len(processed) // frame.num_channels,
    )

This function gets called by the _adjust_volume_in_stream method, which handles audio frames from either the tts_node or the realtime_audio_output_node. It's like a volume control wizard working behind the scenes! 🎧✨

Sources: Agent speech and audio | LiveKit Docs
