How to normalize the agent’s speech to provide alw...
# ask-ai
b
How can I normalize the agent's speech so that it always has the same gain?
t
Thanks for reaching out, Mustafa. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To normalize an agent's speech and keep its gain consistent in LiveKit, you can use one of the following approaches:
1. Enable Loudness Normalization: Use the loudness_normalization parameter with Speechify TTS. This aligns the audio output to specific loudness targets (e.g., integrated loudness: -14 LUFS, true peak: -2 dBTP, loudness range: 7 LU). Example:
session = AgentSession(
    tts=speechify.TTS(
        model="simba-english",
        voice_id="jack",
        loudness_normalization=True
    )
)
2. Custom Volume Adjustment Processor: Implement a custom volume processor that scales each audio frame by a volume setting (self.volume, from 0 to 100) that you control programmatically. Example:
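import numpy as np
from livekit import rtc

# Method on your Agent subclass; self.volume is a 0-100 percentage.
def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
    # Interpret the raw frame bytes as signed 16-bit PCM samples.
    audio_data = np.frombuffer(frame.data, dtype=np.int16)
    # Convert to floats in roughly [-1.0, 1.0].
    audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
    # Clamp the volume to 0-100 and apply it as a linear gain.
    audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
    # Convert back to int16 for the outgoing frame.
    processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)

    return rtc.AudioFrame(
        data=processed.tobytes(),
        sample_rate=frame.sample_rate,
        num_channels=frame.num_channels,
        samples_per_channel=len(processed) // frame.num_channels,
    )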
This can be applied to both the TTS node and the real-time audio output node.
Sources: Speechify TTS integration guide (LiveKit Docs); Agent speech and audio (LiveKit Docs)
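One caveat on the second approach: scaling by a fixed volume does not measure the signal, so quiet and loud utterances stay quiet and loud relative to each other. The sketch below shows one way to bring a buffer to a constant level instead. It is only an illustration under stated assumptions: the audio is 16-bit PCM (as in the snippet above), the function names rms_dbfs and normalize_rms are hypothetical, the -20 dBFS target and 20 dB gain cap are arbitrary example values, and plain RMS is used rather than true LUFS (which would require the weighting and gating described in ITU-R BS.1770).

```python
import math
from array import array

INT16_MAX = 32767
INT16_MIN = -32768


def rms_dbfs(samples: array) -> float:
    """Return the RMS level of int16 samples in dB relative to full scale."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20.0 * math.log10(math.sqrt(mean_square) / INT16_MAX)


def normalize_rms(pcm: bytes, target_dbfs: float = -20.0,
                  max_gain_db: float = 20.0) -> bytes:
    """Scale int16 PCM bytes so their RMS level approaches target_dbfs.

    Hypothetical helper: target_dbfs and max_gain_db are example values.
    """
    samples = array("h")
    samples.frombytes(pcm)
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return pcm  # silence: there is nothing to normalize

    # Required gain in dB, capped so near-silent noise is not blown up.
    gain_db = min(target_dbfs - level, max_gain_db)
    gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor

    # Apply the gain and clamp to the int16 range to avoid wrap-around.
    out = array(
        "h",
        (max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples),
    )
    return out.tobytes()
```

Applied to every short frame independently, this would make the gain jump from frame to frame, so in practice you would probably measure the level over a longer window (or smooth the gain over time) and then apply the resulting factor inside _adjust_volume_in_frame.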
b
Where should I implement the _adjust_volume_in_frame function?
t
Hey there! 👋 The _adjust_volume_in_frame function belongs inside your Agent class, as a method. It takes a single audio frame and returns a new frame whose volume is scaled by your setting (self.volume). Here's how it looks in your Agent class:
def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
    audio_data = np.frombuffer(frame.data, dtype=np.int16)
    audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
    audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
    processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)

    return rtc.AudioFrame(
        data=processed.tobytes(),
        sample_rate=frame.sample_rate,
        num_channels=frame.num_channels,
        samples_per_channel=len(processed) // frame.num_channels,
    )
This function gets called by the _adjust_volume_in_stream method, which handles audio frames from either the tts_node or realtime_audio_output_node. It's like a volume control wizard working behind the scenes! 🎧
Sources: Agent speech and audio | LiveKit Docs
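To make the call chain concrete, here is a dependency-free sketch of how the pieces could fit together. Everything in it is a stand-in: Frame is a hypothetical substitute for rtc.AudioFrame, VolumeAgent does not inherit from LiveKit's real Agent class, and this tts_node takes an already-produced audio stream, whereas the real hook has its own signature (check the Agent speech and audio docs for it). The point is only the shape: the node wraps its audio stream in _adjust_volume_in_stream, which calls _adjust_volume_in_frame once per frame.

```python
import asyncio
from array import array
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Frame:
    """Hypothetical stand-in for rtc.AudioFrame (only the payload is modeled)."""
    data: bytes  # signed 16-bit PCM samples


class VolumeAgent:
    """Hypothetical stand-in for your Agent subclass."""

    def __init__(self, volume: int = 50) -> None:
        self.volume = volume  # 0-100 percentage, as in the snippet above

    def _adjust_volume_in_frame(self, frame: Frame) -> Frame:
        # Same idea as the NumPy version: clamp the volume, scale each sample.
        scale = max(0, min(self.volume, 100)) / 100.0
        samples = array("h")
        samples.frombytes(frame.data)
        scaled = array("h", (int(s * scale) for s in samples))
        return Frame(data=scaled.tobytes())

    async def _adjust_volume_in_stream(
        self, audio: AsyncIterator[Frame]
    ) -> AsyncIterator[Frame]:
        # Pull frames from the upstream audio and adjust them one by one.
        async for frame in audio:
            yield self._adjust_volume_in_frame(frame)

    async def tts_node(self, upstream: AsyncIterator[Frame]) -> AsyncIterator[Frame]:
        # Assumption: a real override would wrap the default node's audio
        # output here instead of receiving it as an argument.
        async for frame in self._adjust_volume_in_stream(upstream):
            yield frame


async def fake_tts() -> AsyncIterator[Frame]:
    """Hypothetical audio source that yields two tiny frames."""
    for value in (1000, -2000):
        yield Frame(data=array("h", [value, value]).tobytes())


async def collect(agent: VolumeAgent) -> List[List[int]]:
    """Run the fake audio through the agent and return samples per frame."""
    return [array("h", f.data).tolist() async for f in agent.tts_node(fake_tts())]
```

Running asyncio.run(collect(VolumeAgent(volume=50))) pushes the fake frames through the node and returns the scaled samples, which is a quick way to check the wiring before plugging in real audio.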