boundless-painting-47811
07/05/2025, 1:20 PM
tall-airline-16756
07/05/2025, 1:20 PM
loudness_normalization parameter with Speechify TTS. This aligns the audio output to specific loudness targets (e.g., integrated loudness: -14 LUFS, true peak: -2 dBTP, loudness range: 7 LU). Example:
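from livekit.agents import AgentSession
from livekit.plugins import speechify

session = AgentSession(
    tts=speechify.TTS(
        model="simba-english",
        voice_id="jack",
        loudness_normalization=True,  # normalize output loudness
    )
)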
1. Custom Volume Adjustment Processor: Implement a custom volume processor to dynamically adjust the agent's speech volume programmatically. Example:
import numpy as np
from livekit import rtc

def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
    # Interpret the raw frame bytes as 16-bit signed PCM samples.
    audio_data = np.frombuffer(frame.data, dtype=np.int16)
    # Normalize to floats in roughly [-1.0, 1.0].
    audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
    # Clamp self.volume to 0-100 and apply it as a percentage gain.
    audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
    # Convert back to 16-bit integers.
    processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)
    return rtc.AudioFrame(
        data=processed.tobytes(),
        sample_rate=frame.sample_rate,
        num_channels=frame.num_channels,
        samples_per_channel=len(processed) // frame.num_channels,
    )
This can be applied to both the TTS node (tts_node) and the real-time audio output node (realtime_audio_output_node).
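If it helps to see the scaling arithmetic on its own, here is a minimal sketch that works on raw 16-bit PCM bytes without any LiveKit types. The helper name scale_pcm16 and the sample values are made up for illustration; only the math mirrors the method above.

```python
import numpy as np


def scale_pcm16(pcm: bytes, volume: float) -> bytes:
    """Scale 16-bit PCM audio by a volume percentage (hypothetical helper)."""
    int16_max = np.iinfo(np.int16).max
    samples = np.frombuffer(pcm, dtype=np.int16)
    # Normalize to floats so the gain is applied in float space.
    as_float = samples.astype(np.float32) / int16_max
    # Clamp the volume to 0-100, then turn the percentage into a gain factor.
    gain = max(0, min(volume, 100)) / 100.0
    # Casting back to int16 truncates, so results can be off by one step.
    return (as_float * gain * int16_max).astype(np.int16).tobytes()


# Illustrative samples, not real audio.
original = np.array([1000, -2000, 30000], dtype=np.int16)
halved = np.frombuffer(scale_pcm16(original.tobytes(), 50), dtype=np.int16)
muted = np.frombuffer(scale_pcm16(original.tobytes(), 0), dtype=np.int16)
# A volume above 100 is clamped to 100, so the samples are left (nearly) unchanged.
clamped = np.frombuffer(scale_pcm16(original.tobytes(), 250), dtype=np.int16)
```

Because the clamp caps the gain at 1.0, this approach can only attenuate; it never boosts the signal, so the cast back to int16 cannot overflow.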
Sources: Speechify TTS integration guide | LiveKit Docs | Agent speech and audio | LiveKit Docs
boundless-painting-47811
07/05/2025, 1:21 PM
tall-airline-16756
07/05/2025, 1:21 PM
_adjust_volume_in_frame function works in a friendly way. Think of it as a method on your Agent class that controls the volume of the audio output. It takes an audio frame, converts the 16-bit samples to floats, scales them by self.volume (clamped to 0-100 and treated as a percentage), converts them back to 16-bit integers, and returns a new frame with the same sample rate and channel count. Here's a peek at how it fits into your Agent class:
def _adjust_volume_in_frame(self, frame: rtc.AudioFrame) -> rtc.AudioFrame:
    audio_data = np.frombuffer(frame.data, dtype=np.int16)
    audio_float = audio_data.astype(np.float32) / np.iinfo(np.int16).max
    audio_float = audio_float * max(0, min(self.volume, 100)) / 100.0
    processed = (audio_float * np.iinfo(np.int16).max).astype(np.int16)
    return rtc.AudioFrame(
        data=processed.tobytes(),
        sample_rate=frame.sample_rate,
        num_channels=frame.num_channels,
        samples_per_channel=len(processed) // frame.num_channels,
    )
This function gets called by the _adjust_volume_in_stream method, which handles audio frames from either the tts_node or realtime_audio_output_node. It's like a volume control wizard working behind the scenes! 🎧✨
Sources: Agent speech and audio | LiveKit Docs
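To picture the calling pattern, here is a rough sketch of a stream wrapper that passes every frame through a per-frame function. It is not the actual _adjust_volume_in_stream from the docs (that code isn't shown in this thread). FakeFrame, fake_tts_output, and mute are hypothetical stand-ins so the example runs without LiveKit installed.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterable, AsyncIterator, Callable


@dataclass
class FakeFrame:
    """Hypothetical stand-in for rtc.AudioFrame, used only in this sketch."""
    data: bytes
    sample_rate: int
    num_channels: int


async def adjust_volume_in_stream(
    frames: AsyncIterable[FakeFrame],
    adjust: Callable[[FakeFrame], FakeFrame],
) -> AsyncIterator[FakeFrame]:
    """Yield each incoming frame after passing it through `adjust`."""
    async for frame in frames:
        yield adjust(frame)


async def fake_tts_output() -> AsyncIterator[FakeFrame]:
    """Hypothetical source standing in for frames produced by a TTS node."""
    for payload in (b"\x01\x00", b"\x02\x00", b"\x03\x00"):
        yield FakeFrame(data=payload, sample_rate=24000, num_channels=1)


def mute(frame: FakeFrame) -> FakeFrame:
    """Trivial per-frame transform: replace the samples with silence."""
    return FakeFrame(bytes(len(frame.data)), frame.sample_rate, frame.num_channels)


async def collect() -> list[FakeFrame]:
    return [f async for f in adjust_volume_in_stream(fake_tts_output(), mute)]


processed_frames = asyncio.run(collect())
```

In a real agent the per-frame function would be the volume adjustment method, and the source would be the frames coming out of the TTS or realtime audio output node.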