In Python Agents, what’s the guaranteed frame form...
# ask-ai
In Python Agents, what’s the guaranteed frame format for rtc.AudioFrame? (e.g., interleaved int16, samples_per_channel, sample_rate, num_channels). Please confirm the typical sample rate when SIP ingress + BVC are enabled (16 kHz vs 48 kHz), and whether telephony is downmixed to mono before arriving at the agent. Provide a short struct/fields table.
Thanks for reaching out, Muthana. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! Here's what to expect from rtc.AudioFrame in Python Agents and its frame format:
• Data format: 16-bit signed integers (int16), interleaved by channel.
• Sample rate: usually 48 kHz when SIP ingress and BVC are enabled.
• Channels: telephony audio is typically downmixed to mono (1 channel) before it reaches the agent.
Struct/Fields:
• sample_rate: int - Sample rate in Hz (e.g., 48000).
• num_channels: int - Number of audio channels (usually 1).
• samples_per_channel: int - Number of samples per channel in the frame (e.g., 480, which is 10 ms at 48 kHz).
• data: bytes - The interleaved int16 audio data.
If anything’s unclear or you want to dive deeper, just let me know. I’m here to help!
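To make the field relationships concrete, here is a minimal sketch of how the four fields fit together. It is an illustration only: FrameLike is a hypothetical stand-in that mirrors the fields listed above (it is not the SDK class), and the helper names are made up for this example. It assumes data holds interleaved int16 samples in native byte order, so its length should be samples_per_channel * num_channels * 2 bytes.

```python
from array import array
from dataclasses import dataclass


@dataclass
class FrameLike:
    """Hypothetical stand-in mirroring the fields described above (not the SDK class)."""
    data: bytes               # interleaved int16 PCM, native byte order (assumption)
    sample_rate: int          # Hz, e.g. 48000
    num_channels: int         # 1 for mono
    samples_per_channel: int  # e.g. 480


def frame_duration_ms(frame: FrameLike) -> float:
    """Duration covered by one frame, in milliseconds."""
    return 1000.0 * frame.samples_per_channel / frame.sample_rate


def split_channels(frame: FrameLike) -> list:
    """De-interleave the int16 payload into one array per channel."""
    samples = array("h")  # "h" = signed 16-bit integers
    samples.frombytes(frame.data)
    expected = frame.samples_per_channel * frame.num_channels
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    # Interleaved layout: ch0, ch1, ..., ch0, ch1, ...
    return [samples[ch::frame.num_channels] for ch in range(frame.num_channels)]


def downmix_to_mono(frame: FrameLike) -> array:
    """Average all channels sample by sample (a no-op for mono input)."""
    channels = split_channels(frame)
    n = frame.num_channels
    return array("h", (sum(vals) // n for vals in zip(*channels)))


if __name__ == "__main__":
    mono = FrameLike(array("h", [0] * 480).tobytes(), 48000, 1, 480)
    print(frame_duration_ms(mono))  # 480 samples at 48 kHz

    stereo = FrameLike(array("h", [100, 300, -100, -300]).tobytes(), 48000, 2, 2)
    print(list(downmix_to_mono(stereo)))
```

If the frames reaching your agent really are mono, as described above, split_channels returns a single array and the downmix step changes nothing. Checking num_channels and sample_rate on the first frame you receive is a cheap way to confirm what your pipeline actually delivers.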