future-stone-69754
08/11/2025, 12:45 PM
tall-airline-16756
08/11/2025, 12:45 PM
tall-airline-16756
08/11/2025, 12:46 PM
text so it won’t use its built-in speech synthesis.
2. Include a TTS provider: Add a supported TTS provider (such as Cartesia) to your AgentSession to handle converting text to speech.
Here’s a simple example to get you started:
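from google.genai.types import Modality
from livekit.agents import AgentSession
from livekit.plugins import cartesia, google

session = AgentSession(
    # Gemini Live returns text only, so its built-in voice output is not used
    llm=google.beta.realtime.RealtimeModel(modalities=[Modality.TEXT]),
    # Cartesia converts Gemini's text responses into audio
    tts=cartesia.TTS(),
)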
With this setup, Gemini takes in the user's audio and generates text responses, and your chosen TTS provider produces the audio output. Easy and effective!
future-stone-69754
08/11/2025, 12:52 PM
tall-airline-16756
08/11/2025, 12:55 PM
queue_size_ms in the rtc.AudioSource can help smooth out audio streaming.
• Unprompted AI Speech: Consider tuning the Voice Activity Detection (VAD) settings, such as min_silence_duration, to avoid false triggers. After making changes, re-test how the audio behaves after silence warnings.
• Incorrect Responses: Managing the context window carefully can help prevent irrelevant answers. Also note that Gemini 2.5 has a known issue with function calling that can cause unexpected responses.
• General Audio Settings: Make sure your audio input and output settings, such as sample rate and buffer sizes, are configured correctly. If the sample rates don't match, rtc.AudioResampler can convert between them.
• Network Configuration: It’s a good idea to check if any firewalls or proxies are blocking or slowing down your network traffic.
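On the queue_size_ms suggestion: a larger queue lets the audio source absorb more jitter, at the cost of possibly holding more audio (and therefore more delay). The sketch below only does the buffer arithmetic for a given queue length. It assumes 16-bit PCM, 10 ms frames and 24 kHz mono purely for illustration, and samples_for_ms / queue_stats are hypothetical helpers, not LiveKit APIs. Assuming the current LiveKit Python SDK, the value itself is passed as rtc.AudioSource(sample_rate, num_channels, queue_size_ms=...).

```python
# Back-of-the-envelope buffer arithmetic for an audio queue.
# Assumptions (illustrative only): 16-bit PCM samples and 10 ms frames.
# samples_for_ms and queue_stats are hypothetical helpers, not LiveKit APIs.

BYTES_PER_SAMPLE = 2  # 16-bit PCM


def samples_for_ms(sample_rate: int, ms: int) -> int:
    """Samples per channel needed to cover `ms` milliseconds of audio."""
    return sample_rate * ms // 1000


def queue_stats(sample_rate: int, num_channels: int, queue_size_ms: int,
                frame_ms: int = 10) -> dict:
    """Summarize how much audio a queue of `queue_size_ms` can hold."""
    samples = samples_for_ms(sample_rate, queue_size_ms)
    frame_samples = samples_for_ms(sample_rate, frame_ms)
    return {
        "samples_per_channel": samples,
        "frames": samples // frame_samples,
        "bytes": samples * num_channels * BYTES_PER_SAMPLE,
    }


if __name__ == "__main__":
    # Compare a small queue with a larger one at 24 kHz mono.
    for size_ms in (200, 1000):
        print(size_ms, queue_stats(24000, 1, size_ms))
```

At 24 kHz mono, going from 200 ms to 1000 ms is the difference between roughly 20 and 100 ten-millisecond frames sitting in the queue, which is worth keeping in mind if responses start to feel laggy.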
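For the min_silence_duration tip, it helps to see why the value matters: the VAD only ends the user's turn after silence has lasted that long, so a value that is too short turns a mid-sentence pause into an end of turn, and the agent may start talking unprompted. The toy detector below is not Silero's algorithm; the frame size, threshold and probability sequence are made up, and find_turn_ends is a hypothetical name. Assuming the plugin's current signature, the real knob in LiveKit's Silero plugin is passed in seconds, e.g. silero.VAD.load(min_silence_duration=...).

```python
# Toy end-of-turn detector over per-frame speech probabilities.
# Not Silero's algorithm: it only shows how a minimum-silence window
# stops short pauses from being treated as the end of a turn.
# find_turn_ends is a hypothetical helper for illustration.

def find_turn_ends(probs, frame_ms=30, threshold=0.5, min_silence_ms=550):
    """Return end-of-turn times in ms for a sequence of speech probabilities."""
    ends = []
    speaking = False
    silence_ms = 0
    for i, p in enumerate(probs):
        if p >= threshold:
            # Speech resets the silence counter.
            speaking = True
            silence_ms = 0
        elif speaking:
            silence_ms += frame_ms
            if silence_ms >= min_silence_ms:
                # Silence lasted long enough: report when this frame ends.
                ends.append((i + 1) * frame_ms)
                speaking = False
                silence_ms = 0
    return ends


if __name__ == "__main__":
    # 300 ms of speech, a 300 ms pause, 300 ms of speech, then 900 ms of silence.
    probs = [0.9] * 10 + [0.1] * 10 + [0.9] * 10 + [0.1] * 30
    print(find_turn_ends(probs, min_silence_ms=250))  # the pause splits the turn
    print(find_turn_ends(probs, min_silence_ms=550))  # the pause is tolerated
```

With a 250 ms window, the 300 ms pause splits one utterance into two turns; with 550 ms it does not. Raising the value trades a little extra response latency for fewer false end-of-turn triggers.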
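On the sample-rate point: when, say, a 48 kHz capture path feeds something that expects 16 kHz, the audio has to be resampled, which is what rtc.AudioResampler is for. To show what that conversion does, here is a deliberately naive linear-interpolation version in plain Python. resample_linear is a hypothetical helper, not LiveKit's implementation, and it omits the anti-aliasing filter a production resampler applies, so use rtc.AudioResampler in real code.

```python
# Naive linear-interpolation resampler for mono float samples.
# Illustration only: production-quality resamplers low-pass filter the
# signal first; this sketch skips that, so downsampling with it can alias.
# resample_linear is a hypothetical helper, not a LiveKit API.

def resample_linear(samples, in_rate, out_rate):
    """Resample `samples` from `in_rate` Hz to `out_rate` Hz."""
    if in_rate == out_rate or not samples:
        return list(samples)
    out_len = len(samples) * out_rate // in_rate
    step = in_rate / out_rate  # input samples advanced per output sample
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        frac = pos - i
        # Hold the last sample when interpolating past the end.
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] + (nxt - samples[i]) * frac)
    return out


if __name__ == "__main__":
    # 10 ms of a 48 kHz ramp converted to 16 kHz.
    ramp = [i / 480 for i in range(480)]
    print(len(resample_linear(ramp, 48000, 16000)))
```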
If you want to dive deeper, the Gemini Live API integration guide is a great resource. And of course, if you need any more help, just give me a shout!
Sources: Speech & Audio Processing (livekit/livekit_composite, DeepWiki); Gemini Live API integration guide (LiveKit Docs); livekit-agents/livekit/agents/voice/agent_activity.py; OpenAI Realtime API integration guide (LiveKit Docs)