Can I use TTS provider in realtime gemini?
# ask-ai
f
Can I use TTS provider in realtime gemini?
t
Thanks for reaching out, Baymurat. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Absolutely, you can use a TTS provider with Gemini in real-time within LiveKit! Here’s how you can do it: 1. Configure Gemini for Text-Only Output: Just set the modalities of the Gemini Live API to
text
so it won’t use its built-in speech synthesis. 2. Include a TTS Provider: Add a supported TTS provider (like Cartesia) in your AgentSession to take care of converting text to speech. Here’s a simple example to get you started:
Copy code
from google.genai.types import Modality
from livekit.agents import AgentSession
from livekit.plugins import cartesia

session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(modalities=[Modality.TEXT]),
    tts=cartesia.TTS(),
)
With this setup, Gemini will handle the speech-to-text part, and your chosen TTS provider will create the audio output. Easy and effective!
f
?
t
Hey there! Here are some friendly tips to help you troubleshoot the issues you’re having with Gemini real-time: • Delayed Responses: Try making sure your client and server are in the same region to cut down on network lag. Also, bumping up the
queue_size_ms
in the
rtc.AudioSource
can help smooth out audio streaming. • Unprompted AI Speech: You might want to tweak the Voice Activity Detection (VAD) settings, like
min_silence_duration
, to avoid false alarms. After making changes, double-check how the audio behaves after silence warnings. • Incorrect Responses: Keeping the context window managed well can prevent irrelevant answers. Just a heads-up, Gemini 2.5 has a known hiccup with function calling that might cause some unexpected responses. • General Audio Settings: Make sure your audio input and output settings, like sample rate and buffer sizes, are set up correctly. If there’s a mismatch, tools like
rtc.AudioResampler
can come in handy. • Network Configuration: It’s a good idea to check if any firewalls or proxies are blocking or slowing down your network traffic. If you want to dive deeper, the Gemini Live API integration guide is a great resource. And of course, if you need any more help, just give me a shout! Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | Gemini Live API integration guide | LiveKit Docs | livekit-agents/livekit/agents/voice/agent_activity.py | OpenAI Realtime API integration guide | LiveKit Docs