limited-arm-86913
07/13/2025, 8:25 AM

tall-airline-16756
07/13/2025, 8:26 AM

To get text-only output, set the modalities parameter to ["TEXT"] when you create the RealtimeModel instance. For example:
llm=google.beta.realtime.RealtimeModel(
    model="gemini-2.0-flash-exp",
    modalities=["TEXT"],
)
This will configure the API to produce only text output.
Additionally, if you want your agent to operate in a fully text-only mode (no audio input or output, just text), you can further configure the session like this:
session = AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        model="gemini-2.0-flash-exp",
        modalities=["TEXT"],
    ),
    room_input_options=RoomInputOptions(text_enabled=True, audio_enabled=False),
    room_output_options=RoomOutputOptions(transcription_enabled=True, audio_enabled=False),
)
This ensures all input and output are handled as text only and disables audio streams entirely.
Let me know if you need more details or examples for a specific language or client.
Sources: Gemini Multimodal Live integration guide | LiveKit Docs | agent/main.py | Text and transcriptions | LiveKit Docs

limited-arm-86913
07/13/2025, 8:32 PM

tall-airline-16756
07/13/2025, 8:33 PM

Here's the general flow for pairing Gemini's text output with a separate TTS service:
1. First, configure the RealtimeModel with modalities=["TEXT"].
2. Then, just pull out those text responses from the API.
3. Next, send the text to your favorite TTS service (like Google Cloud TTS, ElevenLabs, Deepgram, AWS Polly, Azure TTS - so many great options!).
4. Finally, receive and play or stream the audio output however you need.
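The four steps above can be sketched in plain Python. Everything here is hypothetical scaffolding: text_to_speech_pipeline, synthesize, and play_audio are stand-ins I've made up for this sketch, not LiveKit or vendor APIs; they only show the shape of the loop.

```python
from typing import Callable, Iterable, List


def text_to_speech_pipeline(
    text_responses: Iterable[str],
    synthesize: Callable[[str], bytes],
    play_audio: Callable[[bytes], None],
) -> List[bytes]:
    """Route each text response from the model through a TTS backend.

    `synthesize` stands in for any TTS provider call (Google Cloud TTS,
    ElevenLabs, Deepgram, AWS Polly, Azure TTS, ...); `play_audio` stands
    in for your playback or streaming sink.
    """
    audio_chunks: List[bytes] = []
    for text in text_responses:
        if not text.strip():
            continue  # skip empty responses from the model
        audio = synthesize(text)  # step 3: send text to the TTS service
        play_audio(audio)         # step 4: play or stream the audio output
        audio_chunks.append(audio)
    return audio_chunks


# Usage with stub backends (fake "audio" bytes, playback collected in a list):
played: list = []
chunks = text_to_speech_pipeline(
    ["Hello!", "", "How can I help?"],
    synthesize=lambda t: t.encode("utf-8"),
    play_audio=played.append,
)
```

In a real agent the synthesize call would be replaced by the TTS provider's streaming API so audio starts playing before the full response is synthesized.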
The LiveKit Agents framework is really helpful here because it offers direct integrations for many TTS providers, and supports features like voice selection, pronunciation control, and streaming audio for smooth, natural-sounding interactions.
When you're picking a TTS service, think about what matters most to you - things like latency, voice quality, streaming capability, and any special pronunciation or voice customizations you might want. The good news is that many TTS services can be easily configured through LiveKit's unified TTS interfaces.
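To picture the "unified TTS interface" idea in miniature, here's an illustrative sketch using a Python Protocol. The class names and methods are invented for this example and are not LiveKit's actual class hierarchy; the point is that callers depend only on the interface, so providers swap freely.

```python
from typing import Protocol


class TTS(Protocol):
    """Minimal interchangeable TTS interface (illustrative only)."""

    def synthesize(self, text: str) -> bytes: ...


class FakeElevenLabsTTS:
    """Stand-in for one provider, with a provider-specific option (voice)."""

    def __init__(self, voice: str = "default") -> None:
        self.voice = voice

    def synthesize(self, text: str) -> bytes:
        return f"[{self.voice}] {text}".encode()


class FakePollyTTS:
    """Stand-in for another provider behind the same interface."""

    def synthesize(self, text: str) -> bytes:
        return f"[polly] {text}".encode()


def speak(tts: TTS, text: str) -> bytes:
    # The agent code calls the shared interface, not a concrete provider.
    return tts.synthesize(text)


audio_a = speak(FakeElevenLabsTTS(voice="rachel"), "hi")
audio_b = speak(FakePollyTTS(), "hi")
```

Swapping providers then only changes the constructor call, which is the property that makes comparing TTS services on latency and voice quality cheap.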
Need help setting up a specific TTS provider or want some example code for a particular framework? Just let me know what you're looking for, and I'll be happy to help!
Sources: AI Agents Framework | livekit/livekit_composite | DeepWiki | ElevenLabs integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-spitch/livekit/plugins/spitch/tts.py | Live Conversations with AI using ChatGPT and WebRTC | Speech & Audio Processing | livekit/livekit_composite | DeepWiki | Gemini Multimodal Live integration guide | LiveKit Docs | Agent speech | LiveKit Docs