# ask-ai
b
ERROR asyncio - Task exception was never retrieved future: <Task finished name='Task-431' coro=<AgentSetup.say_greetings() done, defined at /Users/sandeep/projects/pulse/agent/base_agent.py:122> exception=RuntimeError('trying to generate speech from text without a TTS model')>
t
Thanks for reaching out, Sandeep. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! The error you're seeing happens because the system is trying to generate speech from text, but it doesn't have a Text-to-Speech (TTS) model set up yet. To fix this, just make sure you include a TTS model when you initialize the `AgentSession`. Here's a quick example:
```python
from livekit.agents import AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    stt=deepgram.STT(model="nova-3", language="multi"),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=cartesia.TTS(model="sonic-2", voice="f786b574-daa5-4673-aa0c-cbe3e8534c02"),  # This part is important!
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)
```
Just double-check that the TTS model you pick works well with your setup. Some popular options are Cartesia, Deepgram, ElevenLabs, Google Gemini, and a few others. Each one needs specific details like `model` and `voice` to work properly. If you want, I'm here to help you get the TTS model configured correctly; just give me a shout! The example below shows how a different provider slots in.

Sources: Deepgram TTS integration guide | ElevenLabs TTS integration guide | Gemini TTS integration guide | livekit-agents/livekit/agents/voice/generation.py | Agent speech and audio (all LiveKit Docs)
b
but I am using

```python
session = AgentSession(
    vad=ctx.proc.userdata["vad"],
    min_interruption_words=0,
    userdata=UserData(),
    min_endpointing_delay=0.08,
    max_endpointing_delay=0.80,
    max_tool_steps=10,
    user_away_timeout=SILENCE_THRESHOLD,
    llm=realtime.RealtimeModel()  # Using realtime model here
)
```
t
When you're working with a realtime model in `AgentSession`, it's important to set it up for text-only output and add a separate TTS plugin for turning that text into speech. Here's a friendly little guide to get you started:
```python
from livekit.agents import AgentSession
from livekit.plugins import openai

session = AgentSession(
    llm=openai.realtime.RealtimeModel(modalities=["text"]),  # Text-only output
    tts=openai.TTS(voice="ash")  # Add a TTS plugin
)
```
This way, your realtime model focuses on generating text responses, and the TTS plugin takes care of converting those into speech. You've got options like OpenAI, Cartesia, Deepgram, and more for TTS. If you want a hand picking the right TTS provider or tweaking any settings, just give me a shout; I'm here to help!

Sources: Text-to-speech (TTS) integrations | OpenAI Realtime API integration guide | Turn detection and interruptions | OpenAI TTS integration guide | Realtime model integrations (all LiveKit Docs)
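Applied to the session you posted, the change would look roughly like this. This is a sketch, not a drop-in: it assumes `realtime` in your code refers to `openai.realtime`, and that `UserData`, `SILENCE_THRESHOLD`, `ctx`, and the prewarmed VAD come from your existing app:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# Your config with the fix applied. Assumes `realtime` in your code is
# `openai.realtime`, and that ctx, UserData, SILENCE_THRESHOLD, and the
# prewarmed VAD are defined elsewhere in your app.
session = AgentSession(
    vad=ctx.proc.userdata["vad"],
    min_interruption_words=0,
    userdata=UserData(),
    min_endpointing_delay=0.08,
    max_endpointing_delay=0.80,
    max_tool_steps=10,
    user_away_timeout=SILENCE_THRESHOLD,
    llm=openai.realtime.RealtimeModel(modalities=["text"]),  # text-only output
    tts=openai.TTS(voice="ash"),  # TTS handles speech, so say_greetings() can speak
)
```

With a TTS plugin in place, calls like the one in your `say_greetings()` coroutine have a speech backend to use, which should clear the `RuntimeError` from your log.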