# ask-ai
**a:** Hi. Can you explain how to load the models (LLM, STT, TTS) in such a way that users don't have to wait a long time for the first response? I tried the `prewarm` function, but it only gets called once a user joins the room, so the user still has to wait.
**t:** Thanks for reaching out, Somaan. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To reduce the first-response wait time when loading LLM, STT, and TTS models in LiveKit, you can follow these strategies:

1. **Preload models during initialization:** Use the `prewarm` function to preload models like VAD (Voice Activity Detection). For example:

```python
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
```
2. **Initialize STT, LLM, and TTS with specific models:** Use low-latency models from plugins like Groq. For instance:

```python
agent = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=groq.STT(model="whisper-large-v3-turbo"),
    llm=groq.LLM(model="llama-3.1-8b-instant"),
    tts=groq.TTS(model="playai-tts", voice="Cheyenne-PlayAI"),
    chat_ctx=initial_ctx,
)
```
3. **Enable preemptive generation:** Start generating responses before the user finishes speaking to reduce latency:

```python
session = AgentSession(
    preemptive_generation=True,
    ...
)
```
4. **Use efficient models:** Select models optimized for faster response times, such as Groq's `llama-3.1-8b-instant` for LLM and `playai-tts` for TTS.

5. **Ensure proper API configuration:** Provide the necessary API keys and configuration up front to avoid delays in model loading and API calls.

These steps should help minimize the wait time for users when interacting with your application.

Sources: agent/main.py | livekit-plugins/livekit-plugins-groq/README.md
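As a library-free sketch of the `prewarm` pattern in step 1 (here `FakeJobProcess`, `load_vad`, and `run_job` are hypothetical stand-ins, not LiveKit APIs), the key idea is that the expensive load runs once per worker process and every subsequent job reuses it:

```python
# Library-free sketch of the prewarm pattern. FakeJobProcess, load_vad,
# and run_job are hypothetical stand-ins, not LiveKit APIs.

class FakeJobProcess:
    """Stands in for a per-process context with a userdata dict."""
    def __init__(self):
        self.userdata = {}

LOAD_CALLS = 0

def load_vad():
    """Stands in for an expensive load such as silero.VAD.load()."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return object()

def prewarm(proc):
    # Runs once when the worker process starts, before jobs are assigned.
    proc.userdata["vad"] = load_vad()

def run_job(proc):
    # Every job reuses the model preloaded into process userdata,
    # instead of loading it again.
    return proc.userdata["vad"]

proc = FakeJobProcess()
prewarm(proc)
first = run_job(proc)
second = run_job(proc)
print(first is second, LOAD_CALLS)  # → True 1
```

Note that this only removes the per-job load; as the follow-up below discusses, the prewarm still happens on the worker side, not before the very first user appears.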
**a:** No, but `prewarm` only gets called after the user joins. I want to load everything before the first user joins!
**t:** Hey there! To make sure your LLM, STT, and TTS models are all set and ready before the first user hops into the room, you can load everything right when your app starts, with no need to wait for the `prewarm` function. Here's a friendly walkthrough:

1. **Initialize models when creating the agent:** Go ahead and create your `VoicePipelineAgent` instance with all the models preloaded. Like this:

```python
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=openai.TTS(),
    chat_ctx=initial_ctx,
)
```
2. **Preload your VAD model:** It's a good idea to load the VAD model during your app's startup phase:

```python
vad_model = silero.VAD.load()
```
3. **Start the agent as soon as the room connects:** This way, your agent is all set to jump into action the moment the room is live:

```python
await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
agent.start(ctx.room)
```
By doing this upfront, your models will be ready to roll and handle user interactions right away when that first user joins. Easy and smooth!

Sources: agent-examples/agent_examples/pipeline_agent.py | Silero VAD plugin | LiveKit Docs
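To make the "load before the first user joins" idea concrete, here is a minimal, library-free sketch of eager startup loading (`load_model`, `MODELS`, and `on_first_user_join` are hypothetical stand-ins, not LiveKit APIs): the expensive loads run at process startup, so the very first join finds everything already in memory.

```python
import time

def load_model(name: str) -> dict:
    """Hypothetical stand-in for an expensive load such as silero.VAD.load()."""
    time.sleep(0.01)  # simulate a slow disk/network load
    return {"name": name, "ready": True}

# Eager loading at module import time (process startup),
# before any user has joined a room.
MODELS = {
    "vad": load_model("vad"),
    "stt": load_model("stt"),
    "tts": load_model("tts"),
}

def on_first_user_join() -> bool:
    # By the time the first user joins, every model is already in memory,
    # so no load happens on the request path.
    return all(m["ready"] for m in MODELS.values())

print(on_first_user_join())  # → True
```

The trade-off is that the process pays the full load cost at startup and holds all models in memory even when no one is connected, which is usually what you want for a long-running agent worker.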