# ask-ai
**a:** Hi. Can you explain how to load the models (LLM, STT, TTS) in such a way that users don't have to wait a long time for the first response? I tried the `prewarm` function, but it only gets called once a user joins the room, so the user still has to wait.
**t:** Thanks for reaching out, Somaan. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To reduce the first-response wait time when loading LLM, STT, and TTS models in LiveKit, you can follow these strategies:

1. **Preload models during initialization:** Use the `prewarm` function to preload models like VAD (Voice Activity Detection). For example:

```python
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
```
2. **Initialize STT, LLM, and TTS with specific models:** Use low-latency models from plugins like Groq. For instance:

```python
agent = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=groq.STT(model="whisper-large-v3-turbo"),
    llm=groq.LLM(model="llama-3.1-8b-instant"),
    tts=groq.TTS(model="playai-tts", voice="Cheyenne-PlayAI"),
    chat_ctx=initial_ctx,
)
```
3. **Enable preemptive generation:** Start generating responses before the user finishes speaking to reduce latency:

```python
session = AgentSession(
    preemptive_generation=True,
    ...
)
```
4. **Use efficient models:** Select models optimized for faster response times, such as Groq's `llama-3.1-8b-instant` for LLM and `playai-tts` for TTS.

5. **Ensure proper API configuration:** Provide the necessary API keys and configuration up front to avoid delays in model loading and API calls.

These steps should help minimize the wait time for users when interacting with your application.

Sources: agent/main.py | livekit-plugins/livekit-plugins-groq/README.md
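As a library-free sketch of the `prewarm` pattern in step 1 (here `FakeJobProcess`, `load_vad`, and `run_job` are hypothetical stand-ins, not LiveKit APIs), the key idea is that the expensive load runs once per worker process and every subsequent job reuses it:

```python
# Library-free sketch of the prewarm pattern. FakeJobProcess, load_vad,
# and run_job are hypothetical stand-ins, not LiveKit APIs.

class FakeJobProcess:
    """Stands in for a per-process context with a userdata dict."""
    def __init__(self):
        self.userdata = {}

LOAD_CALLS = 0

def load_vad():
    """Stands in for an expensive load such as silero.VAD.load()."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return object()

def prewarm(proc):
    # Runs once when the worker process starts, before jobs are assigned.
    proc.userdata["vad"] = load_vad()

def run_job(proc):
    # Every job reuses the model preloaded into process userdata,
    # instead of loading it again.
    return proc.userdata["vad"]

proc = FakeJobProcess()
prewarm(proc)
first = run_job(proc)
second = run_job(proc)
print(first is second, LOAD_CALLS)  # → True 1
```

Note that this only removes the per-job load; as the follow-up below discusses, the prewarm still happens on the worker side, not before the very first user appears.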
**a:** No, but `prewarm` only gets called after the user joins. I want to load everything before the first user joins!
**t:** Hey there! To make sure your LLM, STT, and TTS models are all set and ready before the first user hops into the room, you can load everything right when your app starts, with no need to wait for the `prewarm` function. Here's a friendly walkthrough:

1. **Initialize models when creating the agent:** Go ahead and create your `VoicePipelineAgent` instance with all the models preloaded. Like this:

```python
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=openai.TTS(),
    chat_ctx=initial_ctx,
)
```
2. **Preload your VAD model:** It's a good idea to load the VAD model during your app's startup phase:

```python
vad_model = silero.VAD.load()
```
3. **Start the agent as soon as the room connects:** This way, your agent is all set to jump into action the moment the room is live:

```python
await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
agent.start(ctx.room)
```
By doing this upfront, your models will be ready to roll and handle user interactions right away when that first user joins. Easy and smooth!

Sources: agent-examples/agent_examples/pipeline_agent.py | Silero VAD plugin | LiveKit Docs
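To make the "load before the first user joins" idea concrete, here is a minimal, library-free sketch of eager startup loading (`load_model`, `MODELS`, and `on_first_user_join` are hypothetical stand-ins, not LiveKit APIs): the expensive loads run at process startup, so the very first join finds everything already in memory.

```python
import time

def load_model(name: str) -> dict:
    """Hypothetical stand-in for an expensive load such as silero.VAD.load()."""
    time.sleep(0.01)  # simulate a slow disk/network load
    return {"name": name, "ready": True}

# Eager loading at module import time (process startup),
# before any user has joined a room.
MODELS = {
    "vad": load_model("vad"),
    "stt": load_model("stt"),
    "tts": load_model("tts"),
}

def on_first_user_join() -> bool:
    # By the time the first user joins, every model is already in memory,
    # so no load happens on the request path.
    return all(m["ready"] for m in MODELS.values())

print(on_first_user_join())  # → True
```

The trade-off is that the process pays the full load cost at startup and holds all models in memory even when no one is connected, which is usually what you want for a long-running agent worker.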