red-helicopter-96281
07/11/2025, 8:20 PM

tall-airline-16756
07/11/2025, 8:21 PM
You can use prewarm functions to load models and voice metadata at server startup, improving latency for on-demand agent creation.
You can mix these methods—for example, by responding to a user's runtime request to switch voice or model, instantiating or updating the agent accordingly. Example code for switching agent voice or provider is available for OpenAI, Cartesia, Rime, PlayAI, ElevenLabs, and LMNT TTS integrations.
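Here's a rough sketch of what that runtime switching could look like. The plugin classes (openai.TTS, cartesia.TTS, elevenlabs.TTS) are real LiveKit plugins, but the select_tts helper and its provider keys are just my own illustration:

from livekit.plugins import cartesia, elevenlabs, openai

# Illustrative helper (not a LiveKit API): pick a TTS plugin based on a
# user's runtime request. Each plugin class reads its API key from the
# environment by default.
def select_tts(provider: str):
    if provider == "cartesia":
        return cartesia.TTS()
    if provider == "elevenlabs":
        return elevenlabs.TTS()
    return openai.TTS()  # fall back to a sensible default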
Let me know if you want detailed implementation examples for a specific TTS provider or need step-by-step code for your deployment scenario.
Sources: mcp/agent.py | agent/main.py | examples/other/kokoro_tts.py | Accepting incoming calls | LiveKit Docs | LMNT TTS integration guide | LiveKit Docs

red-helicopter-96281
07/11/2025, 8:35 PM

tall-airline-16756
07/11/2025, 8:36 PM
Each worker (started with python agent.py start) basically raises its hand and says, "I'm ready!" to the LiveKit server, which then distributes jobs like room creation across available workers.
Now, if you're using FastAPI for configuration, you'll want to adjust your approach a bit. Instead of treating this like a typical web server, your containers should act more like LiveKit agent workers that connect via WebSocket. It's like making sure your team is ready to jump in and help, rather than just waiting for someone to knock on the door.
Want to scale things up? Container orchestration platforms like Kubernetes are your best friend here. As demand increases, you can spin up more worker containers, and LiveKit will automatically distribute the workload. Just make sure your agent creation logic lives inside the LiveKit job handler rather than running as a standalone process. It's all about working smarter, not harder!
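To make that concrete, here's a minimal sketch of the worker pattern. The imports and the cli.run_app call are the real LiveKit agents SDK; the comment marks where your per-job agent creation logic would go:

from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    await ctx.connect()  # join the room the server dispatched this job for
    # ... create and start your agent here, once per job ...

if __name__ == "__main__":
    # registers this process as a worker; LiveKit distributes jobs to it
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))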
Got questions about adapting your FastAPI workflow to the LiveKit worker model? I'm all ears! Tell me about your container setup or target orchestration platform, and I'll help you navigate this.
Sources: An open source stack for real-time multimodal AI | Deploying to production | LiveKit Docs | Deployment and scaling | LiveKit Docs | Dispatching agents | LiveKit Docs

red-helicopter-96281
07/11/2025, 8:43 PM

tall-airline-16756
07/11/2025, 8:44 PM
agent.py:

from livekit.agents import JobContext, JobProcess, WorkerOptions, AutoSubscribe, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import silero, openai, deepgram

def prewarm(proc: JobProcess):
    # load the VAD model once per process, before any job arrives
    proc.userdata["vad"] = silero.VAD.load()

async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        chat_ctx=llm.ChatContext().append(role="system", text="You are a voice assistant."),
    )
    agent.start(ctx.room, participant)
    await agent.say("Hello! How can I help you today?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
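Run it with python agent.py start and the worker registers itself with LiveKit – the same "raising its hand" flow we talked about earlier.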
render.yaml (autoscaling configuration):
services:
  - type: pserv
    name: voice-agent
    runtime: image
    plan: starter
    envVars:
      - fromGroup: voice-agent-env
    scaling:
      minInstances: 1
      maxInstances: 4
      targetCPUPercent: 60
      targetMemoryPercent: 60
    maxShutdownDelaySeconds: 300
version: "1"
• Don't forget to set up these environment variables: LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, OPENAI_API_KEY, DEEPGRAM_API_KEY. They're like the secret sauce that makes everything work! (There's a quick sanity-check sketch after this list.)
• Render.com's got your back with autoscaling that adjusts based on CPU and memory usage. Pretty smart, right?
• The maxShutdownDelaySeconds setting is a nice touch – it makes sure your voice conversations can wrap up smoothly without getting cut off.
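Here's the sanity check mentioned above – just a sketch using the standard library, with the same variable names from the list:

import os

# fail fast at startup if any required key is missing
REQUIRED = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
            "OPENAI_API_KEY", "DEEPGRAM_API_KEY"]
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

Sources: examples/src/pipeline_voice_agent.ts | agent/main.py | render.com/render.yaml | Deployment and scaling | LiveKit Docs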