What's the best way to create an agent on the fly ...
# ask-ai
**User:**
What's the best way to create an agent on the fly - for example with a different voice or model? Say it's deployed on a remote server via Docker.
**Assistant:**
Thanks for reaching out, Yashaank. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To dynamically create agents with different voices or models on a remote server (such as one deployed via Docker), recommended methods include:
• Runtime agent switching: instantiate different agent classes with their own TTS voices/models and switch between them in your code as required.
• TTS provider and voice change: change the TTS provider or voice by updating the agent configuration, or respond to real-time attribute changes (like a user-selected voice) using participant attributes; a rough sketch follows after the sources below.
• Agent deployment config: for Docker deployments, use named agents for explicit dispatching, which keeps agent types cleanly separated.
• Prewarming: use `prewarm` functions to load models and voice metadata at server startup, improving latency for on-demand agent creation.
You can mix these methods; for example, respond to a user's runtime request to switch voice or model and instantiate or update the agent accordingly. Example code for switching the agent voice or provider is available for the OpenAI, Cartesia, Rime, PlayAI, ElevenLabs, and LMNT TTS integrations. Let me know if you want detailed implementation examples for a specific TTS provider or need step-by-step code for your deployment scenario.
Sources: mcp/agent.py | agent/main.py | examples/other/kokoro_tts.py | Accepting incoming calls | LiveKit Docs | LMNT TTS integration guide | LiveKit Docs
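Here's that sketch: it chooses the TTS provider and voice at agent-creation time from a participant attribute. This is a minimal illustration, not the documented method; it assumes the LiveKit Agents 0.x Python API used later in this thread, an SDK version with participant attributes, the `livekit-plugins-elevenlabs` package, and a frontend that sets a `voice` attribute (the attribute key and the provider choices are hypothetical).

```python
from livekit.agents import AutoSubscribe, JobContext, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, elevenlabs, openai, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Hypothetical attribute set by your frontend, e.g. {"voice": "eleven"}.
    requested = participant.attributes.get("voice", "default")
    if requested == "eleven":
        tts = elevenlabs.TTS()            # needs ELEVEN_API_KEY in the container env
    else:
        tts = openai.TTS(voice="alloy")   # fallback provider/voice

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=tts,
        chat_ctx=llm.ChatContext().append(role="system", text="You are a voice assistant."),
    )
    agent.start(ctx.room, participant)
```

The same idea works for swapping the LLM model: read the attribute (or job metadata), then pass the requested model name when constructing the agent.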
**User:**
So once my Python script runs in my Docker container, and a FastAPI endpoint gets all the data to replace, will everything work as expected? Will my new agent instance be created and scaled by itself?
**Assistant:**
Hey there! Let me break down how LiveKit makes scaling your agents smooth. Think of it like having a smart dispatcher that creates and assigns workers exactly when you need them.

When you deploy in Docker, each worker (running `python agent.py start`) registers with the LiveKit server and says "I'm ready!", and the server then dispatches jobs (for example, when a new room is created) across the available workers.

If you're using FastAPI for configuration, you'll want to adjust your approach a bit. Instead of treating this like a typical web server, your containers should act as LiveKit agent workers that connect to the server over WebSocket; FastAPI can still collect the configuration, but the agent itself should be created inside the LiveKit job handler rather than as a standalone process. A sketch of this pattern follows below.

Want to scale things up? Container orchestration platforms like Kubernetes work well here: as demand increases you spin up more worker containers, and LiveKit automatically distributes the workload across them.

Got questions about adapting your FastAPI workflow to the LiveKit worker model? Tell me about your container setup or target orchestration platform, and I'll help you navigate it.

Sources: An open source stack for real-time multimodal AI | Deploying to production | LiveKit Docs | Deployment and scaling | LiveKit Docs | Dispatching agents | LiveKit Docs
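Here's that sketch: a small FastAPI endpoint that uses LiveKit's explicit agent dispatch to place a named agent into a room, passing per-call configuration as job metadata that the worker reads in its entrypoint. The endpoint path, agent name, and metadata fields are illustrative assumptions; only the dispatch call itself comes from the LiveKit docs.

```python
import json

from fastapi import FastAPI
from livekit import api

app = FastAPI()


@app.post("/calls")
async def start_call(room: str, voice: str = "alloy"):
    """Ask LiveKit to dispatch the named agent into `room` with per-call config."""
    lkapi = api.LiveKitAPI()  # reads LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET
    try:
        dispatch = await lkapi.agent_dispatch.create_dispatch(
            api.CreateAgentDispatchRequest(
                agent_name="voice-agent",  # must match WorkerOptions(agent_name=...) on the worker
                room=room,
                metadata=json.dumps({"voice": voice}),
            )
        )
    finally:
        await lkapi.aclose()
    return {"dispatch_id": dispatch.id}


# On the worker side (the container running `python agent.py start`), the entrypoint
# reads that metadata and builds the agent accordingly:
#
#   config = json.loads(ctx.job.metadata or "{}")
#   tts = openai.TTS(voice=config.get("voice", "alloy"))
```

The key shift is that FastAPI only requests the work; the agent itself still runs inside a LiveKit worker, so your orchestration layer (Docker, Kubernetes, or a managed platform) scales the worker containers independently of your API.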
**User:**
Give me a short and concise example of a VAD-based voice agent that can be deployed to render.com with autoscaling features.
**Assistant:**
Hey there! Here's a compact example of a VAD-based voice agent you can deploy to render.com with autoscaling.
agent.py:
```python
from livekit.agents import JobContext, JobProcess, WorkerOptions, AutoSubscribe, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import silero, openai, deepgram


def prewarm(proc: JobProcess):
    # Load the Silero VAD model once per worker process so new jobs start quickly.
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # Connect to the room (audio only) and wait for the caller to join.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # VAD + STT + LLM + TTS pipeline agent.
    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        chat_ctx=llm.ChatContext().append(role="system", text="You are a voice assistant."),
    )
    agent.start(ctx.room, participant)
    await agent.say("Hello! How can I help you today?", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```
render.yaml (autoscaling configuration):
```yaml
services:
- type: pserv
  name: voice-agent
  runtime: image
  plan: starter
  envVars:
  - fromGroup: voice-agent-env
  scaling:
    minInstances: 1
    maxInstances: 4
    targetCPUPercent: 60
    targetMemoryPercent: 60
  maxShutdownDelaySeconds: 300
version: "1"
• Don't forget to set these environment variables on the service: LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, OPENAI_API_KEY, DEEPGRAM_API_KEY (a startup check is sketched below).
• Render.com's autoscaling adjusts the number of instances based on CPU and memory usage, so the worker pool grows and shrinks with demand.
• The `maxShutdownDelaySeconds` setting gives in-progress voice conversations time to wrap up before an instance is shut down.
Sources: examples/src/pipeline_voice_agent.ts | agent/main.py | render.com/render.yaml | Deployment and scaling | LiveKit Docs
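One last hedged sketch (not from the docs, just an illustrative startup guard): fail fast inside the container if any of those variables are missing, so a misconfigured instance never registers as a worker.

```python
import os
import sys

# Illustrative only: check required configuration before starting the worker.
REQUIRED_ENV = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
]

missing = [name for name in REQUIRED_ENV if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing required environment variables: {', '.join(missing)}")
```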