# ask-ai
I am running LiveKit workers in a Kubernetes cluster. The workers connect to LiveKit Cloud. What is the best way to scale these workers on demand? Say suddenly I need to run 100 calls, I would have to create new workers to handle this. What is the best practice for that? [knowing the health and load of each worker, knowing which ports they should register on, etc.]
Thanks for reaching out, Rida. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To scale LiveKit workers on demand in Kubernetes for LiveKit Cloud, follow these best practices:

- No inbound port management is needed: workers connect out to LiveKit Cloud over WebSockets and do not require any inbound host or port configuration.
- Autoscale on load metrics: use a Kubernetes Horizontal Pod Autoscaler and set your scale-up threshold lower than the worker's internal `load_threshold` (which defaults to 0.75). For example, trigger scaling at 0.50 so new capacity is ready before existing workers reach their maximum recommended load (see the HPA sketch at the end of this reply).
- Worker load reporting: each worker reports its load via `load_fnc` (based on CPU usage by default). Jobs are dispatched uniformly across available workers based on their reported load and internal thresholds.
- Health checks: workers can optionally expose a health endpoint at `http://0.0.0.0:8081/` for liveness checking by Kubernetes (wired into the Deployment sketch below).
- Resource planning: the recommended allocation is 4 CPU cores per 25 voice agents (based on typical 30-agent/30-user load tests on 4-core, 8 GB nodes). Workers are stateless, so about 10 GB of ephemeral storage suffices for temporary files and the container image.
- Graceful termination: for voice agents, set `terminationGracePeriodSeconds` (e.g., 600 seconds) in your Deployment so in-flight jobs can complete during scale-down events.
- Environment separation: always use distinct LiveKit projects for dev and prod to prevent local dev agents from joining your production worker pool (see the Secret sketch below).

This model enables elastic, health-aware scaling based on real-time worker load and makes port management unnecessary when connecting to LiveKit Cloud.

Sources: Deployment and scaling | LiveKit Docs · Deploying to production | LiveKit Docs
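To make the Kubernetes side concrete, here is a minimal Deployment sketch that wires together the health probe, resource requests, and graceful termination described above. The names (`livekit-agent-worker`, the container image, the `livekit-credentials` Secret), the replica count, and the probe timings are illustrative placeholders rather than values from the LiveKit docs.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: livekit-agent-worker            # hypothetical name
spec:
  replicas: 2                           # baseline; the HPA below adjusts this
  selector:
    matchLabels:
      app: livekit-agent-worker
  template:
    metadata:
      labels:
        app: livekit-agent-worker
    spec:
      # Give in-flight calls time to finish after SIGTERM on scale-down.
      terminationGracePeriodSeconds: 600
      containers:
        - name: worker
          image: your-registry/livekit-agent-worker:latest   # hypothetical image
          envFrom:
            - secretRef:
                name: livekit-credentials   # LiveKit Cloud URL and API credentials
          # The worker dials out to LiveKit Cloud over WebSocket, so the only
          # port exposed here is the optional health endpoint.
          ports:
            - containerPort: 8081
              name: health
          livenessProbe:
            httpGet:
              path: /
              port: health
            initialDelaySeconds: 10
            periodSeconds: 15
          resources:
            requests:
              cpu: "4"                  # ~25 voice agents per 4-core worker
              memory: 8Gi
              ephemeral-storage: 10Gi   # stateless; temp files and image layers
            limits:
              memory: 8Gi
              ephemeral-storage: 10Gi
```

Because the worker connects outbound to LiveKit Cloud, no Service or Ingress is needed in front of these pods; the Secret referenced via `envFrom` is assumed to carry the worker's LiveKit Cloud credentials (see the Secret sketch further down).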
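A matching HorizontalPodAutoscaler sketch, scaling on CPU utilization at 50% so new pods come up well before existing workers hit the default `load_threshold` of 0.75 and stop accepting new jobs. The min/max replica counts and the stabilization windows are illustrative assumptions, not figures from the docs.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: livekit-agent-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: livekit-agent-worker
  minReplicas: 2
  maxReplicas: 20                       # illustrative ceiling; size for your peak
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # 50% of requested CPU, comfortably below the worker's default
          # load_threshold of 0.75, so capacity exists before workers fill up.
          averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react quickly to call spikes
    scaleDown:
      stabilizationWindowSeconds: 600   # avoid churning pods with long calls
```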
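Finally, a sketch of the credentials Secret used above, assuming the standard `LIVEKIT_URL` / `LIVEKIT_API_KEY` / `LIVEKIT_API_SECRET` environment variables that LiveKit workers typically read at startup. Keeping one Secret (and ideally one namespace) per environment, each pointing at a separate LiveKit Cloud project, is what keeps dev agents out of the production worker pool.

```yaml
# One Secret per environment, each pointing at its own LiveKit Cloud project.
apiVersion: v1
kind: Secret
metadata:
  name: livekit-credentials
  namespace: livekit-prod               # hypothetical: a namespace per environment
type: Opaque
stringData:
  LIVEKIT_URL: wss://<your-prod-project>.livekit.cloud
  LIVEKIT_API_KEY: <prod-api-key>
  LIVEKIT_API_SECRET: <prod-api-secret>
```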