square-jewelry-14040
07/12/2025, 7:54 AM
tall-airline-16756
07/12/2025, 7:55 AM
tall-airline-16756
07/12/2025, 7:56 AM
load_threshold (which defaults to 0.75). For example, trigger scaling at 0.50 to ensure capacity before workers reach their maximum recommended load.
• Worker load reporting: Each worker reports its load using load_fnc (by default based on CPU usage). Jobs are dispatched uniformly across available workers based on their reported load and internal thresholds.
• Health checks: Workers can optionally expose a health endpoint at <http://0.0.0.0:8081/> for liveness checking by Kubernetes.
• Resource planning: Recommended allocation is 4 CPU cores per 25 voice agents (based on typical 30-agent/30-user load tests on 4-core, 8GB nodes). Workers are stateless, so about 10GB of ephemeral storage suffices for temporary files and the container image.
• Graceful termination: For voice agents, set terminationGracePeriodSeconds (e.g., 600 seconds) in your deployment to allow jobs to complete on scale-down events.
• Environment separation: Always use distinct LiveKit projects for dev and prod to prevent local dev agents from joining your production worker pool.
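To make the load-related points above concrete, here is a minimal Python sketch using only the standard library. It is not the LiveKit Agents API: cpu_load, accepts_jobs and needs_scale_up are hypothetical names, and treating a worker at or above load_threshold as full is an assumption. A function shaped like cpu_load is the kind of callable you would pass as load_fnc, assuming it accepts a function that returns a float between 0 and 1.

```python
import os

WORKER_LOAD_THRESHOLD = 0.75  # default load_threshold mentioned in this thread
SCALE_UP_TRIGGER = 0.50       # example autoscaling trigger from this thread


def cpu_load() -> float:
    """Hypothetical load function: 1-minute load average per core, clamped to [0, 1]."""
    cores = os.cpu_count() or 1
    try:
        one_minute, _, _ = os.getloadavg()
    except (AttributeError, OSError):
        # os.getloadavg is unavailable on some platforms (e.g. Windows).
        return 0.0
    return max(0.0, min(one_minute / cores, 1.0))


def accepts_jobs(load: float, threshold: float = WORKER_LOAD_THRESHOLD) -> bool:
    """Assumption: a worker at or above the threshold is treated as full."""
    return load < threshold


def needs_scale_up(worker_loads: list[float], trigger: float = SCALE_UP_TRIGGER) -> bool:
    """Scale up once the average reported load reaches the trigger."""
    if not worker_loads:
        return True  # no workers reporting, so capacity is needed
    return sum(worker_loads) / len(worker_loads) >= trigger
```

Triggering at 0.50 while workers stop at 0.75 leaves headroom for new pods to start before the existing ones fill up.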
This model enables elastic, health-aware scaling based on real-time worker load, and makes port management unnecessary when connecting to LiveKit Cloud.
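As a rough worked example of the resource-planning figures in this thread, the sketch below sizes a worker pool from a target number of concurrent voice agents. plan_capacity is a hypothetical helper, and pairing 8GB of memory with every 4-core worker is an assumption taken from the load-test node size rather than a stated recommendation.

```python
import math

AGENTS_PER_WORKER = 25      # 4 CPU cores per 25 voice agents
CORES_PER_WORKER = 4
MEMORY_GB_PER_WORKER = 8    # assumption: same as the 4-core, 8GB test nodes
EPHEMERAL_GB_PER_WORKER = 10


def plan_capacity(concurrent_agents: int) -> dict[str, int]:
    """Estimate worker count and total resources for a given agent count."""
    if concurrent_agents < 0:
        raise ValueError("concurrent_agents must be non-negative")
    # Round up: a partially filled worker still needs a full allocation.
    workers = math.ceil(concurrent_agents / AGENTS_PER_WORKER)
    return {
        "workers": workers,
        "cpu_cores": workers * CORES_PER_WORKER,
        "memory_gb": workers * MEMORY_GB_PER_WORKER,
        "ephemeral_storage_gb": workers * EPHEMERAL_GB_PER_WORKER,
    }
```

For instance, plan_capacity(100) comes out to 4 workers with 16 CPU cores in total. Treat the result as a starting point and adjust it against your own load tests.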
Sources: Deployment and scaling | LiveKit Docs | Deploying to production | LiveKit Docs