miniature-nail-35141
08/19/2025, 5:04 AM
tall-airline-16756
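08/19/2025, 5:05 AM
• Match the HPA's scaling metric to the worker's `load_fnc` (usually CPU utilization), and set the HPA to scale up at a threshold a bit lower than the worker's `load_threshold`. For instance, if `load_threshold` is 0.75, try having the HPA kick in at 0.50, so new pods are already starting before existing workers reach the threshold and stop accepting new jobs.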
• Shorten the cooldown (stabilization) period when scaling up so your system can quickly handle spikes in demand.
• Lengthen the stabilization period when scaling down, so a brief dip in load doesn't remove pods while their agents are still handling active sessions. Both windows are set with the `stabilizationWindowSeconds` field under `spec.behavior.scaleUp` and `spec.behavior.scaleDown` in the HPA.
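To show why the HPA target should sit below `load_threshold`, here is a minimal Python sketch of the replica rule from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), including the default 0.1 tolerance. It assumes the HPA's CPU utilization and the worker's load value move together, and it ignores min/max replicas, pod readiness and missing metrics. `desired_replicas` is a hypothetical helper, not a LiveKit or Kubernetes API.

```python
import math


# Hypothetical helper: simplified model of the Kubernetes HPA replica rule
#   desired = ceil(current_replicas * current_metric / target_metric)
# Ignores minReplicas/maxReplicas, pod readiness and missing metrics.
def desired_replicas(current_replicas: int, avg_load: float, target: float,
                     tolerance: float = 0.1) -> int:
    ratio = avg_load / target
    # The HPA skips scaling when the ratio is within the tolerance of 1.0
    # (0.1 is the controller's default tolerance).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Example value from the thread: workers refuse new jobs above load 0.75.
LOAD_THRESHOLD = 0.75
avg_load = 0.70  # 4 pods, average load still just under LOAD_THRESHOLD

print(desired_replicas(4, avg_load, target=LOAD_THRESHOLD))  # 4: inside tolerance, no scale-up
print(desired_replicas(4, avg_load, target=0.50))            # 6: scales up early
```

In this model, a target equal to `load_threshold` only triggers a scale-up once average load passes roughly 0.825 (0.75 * 1.1), by which point many workers are likely already refusing jobs; a 0.50 target adds pods while there is still headroom.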
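To make the two stabilization bullets concrete, here is a minimal Python model of the behavior described in the Kubernetes HPA documentation: when scaling down, the controller uses the highest recommendation seen within the scale-down window, and when scaling up, the lowest recommendation within the scale-up window. `Recommendation` and `stabilize` are hypothetical names, and the sketch leaves out scaling policies (pods or percent per period).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Recommendation:
    timestamp: float  # seconds
    replicas: int     # replica count the HPA recommended at that time


def stabilize(current: int, desired: int, history: list[Recommendation],
              now: float, up_window: float, down_window: float) -> int:
    """Hypothetical, simplified model of HPA stabilization windows.

    Scale-up goes no higher than the LOWEST recommendation inside the
    scale-up window; scale-down goes no lower than the HIGHEST
    recommendation inside the scale-down window. Policies are not modeled.
    """
    up_limit = desired
    down_limit = desired
    for rec in history:
        if rec.timestamp > now - up_window:
            up_limit = min(up_limit, rec.replicas)
        if rec.timestamp > now - down_window:
            down_limit = max(down_limit, rec.replicas)
    result = current
    if result < up_limit:    # scale up only as far as all recent recommendations agree
        result = up_limit
    if result > down_limit:  # scale down only as far as all recent recommendations agree
        result = down_limit
    return result


now = 1000.0
history = [Recommendation(now - 100, 6), Recommendation(now - 30, 5)]

# Load dropped and the HPA now wants 2 pods, but a 300 s scale-down window keeps 6.
print(stabilize(6, 2, history, now, up_window=0, down_window=300))  # 6
# A 60 s window only remembers the 5-replica recommendation.
print(stabilize(6, 2, history, now, up_window=0, down_window=60))   # 5
# A spike with a 0 s scale-up window jumps straight to the new recommendation.
print(stabilize(3, 8, history, now, up_window=0, down_window=300))  # 8
```

Kubernetes defaults to a 300 s scale-down window and a 0 s scale-up window, so the advice above mostly means raising the scale-down value and keeping scale-up short.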
If you want to see these ideas in action, check out the example HPA configurations included in the LiveKit Helm charts for components like ingress, egress, and livekit-server. There's also a handy sample `agent-manifest.yaml` for deploying agents on Kubernetes in the LiveKit agent deployment examples repository.
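As a starting point to compare against those examples, here is a hypothetical autoscaling/v2 HPA for an agent Deployment, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The Deployment name `voice-agent`, the replica bounds and the window lengths are placeholders, not values taken from the LiveKit charts; the 50% CPU target corresponds to the 0.50 trigger suggested earlier in the thread.

```python
import json


def agent_hpa(deployment: str, cpu_target_percent: int,
              scale_up_window_s: int, scale_down_window_s: int,
              min_replicas: int = 1, max_replicas: int = 10) -> dict:
    """Build a hypothetical autoscaling/v2 HorizontalPodAutoscaler manifest."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization relative to the pods' CPU requests.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_target_percent},
                },
            }],
            # Short window for scale-up, long window for scale-down.
            "behavior": {
                "scaleUp": {"stabilizationWindowSeconds": scale_up_window_s},
                "scaleDown": {"stabilizationWindowSeconds": scale_down_window_s},
            },
        },
    }


manifest = agent_hpa("voice-agent", cpu_target_percent=50,
                     scale_up_window_s=0, scale_down_window_s=600)
print(json.dumps(manifest, indent=2))
```

Redirecting the output to a file (for example `agent-hpa.json`) and running `kubectl apply -f agent-hpa.json` would create the HPA; a CPU Utilization target only works if the agent pods declare CPU requests.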
Sources: Deploying to production (LiveKit Docs); Deployment and scaling (LiveKit Docs); examples/server-do.yaml; kubernetes/README.md