# ask-ai
m
How do workers work in Livekit python sdk?
g
Workers in LiveKit's Python SDK are responsible for managing agent instances and handling jobs assigned by the LiveKit server. They operate as stateful, long-running processes that register with the LiveKit server and remain idle until they are assigned to a room to interact with end users; this process is known as dispatching an agent. Once dispatched, an agent joins its room via WebRTC, enabling low-latency, real-time media and data exchange. Workers can manage multiple agent instances simultaneously, and they are designed to scale horizontally by adding more worker instances as demand grows. Would you like a step-by-step breakdown of how to set up and manage workers in LiveKit?

Sources:
- Anatomy of an Agent
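As a minimal sketch of that lifecycle (assuming the `livekit-agents` package and the usual `LIVEKIT_URL`/`LIVEKIT_API_KEY`/`LIVEKIT_API_SECRET` environment variables; the room-name logging is illustrative):

```python
from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Called when the LiveKit server dispatches a job to this worker.
    await ctx.connect()  # join the room the job was dispatched for
    print(f"agent joined room: {ctx.room.name}")

if __name__ == "__main__":
    # Registers the worker with the LiveKit server, then waits for jobs.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```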
m
@millions-hair-24956 Yes please
g
To set up and manage workers in LiveKit using the Python SDK, follow these steps:

1. Define Worker Options: Use the `WorkerOptions` class to configure your worker. This includes setting `entrypoint_fnc`, the function called when a job is assigned to the worker. You can also define other options such as `request_fnc` for handling job requests, `prewarm_fnc` for initialization, and `load_fnc` to report system load.

```python
from livekit.agents import WorkerOptions, WorkerType, cli

opts = WorkerOptions(
    entrypoint_fnc=entrypoint_function,
    request_fnc=request_function,
    prewarm_fnc=prewarm_function,
    load_fnc=load_function,
    load_threshold=0.75,
    worker_type=WorkerType.ROOM,
)
```
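The request and prewarm callbacks referenced above might look like the following — a sketch, where the names match the placeholders in the snippet and the bodies (room-name filtering, resource caching) are illustrative assumptions rather than required behavior (`load_fnc` is sketched under step 4 below):

```python
from livekit.agents import JobProcess, JobRequest

async def request_function(req: JobRequest) -> None:
    # Inspect the incoming job and explicitly accept or reject it.
    # The room-name check is just an example policy.
    if req.room.name.startswith("support-"):
        await req.accept()
    else:
        await req.reject()

def prewarm_function(proc: JobProcess) -> None:
    # Runs in a job process before it is assigned work; proc.userdata
    # is a dict for caching expensive resources (e.g., a loaded model).
    proc.userdata["model"] = {}  # stand-in for real setup
```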
2. Start the Worker: Use `cli.run_app` to start the worker with the defined options. This registers the worker with the LiveKit server and makes it available to handle jobs. Running the script this way also exposes CLI subcommands (for example, `python my_agent.py dev` for local development and `python my_agent.py start` for production).

```python
if __name__ == "__main__":
    cli.run_app(opts)
```
3. Handle Job Assignments: The `entrypoint_fnc` is where you define what the agent should do when it joins a room. This function is called with a `JobContext` object that provides access to the room and participant information.

```python
from livekit.agents import JobContext

async def entrypoint_function(ctx: JobContext):
    await ctx.connect()
    # Your logic here
```
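A slightly fuller entrypoint might look like this — a sketch in which the subscription mode and the wait-for-participant flow are illustrative choices, not requirements:

```python
from livekit.agents import AutoSubscribe, JobContext

async def entrypoint_function(ctx: JobContext):
    # Join the room, subscribing only to audio tracks.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Block until a remote participant is present, then inspect it.
    participant = await ctx.wait_for_participant()
    print(f"{participant.identity} joined room {ctx.room.name}")
```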
4. Manage Worker Load: The `load_fnc` can be used to report the current system load, which helps with load balancing, while `load_threshold` determines when the worker should stop accepting new jobs (see the sketch after this list).

5. Scaling: To handle more connections, deploy additional worker instances. LiveKit automatically balances the load across available workers.

For more details, refer to the Anatomy of an Agent and Deployment and Scaling documentation.

Sources:
- Anatomy of an Agent
- Deployment and Scaling
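A load reporter might look like the following — a sketch; the load-average heuristic and the single `worker` argument are assumptions, and the exact callback signature may vary between SDK versions:

```python
import os

def load_function(worker) -> float:
    # Report this worker's load as a fraction between 0.0 and 1.0.
    # Here: the 1-minute load average normalized by core count
    # (an illustrative heuristic; getloadavg is Unix-only).
    return min(1.0, os.getloadavg()[0] / os.cpu_count())
```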
m
@millions-hair-24956 Let's say I created an app, it has the entrypoint function which handles logic for an agent, then I start the worker using cli.run_app. How many jobs can this worker handle? And what happens when this worker cannot take more jobs?
g
A worker in LiveKit can handle multiple jobs simultaneously, but the exact number depends on system resources and the complexity of the tasks performed by each agent. As a rough guide from LiveKit's deployment documentation, a 4-core machine can handle around 30 concurrent voice agents. The `load_fnc` and `load_threshold` settings in the `WorkerOptions` configuration help manage this by indicating when a worker is too busy to accept new jobs.

When a worker reaches its capacity, it stops accepting new jobs. This is governed by the `load_threshold` parameter, which defaults to 0.75 in production mode. If the worker's load exceeds this threshold, it is marked unavailable for new jobs, and LiveKit routes new job requests to other available workers. If no other workers are available, the job request is queued until a worker becomes available. Would you like more detailed information on configuring worker load management?

Sources:
- Deployment and Scaling
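In configuration, those two knobs sit next to each other — a minimal sketch reusing the placeholder functions from earlier in the thread:

```python
from livekit.agents import WorkerOptions, cli

opts = WorkerOptions(
    entrypoint_fnc=entrypoint_function,
    load_fnc=load_function,   # report current load as 0.0-1.0
    load_threshold=0.75,      # stop accepting jobs above this load
)

if __name__ == "__main__":
    cli.run_app(opts)
```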
m
@millions-hair-24956 Let's say a worker reached its capacity, and this worker had an entrypoint function which handled the logic for the agent's context setting. Now when a worker is full, do the other workers have to be started by me, or do they start automatically?
g
When a worker reaches its capacity, LiveKit's architecture handles load balancing automatically. If you have multiple workers deployed, LiveKit will route new job requests to available workers that have not reached their capacity, so the system can continue handling incoming requests without manual intervention.

For scaling, you can configure an autoscaler to manage the number of worker instances based on load. This is particularly useful in environments like Kubernetes, where a Horizontal Pod Autoscaler can automatically scale the number of pods (workers) based on CPU or custom metrics. That way, additional workers start automatically when load increases and are scaled down when it decreases. Would you like more details on setting up autoscaling for LiveKit workers?

Sources:
- Deployment and Scaling
- Autoscaling with Helm
m
@millions-hair-24956 Yes please
g
To set up autoscaling for LiveKit workers, especially in a Kubernetes environment, you can use a Horizontal Pod Autoscaler (HPA). Here's a step-by-step guide:

1. Configure Autoscaling in Kubernetes: Use the HPA to automatically adjust the number of worker pods based on CPU utilization or custom metrics. You can define the minimum and maximum number of replicas, as well as the target CPU utilization percentage.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: livekit-worker-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: livekit-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```
2. Custom Metrics: If you need to scale based on custom metrics, such as the number of active jobs, you can use Prometheus and a custom metrics adapter. This involves setting up Prometheus to collect metrics from your LiveKit deployment and configuring the HPA to use those metrics (see the sketch after this list).

3. Deployment Configuration: Ensure your deployment is configured to allow scaling, including appropriate CPU and memory resource requests and limits in your pod specifications.

4. Monitor and Adjust: Continuously monitor performance and tune the autoscaling parameters as needed for optimal performance and cost efficiency.

For more detailed instructions, refer to the Autoscaling with Helm documentation.

Sources:
- Autoscaling with Helm
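On the application side, one way to expose a custom metric such as active jobs is the `prometheus_client` library — a sketch, not a LiveKit-specific API; the `on_job_started`/`on_job_finished` hooks are assumptions you would wire into your own entrypoint:

```python
from prometheus_client import Gauge, start_http_server

# Gauge scraped by Prometheus; a custom-metrics adapter can then
# surface it to the HPA as a custom/external metric.
ACTIVE_JOBS = Gauge("livekit_agent_active_jobs", "Number of active agent jobs")

def on_job_started() -> None:
    # Assumption: you call this from your entrypoint when a job begins;
    # the SDK does not invoke these hooks for you.
    ACTIVE_JOBS.inc()

def on_job_finished() -> None:
    ACTIVE_JOBS.dec()

if __name__ == "__main__":
    # Serve metrics on http://0.0.0.0:9100/metrics for Prometheus.
    start_http_server(9100)
```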