# ask-ai
m
Let's say I have a 48-core system with 48 GB RAM and I have 50 concurrent calls coming in. How many workers will I need to handle this?
g
It seems likely that with a 48-core system and 48 GB of RAM, you could handle a significant number of concurrent calls with relatively few workers. According to LiveKit's deployment documentation, a 4-core, 8 GB machine can handle approximately 30 concurrent voice agents. This suggests your system could handle around 360 concurrent calls if it scales linearly, assuming similar resource usage per call.

However, the actual number of workers needed depends on the specific configuration and workload of your agents. If each worker can handle 30 calls on 4 cores, you might only need 2 workers to handle 50 calls on your 48-core system, assuming efficient resource allocation and no other bottlenecks.

Would you like more detailed guidance on configuring your workers or scaling strategies?

Sources:
- LiveKit Deployment and Scaling
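As a rough sanity check, the scaling arithmetic above can be sketched as a small helper. The 30-agents-per-4-cores figure comes from the deployment docs cited above; treat linear scaling as an assumption to verify with load tests, not a guarantee:

```python
import math

# Baseline from LiveKit's deployment docs: ~30 concurrent voice agents
# on a 4-core machine. Real capacity depends on your agent's workload.
AGENTS_PER_4_CORES = 30

def estimate_workers(total_cores: int, concurrent_calls: int,
                     cores_per_worker: int = 4) -> int:
    """Estimate how many workers are needed, assuming linear scaling
    from the documented baseline."""
    calls_per_worker = AGENTS_PER_4_CORES * cores_per_worker // 4
    needed = math.ceil(concurrent_calls / calls_per_worker)
    max_workers = total_cores // cores_per_worker
    return min(needed, max_workers)

# 50 concurrent calls on a 48-core box: 2 workers, as estimated above
print(estimate_workers(48, 50))
```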
m
@millions-hair-24956 Let's say I have a system where multiple calls will be coming in. How should I set up the code?
g
To handle multiple calls efficiently, you should configure your LiveKit agent to manage concurrency effectively. Here are some steps to consider:

1. **Worker configuration**: Use the `WorkerOptions` class to set up your worker. Ensure that your `entrypoint_fnc` is defined to handle incoming jobs, and consider using a `request_fnc` to manage job acceptance based on system load.
2. **Concurrency management**: Implement a `load_fnc` to monitor system load and adjust the `load_threshold` to prevent overloading. This function should return a value between 0 and 1, indicating how busy the worker is.
3. **Session handling**: Use the `entrypoint` function to connect to rooms and manage sessions. It should be designed to handle multiple participants and manage resources efficiently.
4. **Autoscaling**: Consider setting up autoscaling to dynamically adjust the number of workers based on demand, for example with Kubernetes or another orchestration tool.
5. **Testing and optimization**: Conduct load testing to determine the optimal number of concurrent calls your system can handle, and adjust your configuration based on the results.

Would you like more detailed guidance on any of these steps?

Sources:
- LiveKit Deployment and Scaling
- Anatomy of an Agent
m
@millions-hair-24956 I need more information on 1 and 2
g
Certainly! Let's break down the setup for worker configuration and concurrency management.

**1. Worker configuration**

To set up your worker using the `WorkerOptions` class, you need to define several key parameters:

- `entrypoint_fnc`: The main function called when a job is assigned to the worker. It should handle the logic for connecting to a room and managing the session.
- `request_fnc`: Used to inspect incoming job requests and decide whether the current worker should accept them. It can implement custom logic based on the current load or other criteria.
- `prewarm_fnc`: An optional function for performing any necessary initialization in a new process before it starts handling jobs.

Here's an example of how you might set up `WorkerOptions`:
```python
from livekit.agents import WorkerOptions, cli

async def entrypoint(ctx):
    # Called when a job is assigned: connect to the room and run the session
    pass

async def request(job_request):
    # Decide whether this worker should take the job, e.g. by accepting
    # or rejecting the request based on current load
    pass

opts = WorkerOptions(
    entrypoint_fnc=entrypoint,
    request_fnc=request,
    # Additional options can be set here
)

cli.run_app(opts)
```
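The accept/reject decision inside a `request_fnc` usually reduces to comparing the current load against a threshold. A minimal, framework-free sketch of that logic (the names here are illustrative, not part of the LiveKit API):

```python
def should_accept(current_load: float, load_threshold: float = 0.75) -> bool:
    """Return True if the worker has spare capacity for a new job.
    current_load is expected to be in [0, 1], as returned by a load_fnc."""
    return current_load < load_threshold

# A request_fnc could call this before accepting:
print(should_accept(0.40))  # lightly loaded worker takes the job
print(should_accept(0.80))  # overloaded worker declines
```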
**2. Concurrency management**

To manage concurrency effectively, implement a `load_fnc` that monitors the system's load and adjusts the worker's availability accordingly:

- `load_fnc`: Should return a value between 0 and 1 representing the current load on the worker. A value of 1 indicates full capacity, while 0 indicates no load.
- `load_threshold`: Defines the maximum load value at which the worker will stop accepting new jobs. By default this is 0.75, meaning the worker stops accepting new jobs at 75% capacity.

Here's an example of how you might implement a `load_fnc`:
```python
import psutil
from livekit.agents import WorkerOptions, cli

def load_fnc():
    # Report current CPU utilization as a value in [0, 1].
    # Note: the first psutil.cpu_percent() call returns 0.0; later calls
    # report usage since the previous call.
    return psutil.cpu_percent() / 100.0

opts = WorkerOptions(
    entrypoint_fnc=entrypoint,  # as defined above
    request_fnc=request,        # as defined above
    load_fnc=load_fnc,
    load_threshold=0.75,
)

cli.run_app(opts)
```
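If you'd rather avoid the `psutil` dependency, a similar `load_fnc` can be built from the standard library on Unix-like systems using the 1-minute load average. This is a sketch; normalizing by core count is an assumption you should validate against your actual workload:

```python
import os

def load_fnc() -> float:
    """Approximate worker load from the 1-minute system load average,
    normalized by core count and clamped to [0, 1].
    Unix-only: os.getloadavg() is unavailable on Windows."""
    one_minute, _, _ = os.getloadavg()
    return min(one_minute / os.cpu_count(), 1.0)
```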
This setup ensures that your worker dynamically manages its load and only accepts jobs when it has sufficient capacity. Would you like more information on any specific part of this setup?

Sources:
- Anatomy of an Agent
- WorkerOptions Documentation
m
@millions-hair-24956 Let's say I start a worker using `cli.run_app`. When this worker reaches max capacity, what happens then? What kind of code do I need to manage this?
g
When a worker started with `cli.run_app` reaches its maximum capacity, it stops accepting new jobs. This is managed by the `load_fnc` and `load_threshold` parameters in the `WorkerOptions` configuration: when the current load exceeds the `load_threshold`, the worker is marked as unavailable for new jobs.

**What happens at max capacity**

1. **Job rejection**: The worker automatically rejects new job requests once it reaches the load threshold. This ensures it does not become overloaded and can continue to handle existing jobs efficiently.
2. **Load monitoring**: The worker continuously monitors its load using the `load_fnc`. If the load drops back below the threshold, the worker starts accepting new jobs again.

**Code to manage max capacity**

To manage this behavior, ensure that your `load_fnc` accurately reflects the worker's capacity. Additionally, you can implement logging or alerts to notify you when a worker reaches full capacity. Here's an example setup:
```python
import psutil
from livekit.agents import WorkerOptions, cli

def load_fnc():
    # Report current CPU utilization as a value in [0, 1]
    return psutil.cpu_percent() / 100.0

async def entrypoint(ctx):
    # Logic to handle a new job
    pass

async def request(job_request):
    # Logic to decide if the job should be accepted
    pass

opts = WorkerOptions(
    entrypoint_fnc=entrypoint,
    request_fnc=request,
    load_fnc=load_fnc,
    load_threshold=0.75,  # stop accepting jobs at 75% capacity
)

cli.run_app(opts)
```
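For the "logging or alerts" part, one approach is to wrap whatever load function you use so that crossing the threshold is logged. This is a framework-independent sketch; `make_monitored_load_fnc` is an illustrative name, not a LiveKit API:

```python
import logging

logger = logging.getLogger("worker.load")

def make_monitored_load_fnc(base_load_fnc, load_threshold: float = 0.75):
    """Wrap a load function so that crossing the threshold is logged."""
    def monitored() -> float:
        load = base_load_fnc()
        if load >= load_threshold:
            logger.warning("worker at %.0f%% capacity (threshold %.0f%%)",
                           load * 100, load_threshold * 100)
        return load
    return monitored

# Example with a stand-in load function; pass the result as load_fnc:
load_fnc = make_monitored_load_fnc(lambda: 0.9)
print(load_fnc())  # returns 0.9 and logs a warning
```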
**Additional considerations**

- **Autoscaling**: Consider implementing an autoscaling mechanism to add more workers when the load is consistently high, for example with orchestration tools like Kubernetes.
- **Graceful shutdown**: Ensure that your worker can gracefully handle shutdowns and restarts, draining existing jobs before stopping.

Would you like more detailed guidance on any of these aspects?

Sources:
- https://docs.livekit.io/agents-js/classes/agents.WorkerOptions.html
- https://docs.livekit.io/agents/deployment/#Rollout
m
@millions-hair-24956 How do I know how many resources a single worker takes, and how can I determine this before starting a worker?
g
To estimate the resources a single worker will consume before starting it, you can follow these steps:

1. **Benchmarking**: Conduct load tests in a controlled environment to measure the resource usage of a worker under typical conditions. This involves simulating the expected workload and monitoring CPU, memory, and network usage.
2. **Monitoring tools**: Use tools like `psutil` in Python to monitor system resources. This helps you understand the baseline resource consumption of your worker.
3. **Configuration and testing**: Adjust the `load_fnc` in your `WorkerOptions` to reflect the expected load. This function should return a value between 0 and 1, indicating the worker's load based on CPU or other metrics.

Here's a basic example of how you might set up a `load_fnc` to monitor CPU usage:
```python
import psutil
from livekit.agents import WorkerOptions

def load_fnc():
    # Report current CPU utilization as a value in [0, 1]
    return psutil.cpu_percent() / 100.0

# Use this function in your WorkerOptions
opts = WorkerOptions(
    entrypoint_fnc=entrypoint,  # as defined earlier
    request_fnc=request,        # as defined earlier
    load_fnc=load_fnc,
    load_threshold=0.75,
)
```
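The benchmarking step can start very simply: run your agent's work under a representative load and record CPU time and peak memory. A standard-library sketch (Unix-only; the `resource` module is unavailable on Windows, and `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS):

```python
import resource
import time

def benchmark(workload) -> dict:
    """Run a callable and report CPU seconds plus peak RSS so far."""
    cpu_start = time.process_time()
    workload()
    return {
        "cpu_seconds": time.process_time() - cpu_start,
        "peak_rss_kb": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
    }

# Stand-in for a single agent session's work:
stats = benchmark(lambda: sum(i * i for i in range(100_000)))
print(stats)
```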
4. **Documentation and guidelines**: Refer to LiveKit's documentation for guidelines on resource requirements. For example, a 4-core, 8 GB machine can handle approximately 30 concurrent voice agents, which can help you estimate the resources needed for your specific use case.

Would you like more detailed guidance on setting up benchmarking or monitoring tools?

Sources:
- https://docs.livekit.io/python/livekit/agents/utils/hw/cpu.html
- https://docs.livekit.io/python/livekit/agents/utils/hw/index.html
m
@millions-hair-24956 1. How and when do I start another worker? 2. How do I do this in code? 3. Let's say I have a multicore system: how can I define a worker for different cores? Is this the way to do it, or is there something I am missing? If you need more information on my use case, please ask.
g
To manage multiple workers on a multicore system, you can follow these steps:

**1. How and when to start another worker**

- **Autoscaling**: Start another worker when the current workers are nearing their capacity. This can be determined by monitoring the load via the `load_fnc` and `load_threshold` parameters. If the load consistently exceeds the threshold, it's time to start additional workers.
- **Manual scaling**: You can also start additional workers manually based on anticipated demand, such as during peak usage times.

**2. How to do this in code**

You can use the `cli.run_app` function to start a worker. To manage multiple workers, run this function in separate processes. Here's a basic example:
```python
import multiprocessing
from livekit.agents import WorkerOptions, cli

def start_worker():
    opts = WorkerOptions(
        entrypoint_fnc=entrypoint,  # as defined earlier
        request_fnc=request,        # as defined earlier
        load_fnc=load_fnc,          # as defined earlier
        load_threshold=0.75,
    )
    cli.run_app(opts)

if __name__ == "__main__":
    # Start multiple workers
    num_workers = 4  # adjust based on your system's capacity
    processes = []
    for _ in range(num_workers):
        p = multiprocessing.Process(target=start_worker)
        p.start()
        processes.append(p)

    for p in processes:
        p.join()
```
**3. Defining a worker for different cores**

- **Core affinity**: You can set the CPU affinity of each worker process to bind it to specific cores, for example using the `os` module in Python.
- **Resource allocation**: Ensure that each worker is configured to use its share of the system's resources. This can be managed by setting appropriate `load_fnc` and `load_threshold` values.

If you need more information on your specific use case, please provide details about the expected workload, number of concurrent users, and any specific performance requirements. Would you like more detailed guidance on any of these steps?

Sources:
- Anatomy of an Agent
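Core affinity can be set from the standard library on Linux with `os.sched_setaffinity`. A sketch only: this call does not exist on macOS or Windows, and pinning is often unnecessary because the OS scheduler already balances processes across cores:

```python
import os

def pin_to_cores(core_ids: set) -> None:
    """Restrict the current process to the given CPU cores (Linux-only)."""
    os.sched_setaffinity(0, core_ids)  # pid 0 means "this process"

# e.g. pin a worker process to one core before calling cli.run_app:
available = os.sched_getaffinity(0)
pin_to_cores({min(available)})
print(os.sched_getaffinity(0))
```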
m
@millions-hair-24956 Let's say I have a 48-core system. How many workers do I need to start when my app runs?
g
It seems likely that the number of workers you need to start on a 48-core system depends on the workload and the resource requirements of each worker. According to LiveKit's deployment documentation, a 4-core, 8 GB machine can handle approximately 30 concurrent voice agents, which suggests that your 48-core system could handle around 360 concurrent agents if it scales linearly.

**Estimating the number of workers**

1. **Resource allocation**: If each worker is expected to handle a similar load to the example (30 agents per 4 cores), you might start with 12 workers (one per 4 cores) to fully utilize the system's capacity.
2. **Load testing**: Conduct load tests to determine the actual resource usage per worker and adjust the number of workers accordingly. This helps you optimize the count based on real-world performance.
3. **Dynamic scaling**: Consider implementing dynamic scaling to adjust the number of workers based on current demand, for example with orchestration tools like Kubernetes.

Would you like more detailed guidance on setting up load testing or dynamic scaling?

Sources:
- LiveKit Deployment and Scaling
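The sizing logic above as a quick calculation, assuming the documented 4-core worker size and linear scaling (both are assumptions to confirm with load tests):

```python
TOTAL_CORES = 48
CORES_PER_WORKER = 4      # machine size from the deployment docs example
AGENTS_PER_WORKER = 30    # documented capacity at that size

num_workers = TOTAL_CORES // CORES_PER_WORKER
total_capacity = num_workers * AGENTS_PER_WORKER

print(num_workers, total_capacity)  # 12 workers, ~360 concurrent agents
```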
m
@millions-hair-24956 Are you saying I should create 12 workers? If so, how do I do this in code, and how would the agent know which worker to connect to?
g
Yes. Based on the assumption that each worker can handle a similar load to the example (30 agents per 4 cores), you could start with 12 workers on a 48-core system. Here's how you can set this up in code:

**Starting multiple workers**

You can use Python's `multiprocessing` module to start multiple worker processes. Each worker runs independently and connects to the LiveKit server to handle jobs.
```python
import multiprocessing
from livekit.agents import WorkerOptions, cli

def start_worker():
    opts = WorkerOptions(
        entrypoint_fnc=entrypoint,  # as defined earlier
        request_fnc=request,        # as defined earlier
        load_fnc=load_fnc,          # as defined earlier
        load_threshold=0.75,
    )
    cli.run_app(opts)

if __name__ == "__main__":
    # Start 12 workers
    num_workers = 12
    processes = []
    for _ in range(num_workers):
        p = multiprocessing.Process(target=start_worker)
        p.start()
        processes.append(p)

    for p in processes:
        p.join()
```
**How the agent connects to workers**

- **Worker registration**: Each worker registers itself with the LiveKit server on startup. The server manages job distribution and assigns jobs to available workers based on their load and availability.
- **Job assignment**: When a new job is available (e.g., a room is created), the LiveKit server notifies one of the registered workers, which can then accept or reject the job based on its current load.
- **Load balancing**: LiveKit's internal load balancing distributes jobs across available workers; you don't need to manually assign jobs to specific workers.

Would you like more detailed guidance on any specific part of this setup?

Sources:
- Anatomy of an Agent
- LiveKit Deployment and Scaling
m
@millions-hair-24956 Can you help me with a code example? Let's say I have set the `load_threshold` for a worker to 0.75, but when the load reaches 0.50, I start another worker automatically, and so on; then, as the load decreases, the extra workers are shut down automatically.
g
I'm not able to find any data sources to answer your question. You might want to provide more details about the environment or orchestration tools you're using, such as Kubernetes or Docker Swarm. If this doesn't help, please reach out to the LiveKit support team for further assistance.