# ask-ai
i
when creating a room with create_dispatch, how can i get the token for that room? or do i need to create the room separately?
t
Thanks for reaching out, Ed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To generate a token for a room created with `create_dispatch`, you can include the room configuration in the token itself. Here's an example in Python:
```python
from livekit.api import (
    AccessToken,
    RoomAgentDispatch,
    RoomConfiguration,
    VideoGrants,
)

# assumes LIVEKIT_API_KEY / LIVEKIT_API_SECRET are set in the environment
room_name = "my-room"


def create_token_with_agent_dispatch() -> str:
    token = (
        AccessToken()
        .with_identity("my_participant")
        .with_grants(VideoGrants(room_join=True, room=room_name))
        .with_room_config(
            RoomConfiguration(
                agents=[
                    RoomAgentDispatch(agent_name="test-agent", metadata="my_metadata")
                ],
            ),
        )
        .to_jwt()
    )
    return token
```
This code creates an access token with a specific participant identity, grants them permission to join a room, and configures the room to dispatch a specific agent when the user joins. For Node.js, you can also create a token with room configuration for agent dispatch:
```typescript
import { AccessToken } from 'livekit-server-sdk';
import { RoomAgentDispatch, RoomConfiguration } from '@livekit/protocol';

const roomName = 'name-of-room';
const participantName = 'user-name';
const agentName = 'my-agent';

const at = new AccessToken('api-key', 'secret-key', {
  identity: participantName,
});
at.addGrant({ roomJoin: true, room: roomName });
// attach the agent dispatch to the token's room configuration
at.roomConfig = new RoomConfiguration({
  agents: [
    new RoomAgentDispatch({
      agentName: agentName,
      metadata: 'my-metadata',
    }),
  ],
});
const token = await at.toJwt();
```
Let me know if you need further assistance! Sources: Dispatching agents | LiveKit Docs | Authentication | LiveKit Docs
i
When is download-files necessary to run? Every time before initializing workers via cli.run_app?
t
Unfortunately, I couldn't find specific information about whether `download-files` is necessary to run every time before initializing workers via `cli.run_app`. This detail might depend on your specific setup or use case. If you have further details or context about your workflow, feel free to share, and I can assist further. Alternatively, I can escalate this to a human specialist for more in-depth assistance.
i
This code that you've provided, what exactly does it do and how does it differ from the normal explicit dispatch flow which uses create_explicit_dispatch?

```python
def create_token_with_agent_dispatch() -> str:
    token = (
        AccessToken()
        .with_identity("my_participant")
        .with_grants(VideoGrants(room_join=True, room=room_name))
        .with_room_config(
            RoomConfiguration(
                agents=[
                    RoomAgentDispatch(agent_name="test-agent", metadata="my_metadata")
                ],
            ),
        )
        .to_jwt()
    )
    return token
```
t
The code provided creates a token with agent dispatch by embedding the agent configuration directly into the token. Here's how it differs from the normal explicit dispatch flow using `create_explicit_dispatch`:
• Automatic vs. Explicit Control: The token with agent dispatch sets up automatic agent dispatch when a room is created with that token configuration. In contrast, `create_explicit_dispatch` allows you to manually dispatch an agent to a room at a specific time via the API.
• Implementation Method: The token approach configures the agent dispatch within the access token itself using `RoomConfiguration`. Explicit dispatch, on the other hand, uses the `AgentDispatchService` API to dispatch an agent to an existing room.
• Timing: With the token approach, agent dispatch is configured when the room is created. Explicit dispatch can be triggered at any time after the agent is registered with an `agent_name`. Explicit dispatch also requires setting the `agent_name` field in `WorkerOptions` when registering the agent, which prevents automatic dispatch to newly created rooms.
Let me know if you need further clarification! Sources: Dispatching agents | LiveKit Docs | Authentication | LiveKit Docs
i
So with create_token_with_agent_dispatch, i create a room and dispatch an agent to that room when i run the function? that sounds exactly the same as what create_dispatch does
t
While `create_token_with_agent_dispatch` and `create_dispatch` may seem similar, they differ in timing and approach:
• `create_token_with_agent_dispatch`: This function creates an access token with agent dispatch configuration embedded in it. When a user uses this token to join a room, the agent is automatically dispatched to that room. The dispatch is configured at the time of token creation, before the room is joined.
• `create_dispatch`: This is an API method that explicitly dispatches an agent to a room. It requires parameters like the room name, agent name, and optional metadata, and is used to programmatically send an agent to a specific room after the room is already created.
The key difference lies in when and how the agent dispatch is configured. `create_token_with_agent_dispatch` is pre-configured in the token, while `create_dispatch` is a manual API call for an existing room. Let me know if you have further questions! Sources: Dispatching agents | LiveKit Docs | Authentication | LiveKit Docs
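To make "embedded in the token" concrete, here is a minimal stdlib sketch of what a token carrying a room-configuration claim looks like. This is illustrative only: the claim names and layout here are assumptions, and real tokens should always be minted with the LiveKit SDK's `AccessToken` rather than hand-rolled.

```python
import base64
import hashlib
import hmac
import json


def make_jwt(claims: dict, secret: str) -> str:
    """Build a minimal HS256 JWT (illustrative; use the LiveKit SDK in practice)."""
    def b64(raw: bytes) -> str:
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    header = b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"


# Hypothetical claim layout: the dispatch config rides along inside the token,
# so no separate create_dispatch API call is needed.
claims = {
    "sub": "my_participant",
    "video": {"roomJoin": True, "room": "my-room"},
    "roomConfig": {"agents": [{"agentName": "test-agent", "metadata": "my_metadata"}]},
}
token = make_jwt(claims, "secret-key")
print(token.count("."))  # 2 — header.payload.signature
```

The server can then read the dispatch instructions out of the token when the participant connects, which is why no API call happens until someone actually joins.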
i
I see, so with create_token_with_agent_dispatch you create the room, and then wait for a participant to join the room, at which point the agent is dispatched to that room?
t
Yes, that's correct! With `create_token_with_agent_dispatch`, the room is created and configured with the agent dispatch settings embedded in the token. When a participant uses the token to join the room, the agent is automatically dispatched to that room at that time. This approach ensures that the agent is only dispatched when the room is actively joined by a participant, streamlining the process. Let me know if you need further clarification!
i
For workers that are dispatched, when do they shut down? When the server that they operate on (for example my local computer) is shut down?
t
When a worker shuts down, such as when the server it operates on (e.g., your local computer) is shut down, the worker typically follows a structured shutdown process. This process may include:
• Logging a message indicating the worker is shutting down.
• Canceling the main task and closing any associated resources, such as process pools, HTTP sessions, and servers.
• Waiting for all tasks to complete and ensuring that message channels are properly closed.
• Handling any remaining tasks or processes with a timeout mechanism to avoid indefinite waits.
The exact behavior may depend on the implementation of the worker and the shutdown sequence defined in your application. Let me know if you need more specific details or further assistance! Sources: livekit.agents API documentation
i
Sorry i meant to ask, when do the workers shut down? and if an agent is dispatched to a room, when does it leave the room?
t
Workers and agents shut down or leave a room under the following conditions:
• Workers Shut Down:
◦ When explicitly shut down using `ctx.shutdown(reason="Session ended")`.
◦ When the room automatically closes after the last non-agent participant has left, causing the worker to disconnect.
• Agents Leave a Room:
◦ When explicitly disconnected using `ctx.shutdown()`.
◦ When the room is deleted using the server API's `deleteRoom` function, which disconnects all participants including agents.
◦ When the last human participant leaves the room, causing the room to close automatically and disconnect any remaining agents.
Let me know if you need further clarification or additional details! Sources: Inside a session | LiveKit Docs
i
Im trying to use bedrock, but im getting this error, would you happen to know why? botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the ConverseStream operation: The security token included in the request is invalid.
t
The error `botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the ConverseStream operation: The security token included in the request is invalid` typically indicates an issue with the credentials or security token being used for authentication. Here are some common causes and steps to resolve this:
• Ensure that the AWS credentials (Access Key ID and Secret Access Key) being used are correct and have the necessary permissions for the operation.
• Verify that the security token is valid and has not expired. If you are using temporary credentials, you may need to refresh them.
• Check that the AWS region specified in your configuration matches the region of the Bedrock service you are trying to access.
• Ensure that the AWS SDK or CLI is configured correctly with the appropriate profile or environment variables.
If the issue persists, please provide additional details about your setup, and I can assist further or escalate the issue to a specialist.
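As a first sanity check before digging into Bedrock itself, it can help to confirm that the credential-related environment variables are even present. A minimal stdlib sketch (the variable names are the standard AWS ones; adapt if you use profiles or an instance role instead):

```python
import os


def missing_aws_env(env: dict) -> list:
    """Return the AWS-related variables that are unset or empty."""
    required = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"]
    return [name for name in required if not env.get(name)]


print(missing_aws_env(dict(os.environ)))  # [] means all three are set
```

One frequent cause of this exact error is a stale `AWS_SESSION_TOKEN` left in the environment alongside long-term keys; unsetting it often resolves the issue.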
i
```python
def prewarm_fnc(proc: JobProcess):
    # load silero weights and store to process userdata
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # access the loaded silero instance
    vad: silero.VAD = ctx.proc.userdata["vad"]
```
whats the point of the prewarm function - does it run before the entrypoint function? without the prewarm function, would you instead have to load the weights directly in the entrypoint? and that could take unnecessary time?
t
The `prewarm` function is designed to initialize resources, such as loading model files or other time-consuming operations, before the `entrypoint` function is executed. Here's how it works and why it's useful:
• Purpose: The `prewarm` function prepares the environment by loading resources (e.g., Silero weights) and storing them in the process userdata. This ensures that these resources are ready and available when the `entrypoint` function is called.
• Execution Order: The `prewarm` function runs before the `entrypoint` function. The `entrypoint` function can then access the preloaded resources via the context's process userdata.
• Performance Benefits: Without the `prewarm` function, you would need to load the weights directly in the `entrypoint` function, which could introduce unnecessary delays during job processing. By preloading the resources, the `entrypoint` function can execute more quickly.
In your example, the `prewarm` function loads the Silero VAD weights and stores them in `proc.userdata`, making them accessible to the `entrypoint` function when needed. Let me know if you have further questions! Sources: Anatomy of an Agent | LiveKit Docs
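The pattern is easy to see with plain Python. In this sketch `FakeJobProcess` is a stand-in name that just mimics the `userdata` dict LiveKit's `JobProcess` exposes: the loading cost is paid once at process startup, then every job reads from memory.

```python
import time


class FakeJobProcess:
    """Stand-in for livekit's JobProcess: per-process userdata storage."""
    def __init__(self):
        self.userdata = {}


def load_weights() -> str:
    time.sleep(0.05)  # pretend this is a slow model load
    return "silero-weights"


def prewarm(proc: FakeJobProcess) -> None:
    # runs once when the worker process starts, before any job
    proc.userdata["vad"] = load_weights()


def entrypoint(proc: FakeJobProcess) -> str:
    # runs per job; the resource is already in memory
    return proc.userdata["vad"]


proc = FakeJobProcess()
prewarm(proc)                       # slow, but paid once per process
start = time.perf_counter()
vad = entrypoint(proc)              # fast: just a dict lookup
elapsed = time.perf_counter() - start
print(vad, elapsed < 0.05)
```

Without the prewarm step, every job would pay the `load_weights` delay at the moment a participant is waiting.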
i
does the prewarm function run when the worker is registered and before the agent is dispatched (at which point the entrypoint runs)?
t
Yes, the `prewarm` function runs when a new worker process is created, which happens before the agent is dispatched and before the `entrypoint` function is executed. Its purpose is to initialize resources and prepare the process environment so that it is ready to handle jobs efficiently. This ensures that any time-consuming setup, such as loading model files, is completed before the agent session begins, improving performance and reducing delays when the `entrypoint` function is called. Let me know if you have further questions! Sources: Anatomy of an Agent | LiveKit Docs
i
Can you give me a code example to show how to deploy multiple agents in one worker? For example the code below creates 1 agent every time the worker is deployed and registered, right? What if i wanted for this one worker to manage 10 agents? Is that something thats recommended to do?

```python
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero

load_dotenv()

logger = logging.getLogger("voice-assistant")


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoiding usage of unpronouncable punctuation."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    dg_model = "nova-3-general"
    if participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP:
        # use a model optimized for telephony
        dg_model = "nova-2-phonecall"

    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(model=dg_model),
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )
    agent.start(ctx.room, participant)

    usage_collector = metrics.UsageCollector()

    @agent.on("metrics_collected")
    def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
        metrics.log_metrics(mtrcs)
        usage_collector.collect(mtrcs)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: ${summary}")

    ctx.add_shutdown_callback(log_usage)

    await agent.say("Hello there! How can I help you today?", allow_interruptions=False)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )
```
t
Deploying multiple agents in one worker is possible, but it requires careful consideration of resource allocation and implementation. Here are some key points and recommendations:
• Resource Allocation: For voice agents, 4 cores can support approximately 25 agents. A test with 30 agents used ~3.8 cores and ~2.8GB of memory. Ensure your worker has sufficient resources to handle multiple agents efficiently.
• Implementation: To manage multiple agents in one worker, you can modify the `entrypoint` function to handle multiple participants and create an agent for each participant. Each agent can be stored in a dictionary or list for management.
• Example: While the provided sources don't include a specific example, you can extend your current code to create and manage multiple agents dynamically based on the number of participants or predefined rules.
• Recommendations: Deploying multiple agents in one worker is feasible, but it may be more efficient to deploy multiple workers, especially if you expect high concurrency or resource-intensive operations. Tools like Render.com or Kubernetes can help with scaling and deployment.
If you'd like, I can help draft a code example for managing multiple agents in one worker. Let me know! Sources: Deployment and scaling | LiveKit Docs | Dispatching agents | LiveKit Docs
i
So are you saying that its recommended to have one agent per worker, instead of multiple agents per worker?
t
Yes, it is generally recommended to have one agent per worker rather than multiple agents per worker. This approach offers several advantages:
• Isolation: Each agent operates in its own process, ensuring that issues with one agent do not affect others.
• Scalability: Deploying one agent per worker makes it easier to scale horizontally by adding more workers as needed.
• Resource Management: It simplifies resource allocation and monitoring, as each worker is responsible for a single agent.
• Performance: Running multiple agents in a single worker can lead to resource contention, especially for CPU and memory-intensive tasks like voice processing.
That said, if your use case involves lightweight agents or you have specific constraints, deploying multiple agents in one worker can be considered, but it requires careful resource management and testing to ensure stability and performance. Let me know if you'd like further guidance or examples!
i
You mentioned "A test with 30 agents used ~3.8 cores and ~2.8GB of memory.", so was this 30 workers deployed on the same machine, each one controlling one agent?
t
Yes, the test with 30 agents using ~3.8 cores and ~2.8GB of memory likely involved 30 workers deployed on the same machine, with each worker managing one agent. This setup aligns with the recommendation of having one agent per worker for better isolation, scalability, and resource management. By deploying one agent per worker, the system ensures that each agent operates independently, and resource usage can be effectively distributed across the available cores and memory. Let me know if you have further questions or need additional clarification!
i
So whats the point of workers if they just manage one agent?
t
Workers play a crucial role even when managing just one agent, as they provide a structured and efficient way to handle agent sessions. Here are the key benefits of using workers:
• Isolation: Each worker runs in its own process, ensuring that issues with one agent (e.g., crashes or resource spikes) do not affect other agents or the overall system.
• Scalability: Workers make it easy to scale horizontally by adding more workers to handle increased demand. This is particularly useful in distributed systems.
• Resource Management: Workers allow for precise allocation of resources (CPU, memory) to individual agents, ensuring optimal performance and avoiding resource contention.
• Flexibility: Workers can be deployed across different machines or environments, enabling distributed processing and fault tolerance.
• Lifecycle Management: Workers handle the lifecycle of agents, including initialization (e.g., prewarming), execution, and shutdown, simplifying the overall architecture.
While it might seem redundant to have one worker per agent, this design ensures robustness, scalability, and maintainability, especially in systems with high concurrency or complex workloads. Let me know if you'd like further clarification!
i
when building a distributed setup, is it recommended to have one worker per machine or multiple workers per machine? the latter means everytime you provision a new machine you get many more agents
t
The choice between deploying one worker per machine or multiple workers per machine in a distributed setup depends on your specific requirements and resource constraints. Here are some considerations for each approach:
• One Worker Per Machine:
◦ Ensures complete isolation between workers, reducing the risk of resource contention or interference.
◦ May simplify resource monitoring and management, as each machine is dedicated to a single worker.
◦ Can be less efficient in terms of resource utilization, especially if the worker does not fully utilize the machine's capacity.
• Multiple Workers Per Machine:
◦ Allows for better utilization of machine resources (CPU, memory) by running multiple workers concurrently.
◦ Scales more effectively, as provisioning a new machine adds multiple workers and agents.
◦ Requires careful resource management to avoid contention and ensure stability.
In general, running multiple workers per machine is a common approach for maximizing resource efficiency, but it requires proper monitoring and scaling mechanisms to ensure performance and reliability. Let me know if you'd like further guidance or examples!
i
So how would you deploy multiple workers on one machine, give me a code example if helpful. In the previous example i gave you, would you basically just run the python script X amount of times to deploy X workers/agents?
t
Yes, to deploy multiple workers on one machine, you can run the Python script multiple times, each instance representing a separate worker. This approach effectively creates multiple workers on the same machine, each managing its own agent. Here are a few ways to achieve this:
• Manual Execution: Run the script multiple times manually in separate terminal windows or processes.
• Process Management Tools: Use tools like `tmux`, `screen`, or `systemd` to manage multiple instances of the script.
• Custom Script: Write a wrapper script to launch multiple instances of the worker script. For example:
```python
import subprocess

# Number of workers to deploy
num_workers = 5

# Path to your worker script
worker_script = "path/to/your_worker_script.py"

# Launch multiple workers; cli.run_app expects a subcommand such as "start"
processes = []
for i in range(num_workers):
    process = subprocess.Popen(["python", worker_script, "start"])
    processes.append(process)

# Optionally, wait for all processes to complete
for process in processes:
    process.wait()
```
This script launches 5 instances of your worker script. Adjust the `num_workers` variable as needed. Let me know if you'd like further assistance or clarification!
i
```yaml
# Good for ~30 concurrent 1:1 AI conversations
resources:
  requests:
    cpu: "4"
    memory: "8Gi"
  limits:
    cpu: "4"
    memory: "8Gi"
```
This is in the Kubernetes deployment example from LiveKit. Does this mean that one container can run 30 agents or one agent can handle 30 concurrent conversations?
t
The statement "Good for ~30 concurrent 1:1 AI conversations" in the Kubernetes deployment example likely refers to the capacity of one container. Specifically, it suggests that a single container, with the specified resource allocation (4 CPUs and 8Gi of memory), can support approximately 30 concurrent 1:1 AI conversations, assuming one agent per conversation. This does not mean that one agent can handle 30 conversations simultaneously. Instead, it implies that the container can run multiple agents (up to 30) concurrently, each managing a single conversation. Let me know if you need further clarification or additional details!
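Using the measured figures quoted earlier (~3.8 cores and ~2.8GB for 30 agents), you can sketch a rough per-container capacity estimate. The per-agent costs below are back-of-envelope assumptions derived from that single data point, not official sizing guidance:

```python
def max_agents(cores: float, mem_gb: float,
               cores_per_agent: float = 3.8 / 30,
               mem_gb_per_agent: float = 2.8 / 30) -> int:
    """Capacity is bounded by whichever resource runs out first."""
    return int(min(cores / cores_per_agent, mem_gb / mem_gb_per_agent))


print(max_agents(4, 8))  # 31 — roughly matches the "~30 concurrent conversations" claim
```

In practice you would leave headroom below this number, since voice agents have bursty CPU usage.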
i
But agents cant run concurrently can they? They each initiate their own event loop, meaning they can't run within an async task
t
Actually, multiple agents can run concurrently within one container or worker. Each agent operates in its own process, ensuring isolation and allowing them to run simultaneously. This design enables a single worker to host multiple agent instances efficiently. While each agent initiates its own event loop, the worker framework manages these processes independently, so they do not interfere with each other. This is how a container with sufficient resources (e.g., 4 CPUs and 8Gi memory) can support up to 30 concurrent 1:1 AI conversations, with each agent handling one conversation. Let me know if you have further questions or need additional clarification! Sources: LiveKit Agents | Anatomy of an Agent | LiveKit Docs
i
Now im confused, earlier you said its not recommended to have one worker manage multiple agents
t
I understand the confusion, and let me clarify. While it is technically possible for one worker to manage multiple agents, the recommendation to have one worker per agent is based on simplicity, isolation, and ease of scaling. Here's a breakdown:
• One Worker Per Agent: This approach ensures complete isolation, making it easier to debug, monitor, and scale. Each worker operates independently, reducing the risk of resource contention or interference between agents.
• Multiple Agents Per Worker: This is feasible and can be efficient in terms of resource utilization, especially if the agents are lightweight. However, it requires careful resource management and monitoring to avoid performance issues, as all agents share the same worker's resources.
In scenarios where you have a high number of agents and limited resources, running multiple agents per worker might be a practical choice. However, for simplicity and robustness, especially in distributed systems, the one-worker-per-agent model is often preferred. Let me know if you'd like further clarification or have additional questions!
i
So how would you run 30 concurrent agents? give me a short code example to explain. given how workers are deployed, it doesn't seem possible to run multiple workers concurrently
t
To run 30 concurrent agents, you would typically deploy 30 workers, each managing one agent. If you're running these workers on the same machine, you can launch multiple instances of the worker script. Here's a short example to illustrate:
```python
import subprocess

# Number of workers/agents to deploy
num_workers = 30

# Path to your worker script
worker_script = "path/to/your_worker_script.py"

# Launch multiple workers; cli.run_app expects a subcommand such as "start"
processes = []
for i in range(num_workers):
    process = subprocess.Popen(["python", worker_script, "start"])
    processes.append(process)

# Optionally, wait for all processes to complete
for process in processes:
    process.wait()
```
This script launches 30 instances of your worker script, each running independently and managing its own agent. Each worker operates in its own process, ensuring isolation and concurrency. Let me know if you need further clarification or additional examples!
You're correct that in the example provided, the workers are launched as separate processes, and each process runs independently. However, these processes do run concurrently at the system level, as they are managed by the operating system's process scheduler. Each worker operates in its own process, allowing them to handle tasks simultaneously, provided there are sufficient system resources (CPU, memory, etc.). If you're looking for a single process to manage multiple agents concurrently, that would require a different approach, such as using asynchronous programming to manage multiple event loops or tasks within one process. However, this is not the typical or recommended approach for deploying LiveKit workers, as the framework is designed to run one agent per process for isolation and performance reasons. Let me know if you'd like further clarification or a different example!
i
why subprocess instead of multiprocessing? any preference?
t
The choice between `subprocess` and `multiprocessing` depends on your specific use case:
• `subprocess`: This is used to spawn completely independent processes. It is ideal when you want to run separate Python scripts or external programs, as each process runs in its own Python interpreter and has its own memory space. This is why `subprocess` is often used for deploying multiple workers, as each worker operates independently.
• `multiprocessing`: This is used to create multiple processes within the same Python program. It is better suited for parallelizing tasks within a single script, as it allows for shared memory and easier communication between processes. However, it may not be ideal for running completely independent worker scripts.
In the context of deploying multiple LiveKit workers, `subprocess` is preferred because each worker is designed to run as an independent process with its own lifecycle, and `subprocess` provides a straightforward way to achieve this. Let me know if you need further clarification!
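For completeness, here is what the `multiprocessing` variant might look like, assuming you refactored the worker's startup into an importable function. `run_worker` here is a made-up stand-in for one worker's main logic, and the queue exists only so the parent can confirm each child ran:

```python
import multiprocessing as mp


def run_worker(worker_id, results):
    # stand-in for a worker's main(); a real worker would call cli.run_app here
    results.put(worker_id)


def launch(num_workers):
    results = mp.Queue()
    procs = [mp.Process(target=run_worker, args=(i, results)) for i in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sorted(results.get() for _ in range(num_workers))


if __name__ == "__main__":
    print(launch(3))  # [0, 1, 2]
```

Since `cli.run_app` blocks and manages its own process pool, the `subprocess` approach with fully separate scripts tends to stay simpler in practice.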
i
Ok so just to close this out - the recommended approach for running large distributed systems is to run multiple workers (each running one agent) per container/instance (at least 8gb of memory and 4 cores). Each time a container/instance is provisioned it creates X amount of subprocesses to deploy X amount of workers/agents. I imagine that i would need to calculate metrics related to each container's CPU and memory usage and provision new containers (with another X amount of workers/agents) once that limit is hit?
t
Yes, you've summarized the recommended approach for running large distributed systems very well. Here's a breakdown:
• Multiple Workers Per Container/Instance: Each container or instance can run multiple workers (each managing one agent), depending on the available resources. For example, with 8GB of memory and 4 cores, you can run a certain number of workers efficiently.
• Subprocesses for Workers: Each worker is launched as a subprocess within the container/instance, ensuring isolation and independent operation.
• Resource Monitoring: You would monitor CPU and memory usage for each container/instance. When resource usage approaches the limit, you can provision additional containers/instances to maintain performance and handle increased demand.
• Scaling: Autoscaling mechanisms (e.g., Kubernetes Horizontal Pod Autoscaler) can be used to automate the provisioning of new containers/instances based on resource metrics like CPU, memory, or the number of active agents.
This approach ensures scalability, efficient resource utilization, and robust performance for large distributed systems. Let me know if you have further questions or need additional guidance!
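The provisioning decision described above can be reduced to a small policy function. A sketch (the fleet-average rule and the 0.75 threshold are assumptions to tune; real autoscalers like Kubernetes' HPA implement a similar comparison):

```python
def should_scale_up(cpu_loads, threshold=0.75):
    """cpu_loads: per-container CPU utilization values in [0, 1]."""
    if not cpu_loads:
        return True  # no containers yet: provision the first one
    return sum(cpu_loads) / len(cpu_loads) >= threshold


print(should_scale_up([0.4, 0.5, 0.6]))  # False — fleet average is 0.5
print(should_scale_up([0.8, 0.9, 0.7]))  # True — fleet average is 0.8
```

A production policy would also add hysteresis (a lower scale-down threshold and a cooldown period) so the fleet doesn't flap.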
i
Does livekit provide metrics on worker utilization?
t
Yes, LiveKit provides metrics on worker utilization. Specifically, it uses a "worker availability" system to track how busy workers are. This is configured using the `load_fnc` and `load_threshold` in the `WorkerOptions`:
• `load_fnc`: This function returns a value between 0 and 1, indicating how busy a worker is. By default, it measures the CPU usage of the worker.
• `load_threshold`: This is the load value (default is 0.75) at which a worker will stop accepting new jobs.
These metrics are used by LiveKit to automatically distribute load across available workers, ensuring efficient load balancing. Additionally, LiveKit Agents provides detailed metrics for observability into performance and model usage. Let me know if you need further details or assistance! Sources: Deployment and scaling | LiveKit Docs | Capturing metrics | LiveKit Docs
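To illustrate the shape of a custom load function: the sketch below is not LiveKit's default implementation (which measures the worker process's CPU), and the exact callable signature LiveKit expects may differ across SDK versions. This stand-in normalizes the host's 1-minute load average by core count:

```python
import os


def load_fnc() -> float:
    """Return a 0..1 busyness value (illustrative stand-in, Unix-only)."""
    one_minute, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    return min(one_minute / cores, 1.0)


# With load_threshold=0.75 (the default), the worker would stop accepting
# new jobs once this value crosses 0.75.
print(0.0 <= load_fnc() <= 1.0)  # True
```

The key contract is just "return a number in [0, 1]"; what you measure inside is up to you.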
i
Gotcha, so these metrics likely wouldn't be useful for me - for the distributed system we're talking about its likely better to monitor the overall average CPU of containers rather than workers
t
You're absolutely correct. For a distributed system with multiple containers, monitoring the overall average CPU and memory usage of the containers would provide a more holistic view of resource utilization. This approach allows you to make informed decisions about scaling and provisioning new containers to handle increased demand. Worker-level metrics like `load_fnc` can still be useful for fine-grained load balancing within a container, but for system-wide monitoring and scaling, container-level metrics are more relevant. Let me know if you'd like recommendations on tools or strategies for monitoring container-level metrics!
i
is it possible to deploy agents to a custom named room in automatic dispatch scenario? as in, without having to dispatch separately?
t
Unfortunately, I couldn't find specific information about deploying agents to a custom-named room in an automatic dispatch scenario without using explicit dispatch methods. This might depend on how your system is configured or the specific use case. If you can provide more details about your setup or requirements, I can assist further or escalate this to a specialist for more in-depth guidance.
i
Heres my concern with explicit dispatch - In a production setting, i dont fully understand the recommended deployment approach for scaling. if each worker/agent runs in a separate instance, how would you manage spikes in demand given that it takes time to provision new instances? i mainly ask because in an explicit dispatch scenario, you need the agent to be deployed before you can dispatch to it, which makes it difficult to handle queues of waiting requests (in a queue system, once a worker gets up an running it doesn't have the ability to inform the queue that its ready to take a job)
t
Your concern about handling spikes in demand and the challenges of explicit dispatch in a production setting is valid. Here are some considerations and strategies to address this:
• Pre-Provisioning: To handle spikes, you can pre-provision a buffer of idle workers/agents that are ready to take on jobs immediately. This reduces the time needed to scale up during sudden demand.
• Autoscaling: Use an autoscaling mechanism (e.g., Kubernetes Horizontal Pod Autoscaler) to monitor metrics like CPU, memory, or queue length and automatically provision new instances when thresholds are exceeded. While there is a provisioning delay, pre-provisioning can help bridge the gap.
• Warm Pools: Maintain a pool of "warm" workers that are initialized but not yet assigned to a specific job. These workers can be quickly dispatched when needed.
• Queue Management: Implement a queue system that tracks job requests and assigns them to workers as they become available. While explicit dispatch requires the agent to be deployed first, the queue can still manage the assignment process efficiently.
• Hybrid Approach: Combine automatic and explicit dispatch. For example, use automatic dispatch for predictable workloads and explicit dispatch for specialized or high-priority tasks.
These strategies can help mitigate the challenges of explicit dispatch and ensure your system can handle spikes in demand effectively. Let me know if you'd like further details or assistance with implementation!
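Regarding the point that "once a worker gets up and running it doesn't have the ability to inform the queue that its ready": that readiness signal is exactly what a small registration queue can provide. A sketch of the idea in plain Python (the names and protocol are made up; in production this would be backed by something like Redis or your orchestrator's state):

```python
import queue


class WarmPool:
    """Workers announce readiness; jobs are handed to already-warm workers."""

    def __init__(self):
        self.ready = queue.Queue()

    def worker_ready(self, worker_id):
        # called by a worker once its prewarm/startup has finished
        self.ready.put(worker_id)

    def assign_job(self):
        # hand the next job to a warm worker, or signal that we must provision
        try:
            return self.ready.get_nowait()
        except queue.Empty:
            return None


pool = WarmPool()
pool.worker_ready("worker-1")
pool.worker_ready("worker-2")
print(pool.assign_job())  # worker-1
print(pool.assign_job())  # worker-2
print(pool.assign_job())  # None -> queue the request and provision a new worker
```

The `None` branch is where your autoscaler hook would live: queue the request, trigger provisioning, and retry once a new worker registers.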
i
is it possible to perform automatic dispatch with rooms that can be custom named?
t
Yes, it is possible to perform automatic dispatch with rooms that can be custom named. You can achieve this by using dispatch rules with custom room naming configurations. For example:
Copy code
{
  "rule": {
    "dispatchRuleIndividual": {
      "roomPrefix": "call-"
    }
  },
  "roomConfig": {
    "agents": [
      {
        "agentName": "inbound-agent",
        "metadata": "job dispatch metadata"
      }
    ]
  }
}
In this configuration:
• `dispatchRuleIndividual`: creates a new room for each caller, with the room name consisting of a prefix (e.g., `call-`) followed by the caller's phone number and a random suffix.
• `roomConfig`: automatically dispatches an agent (e.g., `inbound-agent`) to the custom-named room.
This allows you to create rooms with custom names and automatically dispatch agents to handle calls in those rooms. Let me know if you need further assistance! Sources: SIP dispatch rule | LiveKit Docs | Accepting incoming calls | LiveKit Docs
i
Does the agent have to be predeployed in your example? I'm wondering if there's a flow I can use to deploy and dispatch an agent in the same process.
t
In the example provided earlier, the agent does not necessarily have to be predeployed. Automatic dispatch can handle the deployment and assignment of agents when rooms are created. Here's how it works:
• By default, agents are automatically dispatched when rooms are created, provided the `agent_name` field is not set in the `WorkerOptions`. This allows the system to deploy and dispatch agents in the same process.
• If you set the `agent_name` field in `WorkerOptions`, automatic dispatch is disabled, and you would need to explicitly dispatch the agent via API.
For your use case, you can leave the `agent_name` field unset to enable automatic dispatch, which will deploy and assign agents to rooms as needed. Let me know if you'd like further clarification or assistance! Sources: Dispatching agents | LiveKit Docs | Making calls using SIP | LiveKit Docs
i
Ok, so for example, how would you add a custom room name to this example of automatic dispatch:
Copy code
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero

load_dotenv()
logger = logging.getLogger("voice-assistant")


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoid usage of unpronounceable punctuation."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    dg_model = "nova-3-general"
    if participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP:
        # use a model optimized for telephony
        dg_model = "nova-2-phonecall"

    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(model=dg_model),
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )
    agent.start(ctx.room, participant)

    usage_collector = metrics.UsageCollector()

    @agent.on("metrics_collected")
    def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
        metrics.log_metrics(mtrcs)
        usage_collector.collect(mtrcs)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: {summary}")

    ctx.add_shutdown_callback(log_usage)

    await agent.say("Hello there! How can I help you today?", allow_interruptions=False)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )
t
To add a custom room name in an automatic dispatch scenario, you can configure the room naming logic within the dispatch rules or modify the way rooms are created. In the provided example, however, the room name is determined by the context (`ctx.room.name`), which is typically set when the room is created automatically by LiveKit. If you want to enforce custom room names, you would need to configure the dispatch rules on the server side. For example, you can use a `dispatchRuleIndividual` with a `roomPrefix` to create rooms with a specific naming pattern:
Copy code
{
  "rule": {
    "dispatchRuleIndividual": {
      "roomPrefix": "custom-room-"
    }
  }
}
This configuration would create rooms with names starting with `custom-room-`, followed by a unique identifier. Alternatively, if you want to dynamically set the room name in your Python code, you would need to use explicit dispatch instead of automatic dispatch, as automatic dispatch does not provide direct control over room naming in the worker script. Let me know if you'd like further clarification or assistance!
i
Ok, but explicit dispatch means I have to dispatch the agent in a separate process from the one in which the agent is deployed, right? Whereas automatic dispatch happens in the same process as the agent deployment? That puts a pretty huge constraint on my architecture.
t
You're absolutely correct. In explicit dispatch, the agent must be deployed first and then dispatched to a room via a separate process or API call. This separation can introduce architectural complexity, especially if you need to manage the lifecycle of agents and rooms across different processes.
Automatic dispatch, on the other hand, simplifies this by handling both the deployment and dispatch of agents within the same process. This makes it easier to manage and scale, as the system automatically assigns agents to rooms as they are created.
If your architecture requires tight coupling between agent deployment and dispatch, automatic dispatch is likely the better fit. However, it does come with limitations, such as less control over room naming and agent assignment. If these constraints are significant, you may need to explore hybrid approaches or custom solutions to balance flexibility and simplicity. Let me know if you'd like to discuss potential workarounds or alternative strategies!