# ask-ai
@millions-hair-24956 what is the best way to deploy 20 different voicepipelineagents with different requirements and system prompts
It seems likely that deploying multiple `VoicePipelineAgent` instances with different requirements and system prompts is best handled by a single worker that supports multiple configurations. According to the LiveKit documentation, you can configure each `VoicePipelineAgent` individually when a request is received, adjusting the system prompt and other settings dynamically based on the incoming request. This leverages the flexibility of the `VoicePipelineAgent` to handle different configurations without a separate deployment for each agent. Additionally, as per the LiveKit Slack discussion, a single worker can manage multiple agents by switching configurations based on the incoming user's attributes or other conditions. This keeps resource utilization efficient and scalable: you can autoscale the workers based on load, so the deployment handles varying concurrency across all agent configurations. Would you like more detailed guidance on setting up the configuration or managing the deployment process?

Sources:
- LiveKit Slack discussion on handling multiple agent configurations
- LiveKit documentation
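For context, the room-metadata branching used in the examples below only works if something sets that metadata, typically your backend when it creates the room. A minimal sketch using the livekit-api server SDK, assuming LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are set in the environment (the room name and config key are placeholders):

```python
import asyncio

from livekit import api


async def create_room_for_config(room_name: str, config_key: str):
    # Credentials are read from LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET
    lkapi = api.LiveKitAPI()
    # Store the config key in the room metadata; the agent worker reads it
    # back via ctx.room.metadata to pick a system prompt and model
    room = await lkapi.room.create_room(
        api.CreateRoomRequest(name=room_name, metadata=config_key)
    )
    await lkapi.aclose()
    return room


asyncio.run(create_room_for_config("support-call-123", "config1"))
```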
@millions-hair-24956 can you give me an example on how i would do this
My understanding is that you can manage multiple `VoicePipelineAgent` instances with different configurations by setting the system prompt and other parameters dynamically when a request is received. Here's a basic example in Python using the LiveKit Agents SDK:
```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import silero, deepgram, openai


async def entrypoint(ctx: JobContext):
    # Connect first so room state (including metadata) is available
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Determine the configuration based on the room metadata
    if ctx.room.metadata == "config1":
        system_prompt = "You are a helpful assistant."
        llm_model = "gpt-4o-mini"
    else:
        system_prompt = "You are a technical support agent."
        llm_model = "gpt-4o"

    # Create a chat context seeded with the system prompt
    chat_ctx = llm.ChatContext().append(role="system", text=system_prompt)

    # Initialize the VoicePipelineAgent with the chosen configuration
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model=llm_model),
        tts=openai.TTS(),
        chat_ctx=chat_ctx,
    )

    # Start the agent for the first participant that joins
    participant = await ctx.wait_for_participant()
    agent.start(ctx.room, participant)


# Run the worker with the entrypoint function
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
In this example, the `entrypoint` function checks the room metadata to decide which configuration to use. You can expand this logic to include more complex decision-making based on your specific requirements. Would you like more details on any specific part of this setup?

Sources:
- LiveKit Slack discussion on handling multiple agent configurations
@millions-hair-24956 and this kind of logic could let me run 20 different agents each with large loads?
Yes, this approach lets you run multiple agents with different configurations efficiently. By configuring each `VoicePipelineAgent` dynamically from the incoming request, you can serve a large number of agent variants without a separate deployment per configuration. According to the LiveKit documentation, LiveKit's architecture scales by adding worker instances as load increases. To manage large loads, consider the following (a worker-options sketch follows this list):

1. Autoscaling: Adjust the number of worker instances dynamically based on current load, so the system handles peak loads without over-provisioning during off-peak times.
2. Resource allocation: Give each worker enough CPU and memory for the expected number of concurrent agents. Per LiveKit's deployment guidelines, a 4-core, 8GB machine can handle around 25 voice agents, so plan your infrastructure accordingly.
3. Load balancing: Use LiveKit's built-in load balancing to distribute requests evenly across available workers, ensuring optimal performance and resource utilization.

Would you like more detailed guidance on implementing autoscaling or resource management?

Sources:
- LiveKit documentation on deployment and scaling
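A minimal sketch of the autoscaling hook at the worker level, assuming a livekit-agents version where `WorkerOptions` exposes a `load_threshold` parameter (the threshold value here is illustrative):

```python
from livekit.agents import WorkerOptions, cli

# Sketch: above the load threshold the worker reports itself as full, so the
# LiveKit server dispatches new jobs to other workers; pair this with an
# infrastructure-level autoscaler that adds workers under sustained load.
opts = WorkerOptions(
    entrypoint_fnc=entrypoint,  # the entrypoint from the example above
    load_threshold=0.75,        # stop accepting new jobs above ~75% CPU load
)

if __name__ == "__main__":
    cli.run_app(opts)
```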
@millions-hair-24956 give me a sample py script for this
Here's a sample Python script that deploys multiple `VoicePipelineAgent` instances with different configurations using LiveKit. The script configures each agent dynamically based on room metadata or other attributes:
```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import silero, deepgram, openai


async def entrypoint(ctx: JobContext):
    # Connect first so the room metadata is populated
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Determine the configuration based on the room metadata
    room_metadata = ctx.room.metadata

    if room_metadata == "config1":
        system_prompt = "You are a helpful assistant."
        llm_model = "gpt-4o-mini"
    else:
        system_prompt = "You are a technical support agent."
        llm_model = "gpt-4o"

    # Create a chat context with the system prompt
    chat_ctx = llm.ChatContext().append(role="system", text=system_prompt)

    # Initialize the VoicePipelineAgent with the chosen configuration
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model=llm_model),
        tts=openai.TTS(),
        chat_ctx=chat_ctx,
    )

    # Wait for a participant and start the agent
    participant = await ctx.wait_for_participant()
    agent.start(ctx.room, participant)


# Run the worker with the entrypoint function
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
This script uses the `entrypoint` function to determine the configuration for each `VoicePipelineAgent` from the room's metadata, then initializes and starts the agent in the room. You can expand the logic to include more configurations or decision criteria as needed. Would you like more details on any specific part of this setup?

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1728516300.770279
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732415232.661559
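If you would rather branch on the caller than on the room, the same idea can be keyed on participant attributes; a sketch assuming your client sets a (hypothetical) `agent_config` attribute when joining:

```python
from livekit.agents import AutoSubscribe, JobContext


async def entrypoint(ctx: JobContext):
    # Connect first so participant state is available
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # "agent_config" is a hypothetical attribute your client sets on join
    config_key = participant.attributes.get("agent_config", "default")
    # ...then build and start the VoicePipelineAgent for config_key as in
    # the script above
```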
@millions-hair-24956 what if each agent uses different TTS, STT, and LLM with different functions? Would this go in the same py script or would this get branched out to multiple scripts?
If each `VoicePipelineAgent` requires different TTS, STT, and LLM configurations, you can still manage this within a single Python script by selecting the appropriate components dynamically based on the incoming request or room metadata. This keeps the deployment centralized while handling diverse configurations. Here's how you can structure your script:
```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import silero, deepgram, openai, cartesia


async def entrypoint(ctx: JobContext):
    # Connect first so the room metadata is populated
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Determine the configuration based on the room metadata
    room_metadata = ctx.room.metadata

    if room_metadata == "config1":
        system_prompt = "You are a helpful assistant."
        stt = deepgram.STT(model="nova-2-general")
        agent_llm = openai.LLM(model="gpt-4o-mini")
        tts = cartesia.TTS()
    elif room_metadata == "config2":
        system_prompt = "You are a technical support agent."
        # The Silero plugin provides only VAD, not STT, so use OpenAI's
        # Whisper-based STT here instead
        stt = openai.STT()
        agent_llm = openai.LLM(model="gpt-4o")
        tts = openai.TTS()
    else:
        system_prompt = "Default assistant."
        stt = deepgram.STT()
        agent_llm = openai.LLM()
        tts = cartesia.TTS()

    # Create a chat context with the system prompt (the local LLM instance
    # is named agent_llm so it doesn't shadow the llm module used here)
    chat_ctx = llm.ChatContext().append(role="system", text=system_prompt)

    # Initialize the VoicePipelineAgent with the chosen components
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=stt,
        llm=agent_llm,
        tts=tts,
        chat_ctx=chat_ctx,
    )

    # Wait for a participant and start the agent
    participant = await ctx.wait_for_participant()
    agent.start(ctx.room, participant)


# Run the worker with the entrypoint function
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
This script uses conditional logic to select different TTS, STT, and LLM components based on the room metadata, keeping your deployment centralized and manageable even with diverse agent configurations. Would you like more details on any specific part of this setup?

Sources:
- https://docs.livekit.io/python/livekit/agents/pipeline/pipeline_agent.html#args
- https://docs.livekit.io/python/livekit/agents/pipeline/index.html#args
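With 20 configurations, an if/elif chain gets unwieldy. One way to keep the single-script approach manageable is a registry that maps metadata keys to component factories; a sketch under the same plugin assumptions as the script above (the config names are placeholders):

```python
from dataclasses import dataclass
from typing import Callable

from livekit.plugins import cartesia, deepgram, openai


@dataclass
class AgentConfig:
    system_prompt: str
    stt: Callable   # factories, so components are constructed once per job
    llm: Callable
    tts: Callable


# Placeholder registry; in practice this could be loaded from a config file
CONFIGS: dict[str, AgentConfig] = {
    "config1": AgentConfig(
        system_prompt="You are a helpful assistant.",
        stt=lambda: deepgram.STT(model="nova-2-general"),
        llm=lambda: openai.LLM(model="gpt-4o-mini"),
        tts=lambda: cartesia.TTS(),
    ),
    "config2": AgentConfig(
        system_prompt="You are a technical support agent.",
        stt=lambda: openai.STT(),
        llm=lambda: openai.LLM(model="gpt-4o"),
        tts=lambda: openai.TTS(),
    ),
}


def resolve_config(metadata: str) -> AgentConfig:
    # Fall back to config1 when the metadata key is unknown
    return CONFIGS.get(metadata, CONFIGS["config1"])
```

Inside the entrypoint you would then call `resolve_config(ctx.room.metadata)` and pass `cfg.stt()`, `cfg.llm()`, and `cfg.tts()` into the `VoicePipelineAgent` constructor, so adding a 21st agent is one new registry entry rather than another branch.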