# ask-ai
w
The initial delay when answering a call is high, since a session is created first and then the agent joins. Can we reduce it?
t
Thanks for reaching out, Test. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I completely understand how important it is to reduce that initial delay when answering a call and connecting an agent to a session. Here are some helpful strategies you can try:

- Parallelize agent dispatch with token generation: generate a token with a longer expiration right when the user logs in, or dispatch the agent while the user's token is being generated, so both processes happen at the same time.
- Avoid explicit room creation: let the room be auto-created when the first participant joins. This reduces latency caused by cross-region synchronization.
- Enable preemptive generation: configure the agent to start processing as soon as it receives user audio, so inference overlaps with ongoing audio and cuts down wait times.
- Handle cold starts: keep warm agent instances to avoid delays from initializing new ones.
- Improve user experience: consolidated status indicators and connection-state monitoring give users useful feedback while the call connects, making everything feel more seamless.

I hope these tips help you optimize your LiveKit sessions and reduce delays. Feel free to reach out if you'd like me to walk through any of these steps in more detail!

Sources: Web and mobile frontends | LiveKit Docs | CreateRoom() is Taking Too Long | Agent speech and audio | LiveKit Docs | Deploying to production | LiveKit Docs
w
We are using self-hosted agents. How can we improve this? Any code suggestions?
t
I'm glad you're looking to optimize your self-hosted agents! To help reduce that initial delay, here are some practical strategies and code-level tips:

- Optimize infrastructure: pre-download models during the Docker build so they aren't fetched at runtime. For example:
```dockerfile
RUN python -c "from livekit.plugins import silero; silero.VAD.load()"
```
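As a fuller sketch, a Dockerfile that pre-downloads the model at build time and keeps the dependency layers cacheable might look like this (the base image, file paths, and entrypoint are assumptions for illustration):

```dockerfile
# Sketch only: adjust image, paths, and entrypoint for your project.
FROM python:3.11-slim
WORKDIR /app

# 1) Dependencies first: this layer stays cached until requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# 2) Pre-download model weights at build time, not on the first request.
RUN python -c "from livekit.plugins import silero; silero.VAD.load()"

# 3) Application code last: code edits don't invalidate the layers above.
COPY . .
CMD ["python", "agent.py", "start"]
```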
- Also, structure your Dockerfiles to separate dependency installation from code changes; this speeds up rebuilds significantly.
- Minimize connection delays: generate long-lived tokens right at user login to eliminate token-fetching delays. You can even dispatch agents while tokens are being generated, so the connections happen simultaneously.
- Optimize agent initialization: use the `prewarm` function to load static models just once per process:
```python
from livekit.plugins import silero

def prewarm(proc):
    proc.userdata["vad"] = silero.VAD.load()
```
- Also configure `num_idle_processes` to keep some pre-initialized agent processes ready to go:
```python
from livekit.agents import WorkerOptions, cli

cli.run_app(
    WorkerOptions(
        entrypoint_fnc=entrypoint,  # your agent's entrypoint function
        prewarm_fnc=prewarm,
        num_idle_processes=2,
    )
)
```
- Optimize the AI pipeline: enable `preemptive_generation` so the system starts LLM/TTS inference as soon as partial transcripts come in, which really speeds things up:
```python
session = AgentSession(
    preemptive_generation=True,
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=cartesia.TTS(),
)
```
- Processing audio in 50 ms chunks is also good practice for smooth streaming STT/TTS.
- Monitor and measure performance: use `UsageCollector` to log metrics and spot bottlenecks:
```python
usage_collector = metrics.UsageCollector()

@agent.on("metrics_collected")
def on_metrics_collected(agent_metrics):
    metrics.log_metrics(agent_metrics)
    usage_collector.collect(agent_metrics)
```
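The 50 ms chunking guideline can be made concrete: at a 16 kHz sample rate (a common STT input rate), 50 ms corresponds to 800 samples per frame. A minimal sketch, where `frame_audio` is a hypothetical helper (the sample rate and dropping of the final partial frame are assumptions):

```python
# Split a mono PCM sample buffer into 50 ms frames for streaming STT/TTS.
SAMPLE_RATE = 16_000
FRAME_MS = 50
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 800 samples

def frame_audio(samples: list[int]) -> list[list[int]]:
    """Return fixed-size 50 ms frames; any trailing partial frame is dropped."""
    return [
        samples[i : i + SAMPLES_PER_FRAME]
        for i in range(0, len(samples) - SAMPLES_PER_FRAME + 1, SAMPLES_PER_FRAME)
    ]
```

One second of 16 kHz audio therefore yields 20 frames of 800 samples each.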
I hope these tips help you see a noticeable improvement! If you'd like help implementing any of this or troubleshooting a specific issue, just let me know.

Sources: Builds and Dockerfiles | LiveKit Docs | Web and mobile frontends | LiveKit Docs | agent-examples/agent_examples/pipeline_agent.py | livekit-agents/livekit/agents/voice/agent_session.py | External data and RAG | LiveKit Docs
w
Explain `preemptive_generation` in more detail, with an example.
t
Preemptive generation in LiveKit lets an AI agent start crafting a response before the user has fully finished speaking. This reduces perceived wait time and makes the conversation flow more smoothly by acting on early transcription signals or partial user input. To enable it, set the `preemptive_generation` parameter to `True` when you initialize the `AgentSession`:
```python
session = AgentSession(
    preemptive_generation=True,
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=cartesia.TTS(),
)
```
Here's an overview of how it works:

- The agent starts generating a response as soon as it receives a final transcript from the speech-to-text (STT) system, even if the user hasn't fully finished their turn.
- If the context or parameters change during the `on_user_turn_completed` callback, the agent cancels the early response and generates a new one based on the updated information.

For instance, in testing, turning on preemptive generation cut total response time from 1.1 seconds to 0.8 seconds. That said, it can use more compute when the user interrupts or changes what they're saying mid-turn, since some predictions end up being discarded. It's a good idea to use logging and metrics to see how this feature performs in your particular setup. If you'd like, I'm happy to answer questions or guide you through the implementation!

Sources: Agent speech and audio | LiveKit Docs | livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/realtime...
w
For SIP, if we specify:

```typescript
roomConfig: new RoomConfiguration({
  agents: [
    new RoomAgentDispatch({
      agentName: "inbound-agent",
      metadata: "dispatch metadata",
    }),
  ],
}),
```

will it improve any performance?
t
Using `RoomConfiguration` with `RoomAgentDispatch` gives you better control and scalability for SIP integrations by allowing explicit agent assignment and cutting down on unnecessary resource use. That said, the performance impact depends on what you're aiming to improve. Are you hoping to speed up call setup, handle more agents smoothly, enhance media quality, or something else? Let me know, and I'll be happy to guide you more specifically!

Sources: Dispatching agents | LiveKit Docs | protobufs/livekit_sip.proto | Docs::Home | Authentication | LiveKit Docs | Agent dispatch | LiveKit Docs