To reduce latency in your LiveKit calling agent, consider the following steps:
• Optimize Network Proximity: Deploy your agent in or near the same region as your LLM, STT, and TTS services to minimize network round-trip time.
• Monitor Latency Metrics: Use the Agents metrics API to track key metrics such as Time To First Token (TTFT) for the LLM and Time To First Byte (TTFB) for TTS, so you can see which stage contributes most to the delay.
• Provider Selection: Choose low-latency providers for LLM, STT, and TTS services, and use their streaming APIs where available.
• Enable Preemptive Generation: Start LLM and TTS processing before the end of the user's turn is confirmed, so the response is ready sooner.
• Optimize End-of-Utterance Detection: Fine-tune Voice Activity Detection (VAD) and endpointing delays to shorten the wait between the user finishing speaking and the agent starting to respond.
• Buffering and Parallel Processing: Use pre-connect buffering and process audio and text streams in parallel.
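As a rough illustration of the monitoring step, here is a minimal sketch of how per-turn latency could be broken down once you have collected the metrics named above. It assumes the perceived delay is approximately the end-of-utterance delay plus LLM TTFT plus TTS TTFB. `TurnLatency` and `slowest_stage` are hypothetical helpers, not part of the LiveKit SDK, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class TurnLatency:
    """Per-turn timings in seconds (hypothetical container, not an SDK type)."""

    eou_delay: float  # end of user speech -> turn considered complete
    llm_ttft: float   # LLM request sent -> first token received
    tts_ttfb: float   # TTS request sent -> first audio byte received

    @property
    def total(self) -> float:
        # Approximate time from end of speech to the first audible reply.
        return self.eou_delay + self.llm_ttft + self.tts_ttfb

    def slowest_stage(self) -> str:
        # Name of the stage that contributes the most to the total.
        stages = {
            "eou_delay": self.eou_delay,
            "llm_ttft": self.llm_ttft,
            "tts_ttfb": self.tts_ttfb,
        }
        return max(stages, key=stages.get)


# Illustrative values only; replace with figures from your own metrics.
turn = TurnLatency(eou_delay=0.5, llm_ttft=0.4, tts_ttfb=0.25)
print(f"total ~ {turn.total:.2f}s, slowest stage: {turn.slowest_stage()}")
```

In this made-up example the end-of-utterance delay dominates, which would suggest tuning endpointing before switching LLM or TTS providers.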
You can also test different configurations and providers to find the optimal setup for your use case. Let me know if you need further assistance with implementing these steps.
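If it helps with comparing configurations, here is a minimal sketch of one way to do it: record the total response latency for a number of turns under each setup, then compare the median and a tail percentile rather than a single average. The configuration names and sample values are hypothetical, and the nearest-rank percentile is just one simple choice.

```python
import math
import statistics


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    return {"median": statistics.median(samples), "p95": p95(samples)}


# Hypothetical per-turn total latencies in seconds for two setups.
results = {
    "config_a": [0.9, 1.0, 1.1, 1.2, 2.5],
    "config_b": [1.1, 1.1, 1.2, 1.2, 1.3],
}

summaries = {name: summarize(samples) for name, samples in results.items()}
# Pick the setup with the lowest tail latency.
best_by_tail = min(summaries, key=lambda name: summaries[name]["p95"])

for name, stats in summaries.items():
    print(name, stats)
print("lowest p95:", best_by_tail)
```

With these sample values, `config_a` has the lower median but `config_b` has the lower p95. Occasional slow turns are very noticeable in a live call, so the tail is worth checking alongside the typical case.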
Sources:
Speech & Audio Processing | livekit/livekit_composite | DeepWiki
livekit-agents/livekit/agents/voice/agent_session.py
README.md
Voice Assistants | livekit/livekit_composite | DeepWiki
How can I reduce latency in voice agents using STT, TTS and LLM?