Thank you for your question! From a latency perspective, the use of LLM streaming in LiveKit can still provide benefits, even if TTS waits for the full LLM stream to complete before outputting audio. Here are some key points:
• Time to First Token (TTFT): LLM streaming lets the first token arrive earlier, which reduces the perceived latency for the user.
• Latency Optimization: Optimizing network proximity and tracking metrics such as TTFT and TTFB (Time to First Byte for TTS) helps keep overall latency to a minimum.
• Instant Mode: Features such as Hume's "instant_mode" offer ultra-low-latency streaming for TTS, shortening the time to the first chunk of audio, though at a slightly higher cost.
• Total Latency Calculation: Total conversation latency is the sum of the end-of-utterance (EOU) delay, TTFT, and TTFB. Streaming reduces the TTFT component, which is critical for faster response times.
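As a minimal sketch of the total-latency formula above (the function name and timing values are hypothetical, not LiveKit APIs):

```python
def total_conversation_latency(eou_delay: float, ttft: float, ttfb: float) -> float:
    """Total latency = end-of-utterance delay + LLM TTFT + TTS TTFB (seconds)."""
    return eou_delay + ttft + ttfb

# Example with hypothetical timings: 0.5 s EOU delay, 0.3 s TTFT, 0.2 s TTFB
latency = total_conversation_latency(eou_delay=0.5, ttft=0.3, ttfb=0.2)
print(f"Total conversation latency: {latency:.1f} s")  # prints 1.0 s
```

Reducing any one term lowers the total, but TTFT is the term that LLM streaming directly improves.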
While streaming does not shorten the total time needed to generate a full response, starting token generation earlier reduces the perceived delay and improves the overall user experience.
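The difference between TTFT and total generation time can be illustrated with a small async sketch (the stream below is a stand-in for a real streaming LLM response, not a LiveKit API; delays are hypothetical):

```python
import asyncio
import time

async def fake_llm_stream():
    # Hypothetical stand-in for a streaming LLM: yields tokens with a delay.
    for token in ["Hello", " there", "!"]:
        await asyncio.sleep(0.05)
        yield token

async def measure_ttft(stream):
    """Return (time to first token, total stream time, full text)."""
    start = time.monotonic()
    first_token_time = None
    chunks = []
    async for token in stream:
        if first_token_time is None:
            first_token_time = time.monotonic() - start
        chunks.append(token)
    total_time = time.monotonic() - start
    return first_token_time, total_time, "".join(chunks)

ttft, total, text = asyncio.run(measure_ttft(fake_llm_stream()))
print(f"TTFT: {ttft:.2f} s, total: {total:.2f} s")
```

With streaming, downstream work can begin at `ttft`; without it, nothing happens until `total` has elapsed.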
Sources:
• How can I reduce latency in voice agents using STT, TTS and LLM?
• Hume TTS integration guide | LiveKit Docs
• Capturing metrics | LiveKit Docs