does tts start to generate after llm has completed...
# ask-ai
b
does tts start to generate after the llm has completed the response, or does it by default start generating from the partial llm output?
t
Thanks for reaching out, Harsh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
By default, TTS in LiveKit begins generating audio from the partial output of the LLM rather than waiting for the entire response to finish. The text output from the LLM is streamed to the TTS system incrementally, so synthesis can start as soon as the first chunks of text are available. I hope this helps clarify things for you!
Sources: livekit-agents/livekit/agents/voice/agent_activity.py | Tool definition and use | LiveKit Docs
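The AgentSession voice pipeline handles this streaming for you, but as a rough illustration of the underlying pattern, here is a minimal sketch that forwards LLM text deltas into a TTS stream as they arrive. The method names (llm.chat(), tts.stream(), push_text(), end_input()) and the chunk.delta.content attribute are assumptions based on the livekit-agents plugin interfaces and may differ between versions; treat this as a sketch, not the exact implementation in agent_activity.py.

```python
import asyncio


async def speak_streaming(llm, tts, chat_ctx) -> None:
    """Pipe partial LLM output into TTS instead of waiting for the full reply."""
    tts_stream = tts.stream()  # open a streaming synthesis session up front

    async def forward_text() -> None:
        # Push each text delta to TTS as soon as the LLM emits it.
        async for chunk in llm.chat(chat_ctx=chat_ctx):
            if chunk.delta and chunk.delta.content:
                tts_stream.push_text(chunk.delta.content)
        tts_stream.end_input()  # signal that no more text is coming

    async def play_audio() -> None:
        # Synthesized audio frames arrive while the LLM is still generating.
        async for audio in tts_stream:
            ...  # forward audio.frame to the room's audio output

    await asyncio.gather(forward_text(), play_audio())
```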
b
what are all the ways we can optimise further for latency
t
I'm glad you're looking to optimize TTS latency in LiveKit! Here are some strategies you might find useful:
• Use Chunked Processing and Streaming: incremental text processing lets audio synthesis start sooner.
• Enable WebSocket Streaming: WebSocket-based streaming delivers audio in real time, making things feel smoother.
• Optimize Audio Format: lower sample rates like 22050 Hz can cut down processing time.
• Adjust TTS Provider Settings: for instance, turning on `reduce_latency` in Rime or `autoMode` in ElevenLabs can speed up synthesis (see the sketch after this message).
• Choose Low-Latency Providers: picking TTS services designed for minimal delay can make a big difference.
• Ensure Network Proximity: deploying agents closer to TTS service endpoints helps reduce transmission delays.
• Monitor Latency Metrics: keep an eye on metrics like Time to First Byte (TTFB) and Time to First Token (TTFT) to track improvements.
Implementing these steps should really help enhance the responsiveness of your TTS applications. If you have any questions or need further guidance, feel free to reach out; I'm here to help!
Sources: tests/test_agent_session.py | Rime integration guide | LiveKit Docs | Speech & Audio Processing | livekit/livekit_composite | DeepWiki | Voice Assistants | livekit/livekit_composite | DeepWiki
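For the provider-settings point above, here is a minimal configuration sketch wiring a lower-latency TTS into an AgentSession. The rime.TTS constructor arguments (reduce_latency, sample_rate) follow the options mentioned above but are assumptions about the plugin's exact parameter names, and the STT/LLM/VAD choices are purely illustrative; check your plugin versions for the actual signatures.

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai, rime, silero

session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o-mini"),
    # Provider-level latency knobs: Rime's reduce_latency flag and a lower
    # output sample rate (parameter names assumed, may differ by version).
    tts=rime.TTS(reduce_latency=True, sample_rate=22050),
    vad=silero.VAD.load(),
)
```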
b
how can I optimise the llm and stt latency?
t
I'd love to help you optimize LLM and STT latency in LiveKit! Here are some tips that might make a big difference:
• Enable Streaming: use streaming STT APIs with small audio chunks (around 20ms) and stream tokens from the LLM as they come in. This can really speed things up.
• Choose Low-Latency Providers: providers such as Google Cloud STT or Groq are great options for faster processing times.
• Enable Preemptive Generation: the `preemptive_generation` feature is super handy; it lets the LLM start processing before the user even finishes speaking (see the sketch after this message).
• Ensure Network Proximity: placing your agents close to the LLM and STT service endpoints helps cut down on transmission delays.
• Monitor Latency Metrics: LiveKit's metrics API is a useful tool to keep an eye on Time to First Token (TTFT) and Time to First Byte (TTFB), so you can fine-tune performance.
• Self-Host Models: hosting smaller, faster models like Whisper yourself can offer more predictable and lower latency.
These steps can really help reduce latency in your workflows. If you'd like, I'm here to guide you through implementing any of these strategies; just let me know!
Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | Logs, metrics, and telemetry | LiveKit Docs | Live Conversations with AI using ChatGPT and WebRTC | README.md
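Building on the preemptive generation point above, here is a minimal sketch of an AgentSession configured with streaming STT, a smaller LLM, and preemptive generation enabled. The preemptive_generation flag is taken from the conversation above; the plugin and model choices and exact constructor arguments are assumptions that may vary with your livekit-agents version.

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai, silero

session = AgentSession(
    stt=deepgram.STT(model="nova-3"),     # streaming STT (model name illustrative)
    llm=openai.LLM(model="gpt-4o-mini"),  # smaller/faster model for lower TTFT
    tts=openai.TTS(),
    vad=silero.VAD.load(),
    # Start LLM inference on interim transcripts, before the turn is finalized.
    preemptive_generation=True,
)
```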