# ask-ai
I have a custom Llm class built over the base class in LiveKit. Is it possible to set up streaming with TTS as soon as the LLM starts producing text?
Thanks for reaching out, Viren. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Yes, it is possible to set up streaming with TTS (Text-to-Speech) as soon as your custom LLM starts producing text in LiveKit. LiveKit Agents provides a pipeline architecture with nodes for STT (Speech-to-Text), LLM (Large Language Model), and TTS. Here's how you can achieve this:

1. Customizing the LLM node: You can override the default LLM node to customize how the LLM processes and streams text. For example:
```python
from typing import AsyncIterable

from livekit.agents import Agent, FunctionTool, ModelSettings, llm


async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings,
) -> AsyncIterable[llm.ChatChunk]:
    # Insert custom preprocessing here
    async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
        # Insert custom postprocessing here
        yield chunk
```
2. Streaming TTS node: LiveKit provides a `StreamAdapter` class that enables streaming TTS even if the underlying TTS implementation doesn't natively support it. This adapter uses a sentence tokenizer to split text for incremental synthesis. For example:
```python
from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings


async def tts_node(
    self,
    text: AsyncIterable[str],
    model_settings: ModelSettings,
) -> AsyncIterable[rtc.AudioFrame]:
    # Insert custom text processing here
    async for frame in Agent.default.tts_node(self, text, model_settings):
        # Insert custom audio processing here
        yield frame
```
The `StreamAdapter` class handles the connection between the LLM output and TTS by:

- Breaking text into sentences using a sentence tokenizer
- Synthesizing each sentence individually
- Streaming the audio frames as they become available

This setup allows for real-time streaming of both the LLM's text output and the corresponding synthesized speech, creating a seamless and responsive experience.

Sources: Voice pipeline nodes | LiveKit Docs
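As a rough sketch of that wiring (the `openai.TTS()` plugin and the exact constructor arguments below are assumptions for illustration, not taken from the conversation above):

```python
from livekit.agents import tokenize, tts
from livekit.plugins import openai  # any TTS plugin could stand in here

# Wrap a TTS engine so that incoming text is split into sentences and each
# sentence is synthesized as soon as it is complete, yielding audio incrementally.
streaming_tts = tts.StreamAdapter(
    tts=openai.TTS(),
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
)
```

The resulting adapter can then be passed wherever a TTS instance is expected, for example when constructing the `AgentSession`.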
I'm actually using GroqLLM as the base class and writing my custom logic inside it. The problem is that the LLM first generates all of the text, and only then is it returned to TTS through the AgentSession pipeline. I want the LLM output to stream to TTS simultaneously.
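A minimal sketch of the contrast being described, reusing the `llm_node` override from the answer above (the `MyAgent` class name is illustrative): yielding each chunk as the LLM produces it lets the AgentSession pipeline forward text to TTS right away, whereas accumulating the full response into one string before returning it forces TTS to wait.

```python
from typing import AsyncIterable

from livekit.agents import Agent, FunctionTool, ModelSettings, llm


class MyAgent(Agent):
    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[FunctionTool],
        model_settings: ModelSettings,
    ) -> AsyncIterable[llm.ChatChunk]:
        # Streaming path: hand each chunk downstream as soon as it arrives,
        # so TTS can begin synthesizing before the LLM has finished.
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            yield chunk
        # By contrast, collecting every chunk into a single string and yielding
        # it only at the end delays TTS until the whole completion is ready.
```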
I'm sorry, but I don't have the information you need at the moment. If you would like a member of the LiveKit team to respond, please ask your question in one of the other channels.