# ask-ai
I have a custom Llm class built over the base class in LiveKit. Is it possible to set up streaming with TTS as soon as the LLM starts producing text?
Thanks for reaching out, Viren. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Yes, it is possible to set up streaming with TTS (Text-to-Speech) as soon as your custom LLM starts producing text in LiveKit. LiveKit Agents provides a pipeline architecture with nodes for STT (Speech-to-Text), LLM (Large Language Model), and TTS. Here's how you can achieve this:

1. Customizing the LLM node: You can override the default LLM node to customize how the LLM processes and streams text. For example:
```python
from typing import AsyncIterable

from livekit.agents import Agent, FunctionTool, ModelSettings, llm


async def llm_node(
    self,
    chat_ctx: llm.ChatContext,
    tools: list[FunctionTool],
    model_settings: ModelSettings,
) -> AsyncIterable[llm.ChatChunk]:
    # Insert custom preprocessing here
    async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
        # Insert custom postprocessing here
        yield chunk
```
2. Streaming TTS node: LiveKit provides a `StreamAdapter` class that enables streaming TTS even if the underlying TTS implementation doesn't natively support it. This adapter uses a sentence tokenizer to split text for incremental synthesis. For example:
```python
from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent, ModelSettings


async def tts_node(
    self,
    text: AsyncIterable[str],
    model_settings: ModelSettings,
) -> AsyncIterable[rtc.AudioFrame]:
    # Insert custom text processing here
    async for frame in Agent.default.tts_node(self, text, model_settings):
        # Insert custom audio processing here
        yield frame
```
The `StreamAdapter` class handles the connection between the LLM output and TTS by:

- Breaking text into sentences using a sentence tokenizer
- Synthesizing each sentence individually
- Streaming the audio frames as they become available

This setup allows for real-time streaming of both the LLM's text output and the corresponding synthesized speech, creating a seamless and responsive experience.

Sources: Voice pipeline nodes | LiveKit Docs
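As a rough sketch of that wiring (the `openai.TTS()` plugin and the exact constructor arguments below are assumptions for illustration, not taken from the conversation above):

```python
from livekit.agents import tokenize, tts
from livekit.plugins import openai  # any TTS plugin could stand in here

# Wrap a TTS engine so that incoming text is split into sentences and each
# sentence is synthesized as soon as it is complete, yielding audio incrementally.
streaming_tts = tts.StreamAdapter(
    tts=openai.TTS(),
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
)
```

The resulting adapter can then be passed wherever a TTS instance is expected, for example when constructing the `AgentSession`.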
I'm actually using GroqLLM as the base class and writing my custom logic inside it. The problem is that the LLM first generates all of the text, and only then is it returned to TTS through the AgentSession pipeline. I want the LLM output to stream to TTS simultaneously.
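A minimal sketch of the contrast being described, reusing the `llm_node` override from the answer above (the `MyAgent` class name is illustrative): yielding each chunk as the LLM produces it lets the AgentSession pipeline forward text to TTS right away, whereas accumulating the full response into one string before returning it forces TTS to wait.

```python
from typing import AsyncIterable

from livekit.agents import Agent, FunctionTool, ModelSettings, llm


class MyAgent(Agent):
    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[FunctionTool],
        model_settings: ModelSettings,
    ) -> AsyncIterable[llm.ChatChunk]:
        # Streaming path: hand each chunk downstream as soon as it arrives,
        # so TTS can begin synthesizing before the LLM has finished.
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            yield chunk
        # By contrast, collecting every chunk into a single string and yielding
        # it only at the end delays TTS until the whole completion is ready.
```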
I'm sorry, but I don't have the information you need at the moment. If you would like a member of the LiveKit team to respond, please ask your question in one of the other channels.