How to have a different handling for voice message...
# ask-ai
b
How to handle voice messages and text messages differently? Code examples for the agent and client app?
t
Thanks for reaching out, Rongrong. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I'd love to better understand what you're hoping to achieve when handling voice and text messages differently. Could you share a bit more about your goals? For example, are you interested in:
• How to detect and tell apart voice and text messages?
• Ways to process or store these messages differently?
• How to show them differently within a client application?
Also, it would be helpful to know if you're looking for examples in a particular programming language or framework. I'm here to help you every step of the way!
b
I would like the agent to handle voice messages through an STT-LLM-TTS pipeline, and text messages by calling the LLM directly and returning text. How do I implement this?
t
I'm happy to guide you through handling voice messages with an STT-LLM-TTS pipeline and text messages by calling an LLM directly, using the `VoicePipelineAgent` class from the LiveKit Agents framework. (Note: `VoicePipelineAgent` is the Agents 0.x API; the newer 1.x framework uses `Agent` and `AgentSession`.) Let's walk through the steps together:
1. Initialize the VoicePipelineAgent
First, define your agent with STT, LLM, and TTS models. Here's an example to get you started:
```python
# Example using Groq-based STT and LLM with OpenAI TTS (Agents 0.x)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),  # Optional, but very helpful for managing interruptions
    stt=openai.STT.with_groq(),
    llm=openai.LLM.with_groq(model="llama3-70b-8192"),
    tts=openai.TTS(),
    chat_ctx=initial_context,
    allow_interruptions=True,  # Lets users interrupt the agent's speech naturally
)
```
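The snippet above references an `initial_context` that isn't defined in the thread. In the 0.x API it is typically built from `llm.ChatContext`; a minimal sketch (the system prompt text is illustrative):

```python
from livekit.agents import llm

# Hypothetical system prompt; adjust to your use case
initial_context = llm.ChatContext().append(
    role="system",
    text="You are a helpful voice assistant. Keep your answers short and conversational.",
)
```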
2. Handle Voice Messages
When someone speaks, the agent processes their audio through the STT-LLM-TTS pipeline. You can start the agent in the room like this (a fuller entrypoint sketch follows the snippet):
```python
agent.start(room, participant)
```
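For context, here is a hedged sketch of where `room` and `participant` usually come from: the agent's entrypoint connects to the room and waits for a participant before starting the pipeline (0.x-style; adapt to your worker setup):

```python
from livekit.agents import AutoSubscribe, JobContext

async def entrypoint(ctx: JobContext):
    # Connect to the room, subscribing to audio only
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for the first remote participant, then start the voice pipeline for them
    participant = await ctx.wait_for_participant()
    agent.start(ctx.room, participant)
```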
3. Handle Text Messages
For text messages you can skip the STT step and call the LLM directly with the text input, then return the reply as text. Note that `agent.say()` below speaks a scripted line through TTS rather than calling the LLM; a sketch of the direct-LLM approach follows the snippet:
```python
await agent.say("Hello! How can I assist you?")
```
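To actually answer incoming text with the LLM and reply in text (no STT or TTS), one 0.x-style pattern is to listen for chat messages on the room and feed them to the agent's LLM. A hedged sketch, assuming the `rtc.ChatManager` helper and the 0.x streaming `llm.chat()` interface; `room` and `agent` mirror the snippets above, so adapt the names to your setup:

```python
import asyncio
from livekit import rtc

chat = rtc.ChatManager(room)

async def answer_from_text(user_text: str):
    # Build a chat context from the incoming text and stream the LLM reply
    chat_ctx = agent.chat_ctx.copy()
    chat_ctx.append(role="user", text=user_text)
    stream = agent.llm.chat(chat_ctx=chat_ctx)

    reply = ""
    async for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            reply += content

    # Return the reply as text instead of synthesizing speech
    await chat.send_message(reply)

@chat.on("message_received")
def on_chat_received(msg: rtc.ChatMessage):
    if msg.message:
        asyncio.ensure_future(answer_from_text(msg.message))
```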
4. Customize Pipeline Behavior
Feel free to tailor the pipeline to your needs by overriding nodes (see the sketch below):
• STT Node: Pre-process or post-process the transcription to fit your use case.
• LLM Node: Modify inputs or outputs before or after inference for more control.
• TTS Node: Adjust the text before it's synthesized.
5. Additional Features
• Use lifecycle hooks like `on_user_turn_completed` to manage context effectively.
• Keep interruption handling smooth by enabling `allow_interruptions=True`.
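As a concrete illustration of a node override: note that the node API belongs to the newer `Agent` class (Agents 1.x) rather than `VoicePipelineAgent`, so treat this as a minimal sketch under that assumption; `CustomAgent` and the pass-through logic are illustrative only:

```python
from typing import AsyncIterable

from livekit.agents import Agent, ModelSettings, llm

class CustomAgent(Agent):
    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[llm.FunctionTool],
        model_settings: ModelSettings,
    ) -> AsyncIterable[llm.ChatChunk]:
        # Pre-process here, e.g. inspect or trim chat_ctx before inference
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            # Post-process each streamed chunk before it continues to TTS
            yield chunk
```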
Please don't hesitate to ask if you'd like more details or specific code examples; I'm here to help!
Sources: AI voice agents | LiveKit Docs | Pipeline nodes and hooks | LiveKit Docs | VoicePipelineAgent | LiveKit Docs | chain-of-thought-tts/agent.py
b
Can we skip the TTS node at a per-message level?
t
Absolutely, you can skip the TTS node at a per-message level in LiveKit's voice pipeline. It's straightforward using the `on_user_turn_completed` lifecycle hook. Here's how you can do it:
Python Example:
```python
from livekit.agents import Agent, StopResponse
from livekit.agents.llm import ChatContext, ChatMessage

class MyAgent(Agent):
    async def on_user_turn_completed(
        self, turn_ctx: ChatContext, new_message: ChatMessage,
    ) -> None:
        if not new_message.text_content:
            raise StopResponse()  # Skips LLM and TTS for this turn
```
Node.js Example:
```typescript
import { llm, voice } from '@livekit/agents';

class MyAgent extends voice.Agent {
  async onUserTurnCompleted(
    turnCtx: llm.ChatContext,
    newMessage: llm.ChatMessage,
  ): Promise<void> {
    if (!newMessage.textContent) {
      throw new voice.StopResponse(); // Skips LLM and TTS for this turn
    }
  }
}
```
This gives you per-message control over whether the agent generates a response at all. If anything feels unclear or you'd like more examples, please don't hesitate to ask; I'm here to help!
Sources: Docs::Agents | Pipeline nodes and hooks | LiveKit Docs
b
But this example skips both LLM and TTS. Can we skip the TTS node only?
t
To skip just the TTS node while still letting the LLM run, you can override the `tts_node` method in a custom `Agent` subclass. Here's an example:
Python Example:
```python
from typing import AsyncIterable

from livekit.agents import Agent, ModelSettings

class TTSSkippingAgent(Agent):
    async def tts_node(
        self, text: AsyncIterable[str], model_settings: ModelSettings
    ) -> None:
        # Skip TTS processing by returning None
        return None
```
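For completeness, a hedged sketch of plugging such an agent into a 1.x `AgentSession`; the plugin choices and the instructions string are illustrative, not from the thread:

```python
from livekit.agents import AgentSession, JobContext
from livekit.plugins import deepgram, openai, silero

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),  # still configured, but tts_node above never produces audio
    )
    await session.start(
        agent=TTSSkippingAgent(instructions="You are a helpful assistant."),
        room=ctx.room,
    )
```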
This way, the LLM will continue to process and generate text as expected, but the TTS step is skipped, so no audio is created.
Another option: if returning `None` doesn't fit your setup, you can return an empty asynchronous generator instead, like this:
```python
from livekit import rtc

# Inside the same Agent subclass as above
async def tts_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[rtc.AudioFrame]:
    async def empty_stream() -> AsyncIterable[rtc.AudioFrame]:
        return
        yield  # unreachable, but makes this an async generator that yields nothing
    return empty_stream()
```
Please feel free to reach out if you'd like me to explain anything further or if you want more examples; I'm here to help!
Sources: Pipeline nodes and hooks | LiveKit Docs | examples/voice_agents/structured_output.py | Agents 0.x migration guide | LiveKit Docs | Agents v0.x migration guide - Node.js | LiveKit Docs
b
How to add a listener for `user_state_changed`?
t
I'd be happy to help you add a listener for `user_state_changed` in LiveKit! You can do this by using the `AgentSession` object and attaching an event listener. Here's a simple way to implement it:
Node.js Example:
```typescript
import { voice } from '@livekit/agents';

session.on(voice.AgentSessionEventTypes.UserStateChanged, (ev) => {
  if (ev.newState === 'speaking') {
    console.log('User started speaking');
  } else if (ev.newState === 'listening') {
    console.log('User stopped speaking');
  } else if (ev.newState === 'away') {
    console.log('User is not present (e.g. disconnected)');
  }
});
```
Python Example:
```python
from livekit.agents import UserStateChangedEvent

@session.on("user_state_changed")
def on_user_state_changed(ev: UserStateChangedEvent):
    if ev.new_state == "speaking":
        print("User started speaking")
    elif ev.new_state == "listening":
        print("User stopped speaking")
    elif ev.new_state == "away":
        print("User is not present (e.g. disconnected)")
```
User States:
• speaking: VAD detected that the user has started speaking.
• listening: VAD detected that the user has stopped speaking.
• away: The user hasn't responded for a while (default: 15 seconds).
Also, if you'd like, you can customize the timeout for the "away" state using the `user_away_timeout` parameter in `AgentSession`.
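For example, a minimal sketch of raising that timeout; the plugin choices are placeholders:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai, silero

session = AgentSession(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=openai.TTS(),
    user_away_timeout=30.0,  # mark the user "away" after 30 s of silence (default: 15 s)
)
```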
Please feel free to reach out if you have any questions or need more help; I'm here for you!
Sources: Events and error handling | LiveKit Docs | livekit-agents/livekit/agents/voice/events.py
b
How to disconnect from the agent side?
t
Hi there! To disconnect an agent from the room in LiveKit, call the `ctx.shutdown()` method from within the agent's entrypoint function. The agent disconnects cleanly while the other participants continue their session without interruption.
Python Example:
```python
from livekit.agents import JobContext

async def entrypoint(ctx: JobContext):
    # Perform agent tasks...

    # Disconnect the agent from the room
    ctx.shutdown(reason="Session ended")
```
Node.js Example:
```typescript
import { defineAgent, type JobContext } from '@livekit/agents';

export default defineAgent({
  entry: async (ctx: JobContext) => {
    // Perform agent tasks...

    // Disconnect the agent from the room
    ctx.shutdown('Session ended');
  },
});
```
Once you call `ctx.shutdown()`, any registered shutdown hooks will run to handle cleanup tasks like saving state or releasing resources (a sketch of registering one is below). If you have any questions or need a hand with anything else, just let me know; I'm here to help!
Sources: Inside a session | LiveKit Docs | Job lifecycle | LiveKit Docs
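To illustrate those shutdown hooks, a minimal Python sketch using `add_shutdown_callback`; the `save_state` coroutine is a placeholder for your own cleanup logic:

```python
from livekit.agents import JobContext

async def save_state():
    # Placeholder cleanup: persist conversation history, flush logs, release resources, etc.
    pass

async def entrypoint(ctx: JobContext):
    # Register the hook early; it runs when the job shuts down (including after ctx.shutdown())
    ctx.add_shutdown_callback(save_state)

    # ... agent work ...

    ctx.shutdown(reason="Session ended")
```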