# ask-ai
**User:** I want to implement a Push-to-Talk feature with LiveKit Agents. How can I do it? Basically there is a button on the frontend, and the user clicks it to start talking. When they finish, they click the button again so the agent can start responding.
**Assistant:** Thanks for reaching out, Phuong. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you're looking to add a Push-to-Talk (PTT) feature in LiveKit, here's a friendly step-by-step guide to help you out:

1. **Set up manual turn detection**: Configure the `AgentSession` with `turn_detection="manual"`. This gives you direct control over when audio input turns on and off, so you decide exactly when a user's turn starts and ends.
2. **Register RPC methods**: Use `register_rpc_method` to handle the PTT events. The key methods you'll want are:
   - `start_turn`: kicks things off by enabling audio input and interrupting any response in progress.
   - `end_turn`: when the user is done talking, disables audio input, commits their turn, and triggers a response.
   - `cancel_turn`: if needed, cancels the current turn and clears any captured input.
3. **Initialize the audio input state**: Disable audio input by default when the session starts, so nothing is recorded before the button is pressed.
4. **Implement the agent**: Build your `Agent` with components like STT, LLM, and TTS, then plug it into your session.
5. **Handle user turns**: Use callbacks to smoothly manage user audio input and handle cases where a turn might be empty.

Want to see it in action? Check out this example implementation. And if you want to dive deeper, the documentation on manual turn detection has some great info too.
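The three turn handlers above can be sketched as plain functions over the session. This is a hedged sketch, assuming the LiveKit Agents Python API (`interrupt`, `clear_user_turn`, `commit_user_turn`, and `input.set_audio_enabled` on `AgentSession`); in a real entrypoint each handler would be registered with `ctx.room.local_participant.register_rpc_method(...)` and receive the RPC invocation data:

```python
# Hedged sketch of the three PTT turn handlers; `session` is assumed to be
# an AgentSession started with turn_detection="manual".

async def start_turn(session):
    # Button pressed: cut off any in-progress reply and open the mic
    session.interrupt()
    session.clear_user_turn()
    session.input.set_audio_enabled(True)

async def end_turn(session):
    # Button pressed again: close the mic and let the agent respond
    session.input.set_audio_enabled(False)
    session.commit_user_turn()

async def cancel_turn(session):
    # Abort: close the mic and discard whatever was captured
    session.input.set_audio_enabled(False)
    session.clear_user_turn()
```

Because the handlers only touch a few session methods, the turn logic itself stays easy to exercise with a stub session object.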
**User:** If I want to use text-only mode (no STT, no TTS), how can I do that in the agent.py implementation?
**Assistant:** Hey there! If you want to set up a text-only mode for your LiveKit agent without STT (speech-to-text) or TTS (text-to-speech), here's a simple way to do it:

1. **Create an agent class**: Start by defining your agent to handle text-only interactions. Here's an example:
```python
class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful assistant."
        )
```
2. **Configure session options**: Next, use `RoomInputOptions` and `RoomOutputOptions` to turn off audio and enable text-only communication:
```python
session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
    # No TTS or STT needed here
)

await session.start(
    agent=MyAgent(),
    room=ctx.room,
    room_input_options=RoomInputOptions(text_enabled=True, audio_enabled=False),
    room_output_options=RoomOutputOptions(transcription_enabled=True, audio_enabled=False),
)
```
3. **Handle text input and output**: Your agent receives text input on the `lk.chat` text-stream topic and sends replies on the `lk.transcription` topic. You can use the `generate_reply` method to respond to what users type:
```python
session.generate_reply(user_input="user's input text")
```
4. **Run the agent**: Finally, run your agent with this text-only setup. It will manage text input and output automatically, without any audio:
```python
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
If you want to see everything in action, check out the text_only.py example in the LiveKit Agents repo. With this setup, audio is disabled and your agent will smoothly handle text inputs and responses. Hope this helps! Sources: livekit-agents/livekit/agents/voice/room_io/room_io.py | Text and transcriptions | LiveKit Docs
**User:** Can I switch between voice mode and text-only mode during the same session? For example, I want a switch on the frontend called "Stealth mode": when it's ON, no STT/TTS, and when it's OFF, the agent works normally with STT/TTS.
**Assistant:** Absolutely! You can switch between voice mode and text-only mode during the same session in LiveKit. Here's how:

1. **Programmatic toggling**: Use these methods to turn audio input and output on or off whenever you want:
   - `session.input.set_audio_enabled(True/False)` toggles the audio input.
   - `session.output.set_audio_enabled(True/False)` toggles the audio output.
2. **Frontend integration**: Add a switch in your frontend called "Stealth mode". When it's flipped on, set `audio_enabled=False` for both input and output to go into text-only mode; when it's flipped off, set `audio_enabled=True` to jump back to voice mode.
3. **Session configuration**: Start your session with both text and audio enabled, then adjust the modes on the fly using the methods above.

If you want to dive deeper, check out the LiveKit Agents documentation. Sources: Text and transcriptions | LiveKit Docs | README.md
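The stealth-mode switch boils down to one small helper. A minimal sketch, assuming the `session.input` / `session.output` toggles mentioned above (the helper name `set_stealth_mode` and its wiring to a frontend RPC are illustrative, not part of the LiveKit API):

```python
def set_stealth_mode(session, on: bool) -> None:
    # Stealth ON  -> mute both directions: no audio in (STT) or out (TTS)
    # Stealth OFF -> re-enable audio for normal voice operation
    session.input.set_audio_enabled(not on)
    session.output.set_audio_enabled(not on)

# The frontend switch could invoke this through an RPC method, e.g. one
# registered with ctx.room.local_participant.register_rpc_method(...).
```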
**User:** What if I want the agent to send some extra text to the client, but only "talk" a specific part? The use case: I want to develop a Tutor agent that generally talks, but sometimes sends an instruction as text alongside its voice content. For example:
```
User (Say): What if I want to say "Hello" to a Japanese person?
Tutor (Say): こんにちは!
Tutor (Text): You can also use more casual forms such as "おっす!" or "やあ!"
```
In this example, the Tutor sends two messages: the first one is TTS-ed, and the second one is not; it is sent to the client as plain text.
**Assistant:** Hey there! You can totally send extra text to your client without using TTS, while still using TTS for other parts, by taking advantage of LiveKit's text streams feature. Here's a simple way to do it:

1. **Use text streams directly**: Send plain text messages through the `lk.chat` text-stream topic. This skips the TTS processing and delivers the text straight to your client.
2. **Configure the AgentSession**: Make sure your session is set up to handle both text and audio. For example:
```python
session = AgentSession(llm=your_llm)

# Room I/O options are passed to session.start(), not the AgentSession constructor
await session.start(
    agent=your_agent,
    room=ctx.room,
    room_input_options=RoomInputOptions(text_enabled=True),
    room_output_options=RoomOutputOptions(audio_enabled=True, transcription_enabled=True),
)
```
3. **Send spoken and plain-text responses separately**: Use `session.say()` for the part that should be spoken aloud, and publish the extra text on the `lk.chat` text stream from the room's local participant so it bypasses TTS:

```python
# Speak this part through the TTS pipeline
await session.say("こんにちは!")

# Send the extra instruction as plain text on the lk.chat topic (no TTS)
await ctx.room.local_participant.send_text(
    'You can also use "おっす!" or "やあ!" in casual settings.',
    topic="lk.chat",
)
```
4. **Client-side handling**: On the client side, handle spoken responses (and their transcriptions) via the `lk.transcription` stream and plain-text responses via the `lk.chat` stream, like this:
```javascript
// Handle spoken responses (transcriptions)
room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
    const message = await reader.readAll();
    console.log(`TTS response with transcription: ${message}`);
});

// Handle plain text responses
room.registerTextStreamHandler('lk.chat', async (reader, participantInfo) => {
    const message = await reader.readAll();
    console.log(`Plain text response: ${message}`);
});
```
This way, you get the best of both worlds—TTS for spoken content and plain text for any extra instructions or info. If you want to chat more or need a hand with anything else, just let me know!
**User:** Should I install PyTorch in my environment if I only use text and audio modes? (No vision mode.)
**Assistant:** Just a heads up: if you're using LiveKit with text and audio modes (without vision), you don't need to install PyTorch to get things running. That said, PyTorch might come in handy for some specific tasks, like converting models or handling more advanced integrations beyond the basic agent setup. If you have any questions about this, feel free to ask! Sources: requirements/requirements-convert_hf_to_gguf_update.txt | Azure OpenAI Realtime API and LiveKit | LiveKit Docs | OpenAI Realtime API integration guide | LiveKit Docs | Text and transcriptions | LiveKit Docs | livekit-agents/livekit/agents/voice/chat_cli.py
**User:** Is it possible to run the agent in console mode and use text-only? When running the console, it asks for PortAudio, which I don't have permission to install in my environment.
**Assistant:** Absolutely, you can run the LiveKit agent's console mode in text-only mode without needing PortAudio. Just configure your session to disable audio and enable text-only communication, like this:
```python
session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
)

# Room I/O options are passed to session.start(), not the AgentSession constructor
await session.start(
    agent=MyAgent(),
    room=ctx.room,
    room_input_options=RoomInputOptions(
        text_enabled=True,
        audio_enabled=False,
    ),
    room_output_options=RoomOutputOptions(
        transcription_enabled=True,
        audio_enabled=False,
    ),
)
```
With this setup:
- The agent works purely in text mode, watching the `lk.chat` text stream for incoming messages.
- It sends responses as text through the `lk.transcription` stream.
- You won't need any audio dependencies like PortAudio at all.

To get the agent running in console mode, just use:
```shell
python myagent.py console
```
This is a great approach if you’re in an environment where installing audio dependencies isn’t an option. If you have any questions or need a hand with anything else, just let me know—I’m here to help! Sources: livekit-agents/livekit/agents/voice/room_io/room_io.py | Text and transcriptions | LiveKit Docs
**User:**
> send extra text to your client without using TTS, while still using TTS for other parts, by taking advantage of LiveKit's text streams feature. Here's a simple way to do it:
> 1. Use text streams directly: Just send plain text messages through the `lk.chat` text stream topic. This skips the TTS processing and delivers the text straight to your client.
> 2. Configure AgentSession: Make sure your session is set up to handle both text and audio. For example:

I still don't get how to use it. Suppose I have code like this:
```python
session = AgentSession(
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=openai.STT(model="gpt-4o-transcribe", language="vi"),
    tts=openai.TTS(model="gpt-4o-mini-tts", voice="alloy"),
    turn_detection=MultilingualModel(),
    vad=ctx.proc.userdata["vad"],
    preemptive_generation=True,
)
await session.start(
    agent=Assistant(),
    room=ctx.room,
    room_input_options=RoomInputOptions(
        # LiveKit Cloud enhanced noise cancellation
        # - If self-hosting, omit this parameter
        # - For telephony applications, use `BVCTelephony` for best results
        noise_cancellation=noise_cancellation.BVC(),
    ),
)

# Join the room and connect to the user
await ctx.connect()
```
How can I tweak this logic so that, later on, I can send one part of the response through TTS and another part as plain text, the way you described above?
**Assistant:** I don't have the answer you're looking for. You could also try asking your question:
- in one of the other Slack channels, or
- at https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code.

If you find the answer, please post it here to help others!
**User:** Can I use `room.local_participant.publish_transcription` to send the transcription directly as text to the client? If so, how can I call it? Can you help?