glamorous-kitchen-68628
08/14/2025, 5:57 AM

tall-airline-16756
08/14/2025, 5:57 AM
1. Configure the Session: Create an `AgentSession` with `turn_detection="manual"`. This lets you control exactly when the audio input turns on and off, so you know when a user starts or ends their turn.
2. Register RPC Methods: Next, use `register_rpc_method` to manage those PTT events. The key methods you’ll want are:
◦ `start_turn`: This kicks things off by enabling audio input and pausing any ongoing processes.
◦ `end_turn`: When the user’s done talking, this disables audio input, saves their turn, and triggers a response.
◦ `cancel_turn`: If needed, this cancels the current turn and clears any input.
3. Initialize Audio Input State: It’s a good idea to disable audio input by default when the session starts. That way, you avoid any accidental recording.
4. Implement the Agent: Build your `Agent` with components like STT, LLM, and TTS, then plug it into your session.
5. Handle User Turns: Use callbacks to smoothly manage user audio input and handle cases where turns might be empty.
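The three RPC handlers can be sketched as plain functions that act on the session. This is a minimal sketch, not a drop-in implementation: it assumes the Agents 1.x manual-turn API (`session.interrupt()`, `session.commit_user_turn()`, `session.clear_user_turn()`, `session.input.set_audio_enabled()`); in a real agent each function would be registered via `ctx.room.local_participant.register_rpc_method(...)`.

```python
# Push-to-talk control flow for turn_detection="manual" (sketch).
# Assumes the Agents 1.x API: session.input.set_audio_enabled(),
# session.interrupt(), session.commit_user_turn(), session.clear_user_turn().

async def start_turn(session):
    """User pressed the talk button: stop any agent speech, open the mic."""
    session.interrupt()                    # stop any response in progress
    session.clear_user_turn()              # discard stale input from before
    session.input.set_audio_enabled(True)  # start capturing user audio

async def end_turn(session):
    """User released the button: close the mic and commit the turn."""
    session.input.set_audio_enabled(False)
    session.commit_user_turn()             # finalize the turn, trigger a reply

async def cancel_turn(session):
    """User aborted: close the mic and discard any captured input."""
    session.input.set_audio_enabled(False)
    session.clear_user_turn()
```

In a real agent, each of these would be wired to an RPC call from the client UI (press, release, cancel), so the button state fully drives the turn lifecycle.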
Want to see it in action? Check out this example implementation. And if you want to dive deeper, the documentation on manual turn detection has some great info too.

glamorous-kitchen-68628
08/14/2025, 7:24 AM

tall-airline-16756
08/14/2025, 7:25 AM
1. Define the Agent: Start by creating a simple `Agent` class:
class MyAgent(Agent):
def __init__(self):
super().__init__(
instructions="You are a helpful assistant."
)
2. Configure Session Options: Next, use `RoomInputOptions` and `RoomOutputOptions` to turn off audio and enable text-only communication:
session = AgentSession(
llm=openai.LLM(model="gpt-4o-mini"),
# No TTS or STT needed here
)
await session.start(
agent=MyAgent(),
room=ctx.room,
room_input_options=RoomInputOptions(text_enabled=True, audio_enabled=False),
room_output_options=RoomOutputOptions(transcription_enabled=True, audio_enabled=False),
)
3. Handle Text Input and Output: Your agent receives text input on the `lk.chat` topic and sends replies via the `lk.transcription` topic. Use the `generate_reply` method to process what users say:
session.generate_reply(user_input="user's input text")
4. Run the Agent: Finally, run your agent with this text-only setup. It’ll automatically manage text input and output without any audio:
if __name__ == "__main__":
cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
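Putting the steps above together, a minimal text-only entrypoint might look like the sketch below. It assumes the usual Agents 1.x imports (`livekit-agents` with the OpenAI plugin); module paths can differ between versions, so treat it as an assembly sketch rather than a verified file.

```python
# Sketch: complete text-only agent, assembled from the steps above.
# Assumes livekit-agents 1.x; adjust imports to your installed version.
from livekit import agents
from livekit.agents import (
    Agent, AgentSession, RoomInputOptions, RoomOutputOptions,
    WorkerOptions, cli,
)
from livekit.plugins import openai


class MyAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful assistant.")


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    # No STT/TTS: the LLM is the only model needed in text-only mode
    session = AgentSession(llm=openai.LLM(model="gpt-4o-mini"))
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        # text in via lk.chat, text out via lk.transcription; audio fully off
        room_input_options=RoomInputOptions(text_enabled=True, audio_enabled=False),
        room_output_options=RoomOutputOptions(transcription_enabled=True, audio_enabled=False),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```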
If you want to see everything in action, check out the text_only.py example in the LiveKit Agents repo.
This way, you’re all set with audio disabled, and your agent will smoothly handle text inputs and responses. Hope this helps!
Sources: livekit-agents/livekit/agents/voice/room_io/room_io.py | Text and transcriptions | LiveKit Docs

glamorous-kitchen-68628
08/14/2025, 7:28 AM

tall-airline-16756
08/14/2025, 7:29 AM
1. Runtime Mode Switching:
◦ Set `audio_enabled=False` for both input and output to go into text-only mode.
◦ Set `audio_enabled=True` to jump back to voice mode.
2. Session Configuration: Start your session with both text and audio enabled, then adjust the modes on the fly using the methods above.
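Those two bullets can be wrapped in a small helper. This is a sketch assuming the session exposes runtime toggles `session.input.set_audio_enabled()` and `session.output.set_audio_enabled()` (worth verifying against your installed livekit-agents version); `set_text_only_mode` is a hypothetical name.

```python
# Hypothetical helper: switch between text-only and voice mode at runtime.
# Assumes Agents 1.x exposes session.input/.output.set_audio_enabled();
# verify the names against your installed livekit-agents version.

def set_text_only_mode(session, text_only: bool) -> None:
    """Disable audio in both directions for text-only mode; re-enable for voice."""
    session.input.set_audio_enabled(not text_only)
    session.output.set_audio_enabled(not text_only)
```

Call `set_text_only_mode(session, True)` to drop to text-only, and `set_text_only_mode(session, False)` to return to voice, without restarting the session.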
If you want to dive deeper, check out the LiveKit Agents documentation. It’s got all the details you might need!
Sources: Text and transcriptions | LiveKit Docs | README.md

glamorous-kitchen-68628
08/14/2025, 7:56 AM
User (Say): What if I want to say "Hello" to a Japanese person?
Tutor (Say): こんにちは!
Tutor (Text): You also can use some more casual ways such as "おっす!" or "やあ!"
In this example, the Tutor sends 2 messages: the first one is TTS-ed, while the second is not; it is sent to the client as plain text.

tall-airline-16756
08/14/2025, 7:56 AM
1. Use Text Streams Directly: Just send plain text messages through the `lk.chat` text stream topic. This skips the TTS processing and delivers the text straight to your client.
2. Configure AgentSession: Make sure your session is set up to handle both text and audio. For example:
session = AgentSession(
llm=your_llm,
room_input_options=RoomInputOptions(text_enabled=True),
room_output_options=RoomOutputOptions(audio_enabled=True, transcription_enabled=True),
)
3. Send Text and TTS Responses: Generate the spoken reply with `session.say()`, which runs through TTS, and publish the plain-text follow-up on the `lk.chat` topic with the room’s text-stream API (note there is no `session.send_audio`/`session.send_text`):
# Spoken response: synthesized via TTS, transcription on lk.transcription
await session.say("こんにちは!")
# Plain-text follow-up: bypasses TTS, delivered on the lk.chat topic
await ctx.room.local_participant.send_text(
    'You also can use some more casual ways such as "おっす!" or "やあ!"',
    topic="lk.chat",
)
4. Client-Side Handling: On the client side, handle TTS responses via the `lk.transcription` stream and plain text responses via the `lk.chat` stream:
// Handle TTS responses
room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
const message = await reader.readAll();
console.log(`TTS response with transcription: ${message}`);
});
// Handle plain text responses
room.registerTextStreamHandler('lk.chat', async (reader, participantInfo) => {
const message = await reader.readAll();
console.log(`Plain text response: ${message}`);
});
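On the agent side, the two channels can be combined in one helper. This is a sketch assuming `session.say()` for the spoken part (Agents 1.x) and the rtc SDK’s `local_participant.send_text(..., topic="lk.chat")` for the plain-text side channel; `reply_with_side_note` is a hypothetical name.

```python
from typing import Optional

# Hypothetical helper: one spoken message plus an optional text-only follow-up.
# Assumes session.say() (Agents 1.x, awaitable) and the rtc text-stream API
# room.local_participant.send_text(text, topic="lk.chat").

async def reply_with_side_note(session, room, spoken: str, note: Optional[str] = None):
    await session.say(spoken)  # synthesized via TTS, transcribed to lk.transcription
    if note is not None:
        # bypasses TTS; arrives at the client's lk.chat handler
        await room.local_participant.send_text(note, topic="lk.chat")
```

For the tutor example above, `spoken` would be "こんにちは!" and `note` the casual alternatives, so the client speaks the first and merely displays the second.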
This way, you get the best of both worlds: TTS for spoken content and plain text for any extra instructions or info. If you want to chat more or need a hand with anything else, just let me know!

glamorous-kitchen-68628
08/14/2025, 9:21 AM

tall-airline-16756
08/14/2025, 9:22 AM

glamorous-kitchen-68628
08/14/2025, 9:24 AM

tall-airline-16756
08/14/2025, 9:24 AM
session = AgentSession(
llm=openai.LLM(model="gpt-4o-mini"),
room_input_options=RoomInputOptions(
text_enabled=True,
audio_enabled=False
),
room_output_options=RoomOutputOptions(
transcription_enabled=True,
audio_enabled=False
)
)
With this setup:
• The agent works purely in text mode, watching the `lk.chat` text stream for incoming messages.
• It sends responses as text through the `lk.transcription` stream.
• And you won’t need any audio dependencies like PortAudio at all.
To get the agent running in console mode, just use:
python myagent.py console
This is a great approach if you’re in an environment where installing audio dependencies isn’t an option. If you have any questions or need a hand with anything else, just let me know—I’m here to help!
Sources: livekit-agents/livekit/agents/voice/room_io/room_io.py | Text and transcriptions | LiveKit Docs

glamorous-kitchen-68628
08/14/2025, 11:34 AM
send extra text to your client without using TTS, while still using TTS for other parts, by taking advantage of LiveKit's text streams feature. Here’s a simple way to do it:
1. Use Text Streams Directly: Just send plain text messages through the `lk.chat` text stream topic. This skips the TTS processing and delivers the text straight to your client.
2. Configure AgentSession: Make sure your session is set up to handle both text and audio. For example:

I still don't get how to use it. Suppose I have a code like this:
session = AgentSession(
llm=openai.LLM(model="gpt-4o-mini"),
stt=openai.STT(model="gpt-4o-transcribe", language="vi"),
tts=openai.TTS(model="gpt-4o-mini-tts", voice="alloy"),
turn_detection=MultilingualModel(),
vad=ctx.proc.userdata["vad"],
preemptive_generation=True,
)
await session.start(
agent=Assistant(),
room=ctx.room,
room_input_options=RoomInputOptions(
# LiveKit Cloud enhanced noise cancellation
# - If self-hosting, omit this parameter
# - For telephony applications, use `BVCTelephony` for best results
noise_cancellation=noise_cancellation.BVC(),
),
)
# Join the room and connect to the user
await ctx.connect()
How can I tweak the logic to make the `send_audio` and `send_text` calls later?
# Send TTS response
await session.send_audio(tts_response)
# Send plain text response
await session.send_text(plain_text)
tall-airline-16756
08/14/2025, 11:34 AM

glamorous-kitchen-68628
08/14/2025, 12:07 PM
Can I use `room.local_participant.publish_transcription` to send the transcription directly as text to the client? If so, how can I call it?

glamorous-kitchen-68628
08/14/2025, 12:11 PM