broad-lamp-30178
09/11/2025, 4:17 PM
tall-airline-16756
09/11/2025, 4:17 PM
tall-airline-16756
09/11/2025, 4:18 PM
Use the `MultimodalAgent` class together with a model such as `RealtimeModel`, and specify both `"audio"` and `"text"` in the `modalities` parameter. With this setup, your agent processes voice input through real-time transcription and responds with both voice and text.
The agent listens on two text stream topics:
• `lk.transcription`: This is for real-time voice-to-text transcriptions.
• `lk.chat`: This handles direct text messages from participants.
When text input arrives on `lk.chat`, it interrupts the agent's current speech so the agent can process and respond to the message promptly. You can customize this behavior with `RoomInputOptions` and `RoomOutputOptions`, which let you enable or disable transcription and text input/output as needed.
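To make the interrupt rule concrete, here is a toy Python simulation of the behavior described above: a chat message that arrives while the agent is speaking cuts the current utterance short, unless text input is turned off. `ToyAgent`, `InputOptions`, and `text_enabled` are hypothetical names that only loosely mirror the idea of `RoomInputOptions`; this is a sketch of the rule, not how LiveKit implements it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputOptions:
    # Hypothetical stand-in that loosely mirrors the idea of RoomInputOptions.
    text_enabled: bool = True


@dataclass
class ToyAgent:
    """Simulates 'lk.chat text interrupts current speech'; not LiveKit's implementation."""
    options: InputOptions = field(default_factory=InputOptions)
    current_speech: Optional[str] = None
    interrupted: List[str] = field(default_factory=list)
    replies: List[str] = field(default_factory=list)

    def say(self, text: str) -> None:
        # Start (or replace) the utterance the agent is currently speaking.
        self.current_speech = text

    def on_chat_text(self, text: str) -> bool:
        """Handle an lk.chat message; return False if text input is disabled."""
        if not self.options.text_enabled:
            return False  # message ignored, current speech continues
        if self.current_speech is not None:
            # Cut off the in-progress utterance before answering.
            self.interrupted.append(self.current_speech)
            self.current_speech = None
        self.replies.append(f"reply to: {text}")
        return True


agent = ToyAgent()
agent.say("Let me walk you through every option...")
agent.on_chat_text("Just the short version, please")
print(agent.interrupted)     # ['Let me walk you through every option...']
print(agent.current_speech)  # None

quiet = ToyAgent(options=InputOptions(text_enabled=False))
quiet.say("Still talking")
print(quiet.on_chat_text("hello"), quiet.current_speech)  # False Still talking
```

Disabling text input in this sketch simply leaves the current speech untouched; the real options may control more than this one behavior.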
On the frontend, you can use the `registerTextStreamHandler` method to tell transcriptions and direct chat messages apart by topic. To send text input to the agent, use the `sendText` method with the `lk.chat` topic.
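The topic is what keeps the two kinds of text apart, in both directions. The following Python sketch fakes a room in memory to show that routing: the frontend listens on `lk.transcription` for captions, the agent listens on `lk.chat`, and a message sent with an unregistered topic reaches no handler. `FakeRoom`, `register_handler`, and `send` are hypothetical and do not reproduce the real signatures of `registerTextStreamHandler` or `sendText`.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# (sender_identity, text) -> None
Handler = Callable[[str, str], None]


class FakeRoom:
    """In-memory stand-in for topic-based text streams (hypothetical, not an SDK class)."""

    def __init__(self) -> None:
        # topic -> list of (owner_identity, handler)
        self._handlers: DefaultDict[str, List[Tuple[str, Handler]]] = defaultdict(list)

    def register_handler(self, owner: str, topic: str, handler: Handler) -> None:
        self._handlers[topic].append((owner, handler))

    def send(self, sender: str, text: str, topic: str) -> int:
        """Deliver text to other participants' handlers for this topic; return how many ran."""
        delivered = 0
        for owner, handler in self._handlers.get(topic, []):
            if owner != sender:  # senders do not receive their own streams
                handler(sender, text)
                delivered += 1
        return delivered


room = FakeRoom()
captions: List[str] = []
agent_inbox: List[Tuple[str, str]] = []

# Frontend shows agent transcriptions; agent consumes direct chat input.
room.register_handler("frontend", "lk.transcription", lambda who, t: captions.append(t))
room.register_handler("agent", "lk.chat", lambda who, t: agent_inbox.append((who, t)))

print(room.send("frontend", "Can you summarize that?", "lk.chat"))          # 1
print(room.send("agent", "Sure, here's the summary.", "lk.transcription"))  # 1
print(room.send("frontend", "wrong topic", "chat"))                         # 0
```

Keying handlers by topic keeps the receiving side simple: nothing has to inspect the message text to decide whether it is a transcription or a chat message.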
If users switch between voice and text during a session, you can toggle audio input and output with `session.input.set_audio_enabled()` and `session.output.set_audio_enabled()` so the transition between the two modes stays seamless.
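A small helper can keep both toggles in sync when a user flips between modes. This Python sketch calls only `session.input.set_audio_enabled()` and `session.output.set_audio_enabled()`, assuming each takes a single boolean and returns immediately; `set_mode`, the `"voice"`/`"text"` mode names, and the `FakeSession` used to exercise it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class _FakeIO:
    """Hypothetical stand-in for session.input / session.output."""
    audio_enabled: bool = True

    def set_audio_enabled(self, enabled: bool) -> None:
        self.audio_enabled = enabled


@dataclass
class FakeSession:
    input: _FakeIO = field(default_factory=_FakeIO)
    output: _FakeIO = field(default_factory=_FakeIO)


def set_mode(session, mode: str) -> None:
    """Switch a session between 'voice' and 'text' mode (hypothetical helper).

    Assumes set_audio_enabled(bool) exists on both session.input and
    session.output and is a plain synchronous call.
    """
    if mode not in ("voice", "text"):
        raise ValueError(f"unknown mode: {mode!r}")
    enabled = mode == "voice"
    session.input.set_audio_enabled(enabled)
    session.output.set_audio_enabled(enabled)


s = FakeSession()
set_mode(s, "text")
print(s.input.audio_enabled, s.output.audio_enabled)  # False False
set_mode(s, "voice")
print(s.input.audio_enabled, s.output.audio_enabled)  # True True
```

With a real agent you would pass the actual session instead of `FakeSession`. If your app should still speak replies while the user types, change `set_mode` to mute only the input in text mode.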
To dive deeper, check out the Building voice agents guide.
broad-lamp-30178
09/11/2025, 5:16 PM
tall-airline-16756
09/11/2025, 5:17 PM