faint-insurance-26243 (09/11/2025, 12:17 PM)

tall-airline-16756 (09/11/2025, 12:17 PM)

faint-insurance-26243 (09/11/2025, 12:19 PM)

tall-airline-16756 (09/11/2025, 12:19 PM)
• Using AgentSession with a TTS instance: LiveKit takes care of streaming the LLM's text output to the TTS engine in real time. It's straightforward and requires very little setup.
• Standalone TTS Streaming: If you’re looking for more hands-on control, you can manually stream LLM responses to a TTS engine. This way, you get to manage how text chunks are sent and have full control over the synthesis process.
If you’d like me to share some detailed examples or if you have any questions along the way, just let me know—I’m here to help!

faint-insurance-26243 (09/11/2025, 1:47 PM)

tall-airline-16756 (09/11/2025, 1:48 PM)
You can use AgentSession for an easy, automatic setup, or go the manual route if you want more control. Here are examples for both to guide you through:
• Using AgentSession (Recommended): This is the simplest way, as it automatically streams LLM responses to TTS:
from livekit.agents import AgentSession
from livekit.plugins import openai, silero

# Initialize the session
session = AgentSession(
    vad=silero.VAD.load(),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=openai.TTS(model="tts-1"),
)

# Start the session in a room
await session.start(agent=YourAgent(), room=ctx.room)
With this setup, the LLM’s output is seamlessly streamed to the TTS engine, and the synthesized audio reaches all participants in the room. It’s really convenient!
• Manual TTS Streaming (For Custom Control): If you want to tailor the streaming process yourself, here’s how you can manually stream text to TTS and publish the audio:
import asyncio

from livekit import rtc
from livekit.plugins import cartesia

# Initialize TTS and audio source
tts = cartesia.TTS(model="sonic-english")
audio_source = rtc.AudioSource(tts.sample_rate, tts.num_channels)
track = rtc.LocalAudioTrack.create_audio_track("agent-audio", audio_source)
await ctx.room.local_participant.publish_track(track)

# Stream text to TTS
tts_stream = tts.stream()

async def send_audio():
    # Forward each synthesized audio frame to the published track
    async for audio in tts_stream:
        await audio_source.capture_frame(audio.frame)

send_task = asyncio.create_task(send_audio())

# Push text chunks to TTS
for chunk in ["Hello ", "from LiveKit. ", "How can I help you?"]:
    tts_stream.push_text(chunk)
tts_stream.end_input()
This method gives you full control over the text flow and lets you add any custom logic you might need.
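The push_text / end_input / async-iterate pattern above isn't tied to any one TTS vendor. Here is a minimal framework-free sketch of the same flow, using an asyncio.Queue in place of a real synthesis engine (TextStream is a toy stand-in written for illustration, not a LiveKit class):

```python
import asyncio

class TextStream:
    """Toy stand-in for a TTS stream: text chunks go in, 'frames' come out."""
    _END = object()

    def __init__(self):
        self._queue = asyncio.Queue()

    def push_text(self, text: str) -> None:
        # Queue a chunk of text for "synthesis".
        self._queue.put_nowait(text)

    def end_input(self) -> None:
        # Signal that no more text will arrive.
        self._queue.put_nowait(self._END)

    def __aiter__(self):
        return self

    async def __anext__(self):
        item = await self._queue.get()
        if item is self._END:
            raise StopAsyncIteration
        # Pretend each text chunk becomes one audio frame.
        return f"frame({item.strip()})"

async def main() -> list:
    stream = TextStream()
    frames = []

    async def consume():
        # Mirrors send_audio(): drain frames as they are produced.
        async for frame in stream:
            frames.append(frame)

    task = asyncio.create_task(consume())
    for chunk in ["Hello ", "from LiveKit. ", "How can I help you?"]:
        stream.push_text(chunk)
    stream.end_input()
    await task
    return frames

print(asyncio.run(main()))
```

The key design point is the same as in the real code: the consumer task runs concurrently with the producer, so audio can start playing while later text is still being pushed.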
If you’d like, I’m here to help with a full working example or to walk you through any part of this. Just let me know!
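One detail worth knowing with either approach: voice pipelines commonly buffer the LLM's incremental tokens into sentence-sized chunks before pushing them to TTS, so synthesis can start before the full reply is generated. A framework-free sketch of that buffering step (segment_sentences is an illustrative helper, not a LiveKit API):

```python
def segment_sentences(token_stream):
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every complete sentence currently sitting in the buffer.
        while True:
            ends = [i for i in (buffer.find(p) for p in ".!?") if i != -1]
            if not ends:
                break
            cut = min(ends)
            yield buffer[: cut + 1].strip()
            buffer = buffer[cut + 1:]
    if buffer.strip():
        yield buffer.strip()  # trailing partial sentence

tokens = ["Hel", "lo there", ". How can", " I help", "?"]
print(list(segment_sentences(tokens)))  # → ['Hello there.', 'How can I help?']
```

With AgentSession this kind of chunking is handled for you; in the manual approach you would apply it between the LLM's token stream and tts_stream.push_text.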