# ask-ai
c
My agent joins the room before the user. I want the agent to say something when the user joins. I am currently using `agent.session.say(...)`, but I want to pre-generate the audio before the user joins, so the agent can just immediately stream the audio. Currently, it must generate the audio after the user joins.
t
Thanks for reaching out, Nathan. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want your agent to have audio ready before the user even joins the room, you can use the pre-connect audio feature in LiveKit. It’s a handy way for the agent to get everything set up and start streaming audio right away when the user hops in. Here’s a quick rundown on how to do it depending on your setup:

- For Python Agents: enable pre-connect audio when you start the agent’s session. This takes care of capturing and prepping the audio ahead of time.
- For Android or Swift: use the `PreConnectAudioBuffer` class to record and hold the audio, then send that buffered audio to the user as soon as they connect.

If you’re working with a different SDK or want to dive deeper, just give me a shout—I’m here to help with whatever you need!
c
I am using the agents sdk. can you give me a minimal example in code
t
Here’s a simple example to help you get started with enabling and using pre-connect audio in the Agents SDK (Python):
```python
from livekit.agents import AgentSession, RoomInputOptions

# Inside your agent's async entrypoint (Agents SDK 1.x):
session = AgentSession(
    # ... your STT/LLM/TTS configuration ...
)

await session.start(
    agent=agent,      # your Agent instance
    room=ctx.room,    # the room from the job context
    room_input_options=RoomInputOptions(
        pre_connect_audio=True,         # enable pre-connect audio
        pre_connect_audio_timeout=3.0,  # timeout in seconds
    ),
)

# The buffered audio is automatically captured and sent to the user when they join.
```
This setup makes sure the agent can pre-generate audio, which will start streaming right away when the user connects. If you have any questions or want me to walk you through more examples, just let me know—I’m here to help!
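Conceptually, the win here is overlap: synthesis runs while the room is still empty, so there is no TTS delay at join time. A stdlib-only sketch of that timing (all names here are illustrative stand-ins, not LiveKit APIs):

```python
import asyncio

async def pregenerate_audio(events: list) -> bytes:
    # Stand-in for TTS synthesis that runs while the room is empty.
    await asyncio.sleep(0.01)
    events.append("audio_ready")
    return b"greeting-pcm"

async def wait_for_user(events: list) -> None:
    # Stand-in for the user joining some time later.
    await asyncio.sleep(0.05)
    events.append("user_joined")

async def main() -> list:
    events: list = []
    # Kick off synthesis immediately; don't wait for the join.
    audio_task = asyncio.create_task(pregenerate_audio(events))
    await wait_for_user(events)
    audio = await audio_task  # already finished: no synthesis delay here
    events.append(f"played:{audio!r}")
    return events

print(asyncio.run(main()))
# → ['audio_ready', 'user_joined', "played:b'greeting-pcm'"]
```

Note that `audio_ready` lands before `user_joined`: the greeting is sitting there waiting when the user arrives.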
c
how can I just generate audio in an `AsyncIterable[AudioFrame]` using the agents sdk
t
To generate audio as an `AsyncIterable[AudioFrame]` using the Agents SDK, you’ve got two friendly options: using the TTS pipeline node, or driving the TTS stream directly. Here’s a simple example of each to get you started.

**Using the TTS Pipeline Node:**
```python
from livekit.agents import Agent, ModelSettings

async def generate_audio(text_stream, model_settings: ModelSettings):
    # An agent with TTS capabilities (initialize with your TTS configuration)
    agent = Agent(...)

    # The default tts_node converts a text stream into audio frames
    async for frame in agent.tts_node(text_stream, model_settings):
        yield frame
```
**Using the TTS Stream Directly:**
```python
from livekit.agents import Agent

async def generate_audio_direct(agent: Agent, text_stream):
    # The TTS instance configured on the agent
    tts = agent.tts

    # Create a TTS stream
    stream = tts.stream()

    try:
        # Push all the text into the stream, then signal end of input
        async for text in text_stream:
            stream.push_text(text)
        stream.end_input()

        # Consume the synthesized audio frames
        async for audio_event in stream:
            yield audio_event.frame
    finally:
        await stream.aclose()
```
If you want me to walk you through any part of this or need more examples, just give me a shout—I’m here to help!
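Putting both answers together, one pattern is to run the TTS stream to completion before the user joins, cache the frames, and wrap them back into an async iterable at join time (recent Agents SDK versions let `session.say(...)` accept a pre-synthesized `audio` iterable, but double-check your version). A stdlib-only sketch with a toy stand-in for `tts.stream()`:

```python
import asyncio
from typing import AsyncIterator, List

class StubTTSStream:
    """Toy stand-in for tts.stream(): turns pushed text into fake frames."""

    def __init__(self) -> None:
        self._texts: List[str] = []
        self._closed = False

    def push_text(self, text: str) -> None:
        self._texts.append(text)

    def end_input(self) -> None:
        self._closed = True

    def __aiter__(self):
        return self._frames()

    async def _frames(self):
        assert self._closed, "call end_input() before iterating"
        for t in self._texts:
            yield t.encode()  # one fake frame per text chunk

async def pregenerate(texts: List[str]) -> List[bytes]:
    # Before the user joins: synthesize everything and cache the frames.
    stream = StubTTSStream()
    for t in texts:
        stream.push_text(t)
    stream.end_input()
    return [frame async for frame in stream]

async def replay(frames: List[bytes]) -> AsyncIterator[bytes]:
    # On join: wrap the cached frames back into an async iterable.
    for frame in frames:
        yield frame

async def main() -> List[bytes]:
    cached = await pregenerate(["Hello, ", "welcome!"])
    return [f async for f in replay(cached)]

print(asyncio.run(main()))  # → [b'Hello, ', b'welcome!']
```

With a real TTS, `pregenerate` would consume `audio_event.frame` objects instead of encoded strings, but the cache-then-replay structure is the same.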