I want to play an audio file before starting the agent sessi LiveKit Community #ask-ai

# ask-ai

tall-autumn-57974

08/18/2025, 9:06 AM

I want to play an audio file before starting the agent session. How can I do that?

tall-airline-16756

08/18/2025, 9:06 AM

Thanks for reaching out, Pierrick. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

08/18/2025, 9:06 AM

Hey there! Just to make sure I’m on the right page—do you want the audio to start playing automatically as soon as the app launches, or should it play after a certain event but before the agent session kicks off? Also, are you using any particular tools or SDKs for this? I’m here to help! Sources: Agent speech and audio | LiveKit Docs

tall-autumn-57974

08/18/2025, 9:08 AM

What I want is to play some kind of welcome audio when the caller enters the room to inform them their call is gonna be handled by our voice bot. Then, once the audio has finished playing, I want the agent session to start as it normally would (the session also plays a background audio)

tall-airline-16756

08/18/2025, 9:08 AM

Hey there! If you want to play a welcome audio as soon as a caller joins the room and then kick off the agent session, here’s a friendly way to do it:
• Use the on_enter lifecycle hook: This lets you play a warm welcome message right when the agent becomes active in the session. For example:

from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

class WelcomeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="A friendly agent that provides customer support.",
            tts=openai.TTS()
        )
    
    async def on_enter(self):
        self.session.say("Welcome! Your call will be handled by our voice bot.")

async def entrypoint(ctx):
    session = AgentSession()
    await session.start(
        agent=WelcomeAgent(),
        room=ctx.room
    )

• Use Events: Another cool option is to trigger the welcome audio with the participant_joined event whenever someone joins the room.
• Background Audio: And after the welcome message, you can keep the vibe going by using BackgroundAudioPlayer to play some ambient sounds during the session.

If you want me to walk you through any part of this or need a hand setting it up, just let me know. I’m here to help!

tall-autumn-57974

08/18/2025, 9:09 AM

What if I prefer playing a mp3 instead of using TTS

tall-airline-16756

08/18/2025, 9:10 AM

If you want to play an MP3 file instead of using TTS, you can do that by providing pre-synthesized audio to the session.say() method. This method takes an audio parameter of type AsyncIterable[rtc.AudioFrame]. Here’s a quick example for you:

async def audio_generator():
    yield audio_frame  # audio_frame should be an rtc.AudioFrame object

await session.say("Message", audio=audio_generator())

That said, the details on how to convert an MP3 file into rtc.AudioFrame objects aren’t covered here. You’ll probably need to handle that conversion yourself before passing it in. If you want, I’m happy to help you figure that part out, just let me know! Sources: basics/playing_audio.py | Agent speech and audio | LiveKit Docs
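As a rough illustration of the conversion step left open above, the sketch below slices already-decoded 16-bit PCM into fixed-size chunks. It assumes the MP3 has been decoded to raw PCM elsewhere, and the sample rate, channel count, and 10 ms frame length are illustrative assumptions, as is the helper name iter_pcm_frames. Each chunk would then presumably be wrapped in an rtc.AudioFrame before being yielded to session.say().

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # 16-bit signed PCM


def iter_pcm_frames(
    pcm: bytes,
    sample_rate: int = 48000,  # assumed sample rate of the decoded audio
    num_channels: int = 1,     # assumed mono
    frame_ms: int = 10,        # assumed frame duration
) -> Iterator[bytes]:
    """Yield fixed-size PCM chunks, zero-padding the final one."""
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * num_channels * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        chunk = pcm[start:start + frame_bytes]
        if len(chunk) < frame_bytes:
            # Pad the last partial frame with silence so every frame has the same size.
            chunk += b"\x00" * (frame_bytes - len(chunk))
        yield chunk
```

Padding the tail with silence keeps every frame the same length, which is simpler to handle downstream than a short final frame.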

tall-autumn-57974

08/18/2025, 9:15 AM

Yes, it looks like I can't do exactly what I need with your proposal. Some possible solutions I have in mind:
• Use Twilio (my telephony provider) before starting the session and send some audio frames. But I will need a code example to do that
• Start a separate session which would just play audio to the room before starting the AgentSession

tall-airline-16756

08/18/2025, 9:15 AM

I don't have the answer you're looking for. You could also try asking your question: • in one of the other Slack channels or • to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code If you find the answer, please post it here to help others!

refined-appointment-81829

08/18/2025, 12:34 PM

Maybe this will help: You can publish an audio stream to the room if you like: https://github.com/livekit/python-sdks/blob/main/examples/play_audio_stream.py

Or, to play an audio file before starting the agent session in LiveKit, you can use the BackgroundAudioPlayer to play your welcome audio. You can do this by following these steps:
1. Create an instance of BackgroundAudioPlayer.
2. Play the welcome audio using the play method.
3. Wait for the audio to finish before starting the agent session.
4. Start the agent session as usual.

Here's an example of how you can implement this in Python:

from livekit.agents import BackgroundAudioPlayer, AudioConfig
import asyncio

async def main(ctx):
    # Create the background audio player
    background_audio = BackgroundAudioPlayer()

    # Start the background audio player
    await background_audio.start(room=ctx.room, agent_session=session)

    # Play the welcome audio
    play_handle = background_audio.play("/path/to/welcome-audio.mp3")

    # Wait for the audio to finish playing
    await play_handle

    # Now start the agent session
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
        turn_detection=turn_detector.MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice AI assistant."),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    # Optionally, you can also play background audio during the session
    await background_audio.play("/path/to/background-audio.mp3", loop=True)

# Run the main function
asyncio.run(main(ctx))

Explanation:
• Playing the Audio: The play method is called with the path to your welcome audio file (e.g., an MP3 file). This method returns a PlayHandle, which you can await to ensure the audio finishes playing before proceeding.
• Starting the Agent Session: After the welcome audio has finished playing, you can start the agent session as you normally would.

Publishing an Audio Track Directly

If you prefer to publish an audio track directly to the room instead of using the BackgroundAudioPlayer, you can use the LiveKit Python SDK to create an audio track and publish it. Here’s a brief example:

from livekit import Room, AudioTrack

async def publish_audio_track(room: Room):
    # Create an audio track from an audio file
    audio_track = AudioTrack("/path/to/welcome-audio.mp3")

    # Publish the audio track to the room
    await room.publish_track(audio_track)

# Call this function before starting the agent session
await publish_audio_track(ctx.room)

Notes:
• Ensure that the audio file paths are correct and accessible.
• You can use either TTS or pre-recorded audio files (like MP3s) based on your preference. The above examples demonstrate how to use an MP3 file for the welcome message.

This approach allows you to provide a seamless experience for users entering the room while ensuring that the agent session starts only after the welcome audio has been played.

tall-autumn-57974

08/18/2025, 2:09 PM

Hi @refined-appointment-81829 I tested both your proposals but neither of them seems to work.
1. BackgroundAudioPlayer needs a session object to be able to run

Exception: cannot access local participant before connecting {"pid": 3323, "job_id": "AJ_G8VBQnBxBteF"}

However, if I run this code after starting the session, it will compete with my call background audio and won't be played. The agent will start speaking instantly.
2. For the direct audio track publish, I found the method room.local_participant.publish_track, however I'm not sure I understand how I can transform a remote MP3 URL (from S3 for instance) into an AudioTrack object 🤔 Seems like AudioTrack takes an AudioSource but none of them seem to have a param like audio_file_path or audio_file_url

refined-appointment-81829

08/18/2025, 2:36 PM

you will need to transcode, frame and send the audio based on your needs.
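For the transcoding part, one common route is to shell out to ffmpeg and read raw PCM from its standard output. The sketch below is only an assumption about how that could look: it presumes ffmpeg is installed and on the PATH, and the helper names build_ffmpeg_cmd and decode_to_pcm, the 48 kHz rate, and the mono layout are illustrative choices, not something stated in this thread.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(src: str, sample_rate: int = 48000, num_channels: int = 1) -> List[str]:
    """Build an ffmpeg command that decodes `src` to raw signed 16-bit little-endian PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", src,                 # input file path or URL
        "-f", "s16le",             # raw PCM container
        "-acodec", "pcm_s16le",    # 16-bit little-endian samples
        "-ar", str(sample_rate),   # resample to the target rate
        "-ac", str(num_channels),  # downmix to the target channel count
        "pipe:1",                  # write to stdout
    ]


def decode_to_pcm(src: str, sample_rate: int = 48000, num_channels: int = 1) -> bytes:
    """Run ffmpeg and return the decoded PCM bytes (requires ffmpeg on the PATH)."""
    result = subprocess.run(
        build_ffmpeg_cmd(src, sample_rate, num_channels),
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout
```

The resulting bytes still need to be split into frames and pushed to an audio source, as in the linked play_audio_stream.py example.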

tall-autumn-57974

08/18/2025, 3:48 PM

Do you have any example to point me to? I've had a look at many examples but I can't find anything regarding the publishing of a mp3 track to the room

refined-appointment-81829

08/18/2025, 3:49 PM

This is the most similar example I can think of. Would be adapted to your use case: https://github.com/livekit/python-sdks/blob/main/examples/play_audio_stream.py

tall-autumn-57974

08/18/2025, 3:50 PM

Also I'm still stuck with room.local_participant.publish_track, which triggers an error when executed before agent_session.start()

Exception: cannot access local participant before connecting {"pid": 5435, "job_id": "AJ_qXUY62tyzsXk"}

tall-autumn-57974

08/18/2025, 3:51 PM

I managed to do something quite convoluted for such a simple task. It works if I execute await publish_audio_track(ctx.room) after session start, but then the audio is played at the same time as the bot speaking...

Untitled

refined-appointment-81829

08/18/2025, 3:55 PM

why are you having the bot speak in that case?

tall-autumn-57974

08/18/2025, 3:57 PM

It's just that the session is started and the bot has a greeting instruction on enter

refined-appointment-81829

08/18/2025, 3:57 PM

is that what you want?

tall-autumn-57974

08/18/2025, 3:58 PM

My use case is the following: my clients want to play a "pre-connect audio" in order to inform the caller that the call is gonna be handled by an AI. This message is stylized with the client's own music and their own voice. They provide an MP3 file. Once the message is over, the bot would start speaking, greeting the caller and starting the booking flow

tall-autumn-57974

08/18/2025, 3:59 PM

I'm using Twilio as a telephony provider, but with SIP Trunking I'm not sure I can send TwiML data

refined-appointment-81829

08/18/2025, 3:59 PM

So why not trigger the greeting after the audio is done?

tall-autumn-57974

08/18/2025, 4:00 PM

Well, yes indeed, I'll see what I can do. But I'm quite surprised I need all this code (buffering etc) to publish an audio track to the room 🤔

👍 1

tall-autumn-57974

08/18/2025, 4:07 PM

From what I see, BackgroundAudioPlayer is not able to play a remote file directly, right?

refined-appointment-81829

08/18/2025, 4:08 PM

I don't think it plays remote files. You would pull it locally to spool or tmp and play it from there.
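Pulling the file down first could look roughly like the sketch below, which uses only the Python standard library. The helper name fetch_to_tmp is hypothetical, and the sketch assumes the URL is directly readable (for S3, for example, a public or pre-signed URL). The returned local path could then be handed to whatever plays the file.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_to_tmp(url: str, suffix: str = ".mp3") -> Path:
    """Download `url` into a temporary file and return its local path."""
    with urllib.request.urlopen(url) as resp, tempfile.NamedTemporaryFile(
        delete=False, suffix=suffix
    ) as tmp:
        # Stream the body to disk instead of loading it all into memory.
        shutil.copyfileobj(resp, tmp)
        return Path(tmp.name)
```

Because the file is created with delete=False, the caller is responsible for removing it once playback is done. This call blocks, so inside an async entrypoint it would likely be run through asyncio.to_thread.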

tall-autumn-57974

08/18/2025, 4:09 PM

Alright. Well, doing this works

await publish_audio_track(ctx.room)
# Start background audio
await userdata.audio_player.start(room=ctx.room, agent_session=session)

await session.generate_reply()

tall-autumn-57974

08/18/2025, 4:09 PM

But, if I speak while the audio is played, the agent answers 😅

tall-autumn-57974

08/18/2025, 4:09 PM

I will have to mute then unmute the user...

tall-autumn-57974

08/18/2025, 4:10 PM

Any plans on your side to add some pre-connect audio feature ?

refined-appointment-81829

08/18/2025, 4:10 PM

I've not really heard anyone request it before. We can look into it.

refined-appointment-81829

08/18/2025, 4:11 PM

Can you do your own custom turn detection in that case so the agent does not respond? Or temporarily disable audio input during that phase of the call.
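The second suggestion, disabling audio input for the duration of the welcome message, could be wrapped in a small context manager like the sketch below. It assumes the object passed in exposes a set_audio_enabled(bool) method (in recent livekit-agents versions session.input appears to offer one, but treat that as an assumption to verify), and the name audio_input_disabled is hypothetical.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def audio_input_disabled(session_input):
    """Turn audio input off while the body runs, then turn it back on.

    `session_input` is assumed to expose set_audio_enabled(bool).
    """
    session_input.set_audio_enabled(False)
    try:
        yield
    finally:
        # Re-enable input even if playing the welcome audio raised an error.
        session_input.set_audio_enabled(True)
```

Usage would be along the lines of `async with audio_input_disabled(session.input): await publish_audio_track(ctx.room)`, so that anything the caller says during the pre-connect audio does not trigger a reply.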

tall-autumn-57974

08/18/2025, 4:13 PM

I'll see what I can do about it, but it seems quite some work for a simple feature. I might start way simpler and use TTS to generate the sentence.

refined-appointment-81829

08/18/2025, 4:14 PM

Good luck.

🙏 1
