I want to play an audio file before starting the a...
# ask-ai
t
I want to play an audio file before starting the agent session. How can I do that?
t
Thanks for reaching out, Pierrick. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! Just to make sure I’m on the right page—do you want the audio to start playing automatically as soon as the app launches, or should it play after a certain event but before the agent session kicks off? Also, are you using any particular tools or SDKs for this? I’m here to help! Sources: Agent speech and audio | LiveKit Docs
t
What I want is to play some kind of welcome audio when the caller enters the room to inform them their call is gonna be handled by our voice bot. Then, once the audio has finished playing, I want the agent session to start as it normally would (the session also plays a background audio)
t
Hey there! If you want to play a welcome audio as soon as a caller joins the room and then kick off the agent session, here’s a friendly way to do it:
• Use the `on_enter` lifecycle hook: This lets you play a warm welcome message right when the agent becomes active in the session. For example:
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

class WelcomeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="A friendly agent that provides customer support.",
            tts=openai.TTS()
        )
    
    async def on_enter(self):
        self.session.say("Welcome! Your call will be handled by our voice bot.")

async def entrypoint(ctx):
    session = AgentSession()
    await session.start(
        agent=WelcomeAgent(),
        room=ctx.room
    )
• Use events: Another option is to trigger the welcome audio on the `participant_connected` event, which fires whenever someone joins the room.
• Background audio: After the welcome message, you can use `BackgroundAudioPlayer` to play some ambient sounds during the session.
If you want me to walk you through any part of this or need a hand setting it up, just let me know. I’m here to help!
t
What if I'd rather play an MP3 instead of using TTS?
t
If you want to play an MP3 file instead of using TTS, you can do that by providing pre-synthesized audio to the `session.say()` method. This method takes an `audio` parameter of type `AsyncIterable[rtc.AudioFrame]`. Here’s a quick example for you:
async def audio_generator():
    yield audio_frame  # audio_frame should be an rtc.AudioFrame object

await session.say("Message", audio=audio_generator())
That said, the details on how to convert an MP3 file into `rtc.AudioFrame` objects aren’t covered here. You’ll probably need to handle that conversion yourself before passing it in. If you want, I’m happy to help you figure that part out. Just let me know! Sources: basics/playing_audio.py | Agent speech and audio | LiveKit Docs
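For the conversion step mentioned above, here is a minimal sketch of one possible approach, not an official LiveKit recipe. It assumes `ffmpeg` is installed and on the PATH, and that 48 kHz mono audio in 10 ms frames is acceptable. The helper names (`ffmpeg_decode_command`, `decode_to_pcm`, `split_pcm`, `mp3_frames`) are hypothetical.

```python
import subprocess
from typing import Iterator, List

SAMPLE_RATE = 48000  # assumed output sample rate
NUM_CHANNELS = 1     # assumed mono output
FRAME_MS = 10        # assumed frame duration in milliseconds


def ffmpeg_decode_command(path: str) -> List[str]:
    """Build an ffmpeg command that decodes `path` to raw 16-bit PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "error", "-i", path,
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ar", str(SAMPLE_RATE), "-ac", str(NUM_CHANNELS), "-",
    ]


def decode_to_pcm(path: str) -> bytes:
    """Run ffmpeg and return the decoded PCM bytes (requires ffmpeg on PATH)."""
    result = subprocess.run(ffmpeg_decode_command(path), check=True, capture_output=True)
    return result.stdout


def split_pcm(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-size PCM chunks, zero-padding the last one with silence."""
    samples_per_channel = SAMPLE_RATE * frame_ms // 1000
    chunk_size = samples_per_channel * NUM_CHANNELS * 2  # 2 bytes per int16 sample
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size].ljust(chunk_size, b"\x00")


async def mp3_frames(path: str):
    """Async generator of rtc.AudioFrame objects built from an MP3 file."""
    from livekit import rtc  # imported lazily so the helpers above work without the SDK

    samples_per_channel = SAMPLE_RATE * FRAME_MS // 1000
    for chunk in split_pcm(decode_to_pcm(path)):
        yield rtc.AudioFrame(chunk, SAMPLE_RATE, NUM_CHANNELS, samples_per_channel)
```

Under those assumptions, the generator could be passed as `await session.say("Welcome", audio=mp3_frames("welcome.mp3"))`. Note that `decode_to_pcm` blocks while ffmpeg runs, so a long file may be better decoded ahead of time.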
t
Yes, it looks like I can't do exactly what I need with your proposal. Some possible solutions I have in mind:
• Use Twilio (my telephony provider) to send some audio frames before starting the session. I would need a code example to do that, though.
• Start a separate session that would just play audio to the room before starting the `AgentSession`.
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code
If you find the answer, please post it here to help others!
r
Maybe this will help: you can publish an audio stream to the room if you like: https://github.com/livekit/python-sdks/blob/main/examples/play_audio_stream.py
Or, to play an audio file before starting the agent session in LiveKit, you can use the `BackgroundAudioPlayer` to play your welcome audio. You can do this by following these steps:
1. Create an instance of `BackgroundAudioPlayer`.
2. Play the welcome audio using the `play` method.
3. Wait for the audio to finish before starting the agent session.
4. Start the agent session as usual.
Here's an example of how you can implement this in Python:
from livekit.agents import BackgroundAudioPlayer, AudioConfig
import asyncio

async def main(ctx):
    # Create the background audio player
    background_audio = BackgroundAudioPlayer()

    # Start the background audio player
    await background_audio.start(room=ctx.room, agent_session=session)

    # Play the welcome audio
    play_handle = background_audio.play("/path/to/welcome-audio.mp3")

    # Wait for the audio to finish playing
    await play_handle

    # Now start the agent session
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
        turn_detection=turn_detector.MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice AI assistant."),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    # Optionally, you can also play background audio during the session
    await background_audio.play("/path/to/background-audio.mp3", loop=True)

# Run the main function
asyncio.run(main(ctx))
Explanation:
• Playing the audio: The `play` method is called with the path to your welcome audio file (e.g., an MP3 file). This method returns a `PlayHandle`, which you can await to ensure the audio finishes playing before proceeding.
• Starting the agent session: After the welcome audio has finished playing, you can start the agent session as you normally would.
Publishing an Audio Track Directly
If you prefer to publish an audio track directly to the room instead of using the `BackgroundAudioPlayer`, you can use the LiveKit Python SDK to create an audio track and publish it. Here’s a brief example:
from livekit import Room, AudioTrack

async def publish_audio_track(room: Room):
    # Create an audio track from an audio file
    audio_track = AudioTrack("/path/to/welcome-audio.mp3")

    # Publish the audio track to the room
    await room.publish_track(audio_track)

# Call this function before starting the agent session
await publish_audio_track(ctx.room)
Notes:
• Ensure that the audio file paths are correct and accessible.
• You can use either TTS or pre-recorded audio files (like MP3s) based on your preference. The above examples demonstrate how to use an MP3 file for the welcome message.
This approach allows you to provide a seamless experience for users entering the room while ensuring that the agent session starts only after the welcome audio has been played.
t
Hi @refined-appointment-81829, I tested both of your proposals, but neither of them seems to work.
1. `BackgroundAudioPlayer` needs a `session` object to be able to run:
Exception: cannot access local participant before connecting {"pid": 3323, "job_id": "AJ_G8VBQnBxBteF"}
However, if I run this code after starting the session, it competes with my call's background audio and isn't played. The agent starts speaking instantly.
2. For publishing the audio track directly, I found the method `room.local_participant.publish_track`. However, I'm not sure I understand how to turn a remote MP3 URL (from S3, for instance) into an `AudioTrack` object 🤔 It seems `AudioTrack` takes an `AudioSource`, but none of them seems to have an `audio_file_path` or `audio_file_url` parameter.
r
you will need to transcode, frame and send the audio based on your needs.
t
Do you have an example to point me to? I've looked at many examples, but I can't find anything about publishing an MP3 track to the room.
r
This is the most similar example I can think of. It could be adapted to your use case: https://github.com/livekit/python-sdks/blob/main/examples/play_audio_stream.py
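As a rough illustration of the "frame and send" part, here is a sketch of publishing already-decoded 16-bit PCM chunks as an audio track. It is based on my reading of the `livekit.rtc` API (`AudioSource`, `LocalAudioTrack.create_audio_track`, `capture_frame`, `wait_for_playout`), so check it against the SDK version you have installed. The function name `publish_pcm_chunks` and the track name `welcome-audio` are made up for the example.

```python
from typing import Iterable


async def publish_pcm_chunks(room, chunks: Iterable[bytes],
                             sample_rate: int = 48000, num_channels: int = 1) -> None:
    """Publish 16-bit PCM chunks to `room` as an audio track and wait for playout."""
    from livekit import rtc  # lazy import; requires the `livekit` package

    source = rtc.AudioSource(sample_rate, num_channels)
    track = rtc.LocalAudioTrack.create_audio_track("welcome-audio", source)
    options = rtc.TrackPublishOptions()
    options.source = rtc.TrackSource.SOURCE_MICROPHONE
    # The room must already be connected, otherwise local_participant raises.
    publication = await room.local_participant.publish_track(track, options)

    for chunk in chunks:
        samples_per_channel = len(chunk) // (2 * num_channels)  # int16 samples
        frame = rtc.AudioFrame(chunk, sample_rate, num_channels, samples_per_channel)
        await source.capture_frame(frame)  # may wait when the source's buffer is full

    await source.wait_for_playout()  # returns once the queued audio has been played
    await room.local_participant.unpublish_track(publication.sid)
```

Because the coroutine only returns after playout, awaiting it before starting the agent session should keep the two from overlapping. The chunks are expected to be raw PCM at the given sample rate, not MP3 bytes.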
t
Also, I'm still stuck with `room.local_participant.publish_track`, which raises an error when executed before `agent_session.start()`:
Exception: cannot access local participant before connecting {"pid": 5435, "job_id": "AJ_qXUY62tyzsXk"}
I managed to do something quite convoluted for such a simple task. It works if I execute `await publish_audio_track(ctx.room)` after the session starts, but then the audio plays at the same time as the bot speaking...
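The "cannot access local participant before connecting" error suggests the room was not connected yet when the track was published. One possible ordering, sketched here under the assumption that the job context exposes `await ctx.connect()`, is to connect explicitly, play the welcome audio, and only then start the session. The `play_welcome` and `start_session` callables are hypothetical placeholders for your own code.

```python
from typing import Awaitable, Callable


async def run_call(ctx,
                   play_welcome: Callable[[object], Awaitable[None]],
                   start_session: Callable[[object], Awaitable[None]]) -> None:
    """Connect, play the pre-connect audio to completion, then start the agent."""
    # Connect first: room.local_participant is only available once connected.
    await ctx.connect()
    # Hypothetical helper: should return only when the welcome audio is done.
    await play_welcome(ctx.room)
    # Hypothetical helper: wraps AgentSession.start() and the greeting.
    await start_session(ctx.room)
```

With this ordering the greeting cannot overlap the welcome audio, because the session does not exist until the audio has finished.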
r
why are you having the bot speak in that case?
t
It's just that the session is started and the bot has a greeting instruction on enter
r
is that what you want?
t
My use case is the following: my clients want to play a "pre-connect audio" to inform the caller that the call is going to be handled by an AI. This message is stylized with the client's own music and voice. They provide an MP3 file. Once the message is over, the bot would start speaking, greeting the caller and starting the booking flow.
I'm using Twilio as a telephony provider, but with SIP Trunking I'm not sure I can send TwiML data.
r
So why not trigger the greeting after the audio is done?
t
Well, yes indeed, I'll see what I can do. But I'm quite surprised I need all this code (buffering etc) to publish an audio track to the room 🤔
From what I see, `BackgroundAudioPlayer` is not able to play a remote file directly, right?
r
I don't think it plays remote files. You would pull the file locally to a spool or tmp directory and play it from there.
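Pulling the file locally could look something like this. It is a small sketch using only the Python standard library, and the helper name `fetch_to_tmp` is made up for the example.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_to_tmp(url: str, suffix: str = ".mp3") -> Path:
    """Download `url` into a temporary file and return the local path."""
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed.
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            shutil.copyfileobj(response, tmp)
            local_path = Path(tmp.name)
    return local_path
```

The returned path could then be handed to the player, e.g. `background_audio.play(str(local_path))`. The download is blocking, so inside an agent it is probably better run via `await asyncio.to_thread(fetch_to_tmp, url)`, and the caller is responsible for deleting the file afterwards.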
t
Alright. Well, doing this works
await publish_audio_track(ctx.room)
# Start background audio
await userdata.audio_player.start(room=ctx.room, agent_session=session)

await session.generate_reply()
But, if I speak while the audio is played, the agent answers 😅
I will have to mute then unmute the user...
Any plans on your side to add a pre-connect audio feature?
r
I've not really heard anyone request it before. We can look into it.
Could you do your own custom turn detection in that case, so the agent does not respond? Or temporarily disable audio input during that phase of the call.
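For the second option, a small wrapper along these lines might do. It assumes the session exposes `session.input.set_audio_enabled(...)`, as I understand recent versions of the Agents SDK do, so please verify that against your installed version. The name `audio_input_disabled` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def audio_input_disabled(session):
    """Turn off the session's audio input for the duration of the block."""
    session.input.set_audio_enabled(False)
    try:
        yield
    finally:
        # Re-enable input even if the welcome audio fails midway.
        session.input.set_audio_enabled(True)
```

Usage would be roughly `with audio_input_disabled(session): await publish_audio_track(ctx.room)`, followed by `await session.generate_reply()` once the block exits, so anything the caller says during the welcome message is ignored.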
t
I'll see what I can do about it, but it seems like quite a lot of work for a simple feature. I might start much simpler and use TTS to generate the sentence.
r
Good luck.