how to add thinking audio only when tool calling
# ask-ai
n
how to add thinking audio only when tool calling
t
Thanks for reaching out, Sebastian. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To add "thinking" audio that plays only during tool calls, you can use the BackgroundAudioPlayer with a thinking_sound configuration. Below is an example implementation:
from livekit import agents
from livekit.agents import AgentSession, AudioConfig, BackgroundAudioPlayer, BuiltinAudioClip

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, vad, turn_detection, etc.
    )

    await session.start(
        room=ctx.room,
        # ... agent, etc.
    )

    # Clips to play while the agent's state is "thinking"
    background_audio = BackgroundAudioPlayer(
        thinking_sound=[
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8),
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING2, volume=0.7),
        ],
    )
    # Passing the session lets the player follow its state changes
    await background_audio.start(room=ctx.room, agent_session=session)
The BackgroundAudioPlayer will automatically play the configured thinking sounds when the agent's state changes to "thinking" during tool calls, and will stop the sounds when the state changes back.
Sources: Handling RAG Delays in Voice Agents | LiveKit Docs | External data and RAG | LiveKit Docs
n
but only when tool calling
t
To ensure the thinking audio plays only during tool calls, you can use the BackgroundAudioPlayer in conjunction with the agent session. The agent session tracks the agent's state, including when it is "thinking" during tool calls. Below is an example implementation:
from livekit import agents
from livekit.agents import AgentSession, AudioConfig, BackgroundAudioPlayer, BuiltinAudioClip

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, vad, turn_detection, etc.
    )

    await session.start(
        room=ctx.room,
        # ... agent, etc.
    )

    # Clips to play while the agent's state is "thinking"
    background_audio = BackgroundAudioPlayer(
        thinking_sound=[
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8),
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING2, volume=0.7),
        ],
    )
    # Passing the session lets the player follow its state changes
    await background_audio.start(room=ctx.room, agent_session=session)
The BackgroundAudioPlayer will automatically play the thinking sound when the agent's state changes to "thinking" during tool calls, and stop the sound when the state changes to something else.
Sources: External data and RAG | LiveKit Docs | Tree View: | dse-workbench
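The state changes described above can be observed directly, which helps when checking whether "thinking" is entered only during tool calls. The following is a minimal sketch, not a confirmed recipe: it assumes the session emits an "agent_state_changed" event whose payload has old_state and new_state attributes, and StateTransitionLog is a hypothetical helper name. Stand-in events are used so the sketch runs without a LiveKit session.

```python
from types import SimpleNamespace


class StateTransitionLog:
    """Records (old_state, new_state) pairs from agent state change events."""

    def __init__(self) -> None:
        self.transitions: list[tuple[str, str]] = []

    def record(self, ev) -> None:
        # Assumption: the event exposes `old_state` and `new_state` attributes.
        self.transitions.append((ev.old_state, ev.new_state))
        print(f"agent state: {ev.old_state} -> {ev.new_state}")

    def thinking_count(self) -> int:
        """Number of times the agent entered the "thinking" state."""
        return sum(1 for _, new in self.transitions if new == "thinking")


# Hypothetical wiring inside entrypoint(), after the session is created:
#     log = StateTransitionLog()
#     session.on("agent_state_changed", log.record)

# Stand-in events so the sketch runs on its own.
log = StateTransitionLog()
for old, new in [
    ("listening", "thinking"),
    ("thinking", "speaking"),
    ("speaking", "listening"),
]:
    log.record(SimpleNamespace(old_state=old, new_state=new))
```

If the log shows "thinking" on turns where no tool was called, the thinking sound would be expected to play on those turns as well.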
n
yeah but it plays the audio even when not tool calling
t
To ensure the thinking audio plays only during tool calls, you need to configure the BackgroundAudioPlayer to specifically track the agent's state during tool calls. The thinking_sound should only activate when the agent's state is set to "thinking" during these calls. Below is the implementation:
from livekit import agents
from livekit.agents import AgentSession, AudioConfig, BackgroundAudioPlayer, BuiltinAudioClip

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, vad, turn_detection, etc.
    )

    await session.start(
        room=ctx.room,
        # ... agent, etc.
    )

    # Clips to play while the agent's state is "thinking"
    background_audio = BackgroundAudioPlayer(
        thinking_sound=[
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8),
            AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING2, volume=0.7),
        ],
    )
    # Passing the session lets the player follow its state changes
    await background_audio.start(room=ctx.room, agent_session=session)
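This setup ties the thinking audio to the "thinking" state, which occurs during tool calls. However, the session can also enter "thinking" whenever the LLM is generating a reply, even when no tool is called, which would explain audio playing outside of tool calls. To check, log the agent's state transitions and see whether "thinking" is entered on turns without a tool call. If it is, thinking_sound alone cannot restrict the audio to tool calls; one option is to omit thinking_sound and start and stop playback from within the tool itself.
Sources: External data and RAG | LiveKit Docs | Handling RAG Delays in Voice Agents | LiveKit Docs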
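Since the audio is reported to play outside tool calls as well, one possible workaround is to leave thinking_sound unset and start the sound from inside the tool itself. The sketch below is an illustration under assumptions, not a confirmed LiveKit recipe: it assumes the player's play(audio, loop=True) method returns a handle with a stop() method, as BackgroundAudioPlayer.play is understood to do. The names sound_while_running, FakePlayer, FakeHandle, and lookup_weather are hypothetical, and the fake player stands in for a real one so the sketch runs without LiveKit.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def sound_while_running(player, audio):
    """Play `audio` on `player` for the duration of the `async with` body."""
    # Assumption: play(audio, loop=True) returns a handle exposing stop().
    handle = player.play(audio, loop=True)
    try:
        yield handle
    finally:
        # Stop the sound even if the tool body raises.
        handle.stop()


class FakeHandle:
    """Stand-in for the handle returned by a real player."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


class FakePlayer:
    """Stand-in for a background audio player; records the handles it creates."""

    def __init__(self) -> None:
        self.handles = []

    def play(self, audio, *, loop=False):
        handle = FakeHandle()
        self.handles.append(handle)
        return handle


async def lookup_weather(player, city: str) -> str:
    """Hypothetical tool body: the sound plays only while this lookup runs."""
    async with sound_while_running(player, "typing-clip"):
        await asyncio.sleep(0)  # placeholder for the real, slow lookup
        return f"sunny in {city}"


player = FakePlayer()
result = asyncio.run(lookup_weather(player, "Oslo"))
```

In a real agent, the same context manager would wrap the body of each function tool, with the BackgroundAudioPlayer instance as player and an AudioConfig such as AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8) as audio. Check the play() signature in your installed livekit-agents version before relying on this.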