# ask-ai
a
how does the background player work under the hood? i see there's an audio mixer class it uses. i want to know because when using the background audio player, the audio in my local python sounds pitched down, but in the Room recording it plays correctly, so something changes in how i have to process the received audio frames
t
Thanks for reaching out, Sebastián. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The `BackgroundAudioPlayer` in LiveKit is designed to manage background audio, like ambient sounds or agent-specific effects, during a session. It uses the `AudioMixer` class to blend multiple audio streams (such as ambient, thinking, and on-demand sounds) into one seamless output stream. This mixed audio is then sent to an `rtc.AudioSource` for publication to the LiveKit room as a separate audio track. The `AudioMixer` takes care of mixing these streams in real time, keeping them synchronized and output in fixed-size audio chunks. You can add or remove each stream dynamically, and control audio levels individually. The player supports built-in audio clips, custom file paths, and async iterators that yield `rtc.AudioFrame` objects.

About the audio pitch differences you're experiencing: this might be related to how the audio frames are processed or the sample rate configuration. If you could share a bit more about how you're handling the received audio frames in your Python setup, that would be really helpful. Together, we can pinpoint if any adjustments are needed to better align with the `BackgroundAudioPlayer`'s output.

Sources: Agent speech and audio | LiveKit Docs | examples/voice_agents/background_audio.py | livekit-rtc/livekit/rtc/audio_mixer.py | Speech & Audio Processing | livekit/livekit_composite | DeepWiki
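For orientation, a minimal usage sketch modeled on the examples/voice_agents/background_audio.py example cited above; `ctx.room` and `session` are assumed to come from a standard agent entrypoint:

```python
from livekit.agents import AudioConfig, BackgroundAudioPlayer, BuiltinAudioClip

# Sketch: ambient office noise plus typing sounds while the agent "thinks".
# The player mixes these internally and publishes them as one extra track.
background_audio = BackgroundAudioPlayer(
    ambient_sound=AudioConfig(BuiltinAudioClip.OFFICE_AMBIENCE, volume=0.8),
    thinking_sound=[
        AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8),
        AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING2, volume=0.7),
    ],
)
await background_audio.start(room=ctx.room, agent_session=session)
```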
a
```python
        @self.room.on("track_subscribed")
        def on_track_subscribed(
            track: rtc.Track,
            publication: rtc.RemoteTrackPublication,
            participant: rtc.RemoteParticipant,
        ):
            logger.info(
                f"Track subscribed from {participant.identity}: {publication.sid}"
            )

            # Handle audio track subscription
            if track.kind == rtc.TrackKind.KIND_AUDIO:
                if (
                    self.agent_participant
                    and participant.identity == self.agent_participant.identity
                ):
                    asyncio.create_task(
                        self._handle_remote_audio_track(
                            cast(rtc.AudioTrack, track), participant
                        ),
                    )
                else:
                    logger.info(
                        f"Not handling audio track from {participant.identity}: "
                        f"{publication.sid}"
                    )

    async def _handle_remote_audio_track(
        self, track: rtc.AudioTrack, participant: rtc.RemoteParticipant
    ):
        """Handle incoming audio track from remote participant."""
        logger.info(f"Setting up audio stream from {participant.identity}")

        try:
            # Create audio stream
            self.remote_audio_stream = rtc.AudioStream(
                track, sample_rate=self.sample_rate, num_channels=self.num_channels
            )

            # Forward to audio handler if callback is available
            if self.get_audio_handler_callback:
                audio_handler = self.get_audio_handler_callback()
                if audio_handler:
                    await audio_handler.receive_audio_from_livekit(
                        self.remote_audio_stream
                    )

        except Exception as e:
            logger.error(f"Error handling remote audio track: {e}")

```

///

```python

    async def receive_audio_from_livekit(self, audio_stream: "rtc.AudioStream"):
        """
        Receive audio frames from LiveKit remote participant and push to playback.

        Args:
            audio_stream: LiveKit audio stream from remote participant
        """
        if not rtc:
            logger.warning("LiveKit not available for receiving audio")
            return

        self.is_receiving = True
        frames_received = 0

        try:
            logger.info("Started receiving audio from LiveKit")

            async for frame_event in audio_stream:
                if not self.is_receiving:
                    break

                frames_received += 1

                # Extract audio data from frame
                audio_data = np.frombuffer(frame_event.frame.data, dtype=np.int16)

                # Push audio directly to playback if callback is set
                if self.audio_output_callback:
                    try:
                        written = self.audio_output_callback(audio_data)
                        if written < len(audio_data):
                            logger.warning(
                                f"Playback buffer overflow: "
                                f"{len(audio_data) - written} samples dropped"
                            )
                    except Exception as e:
                        logger.error(f"Error pushing audio to playback: {e}")

                # Apply reverse stream processing for echo cancellation
                # For reverse stream, we need to process it in chunks too
                if self.apm and len(audio_data) == self.apm_frame_size:
                    try:
                        render_frame = rtc.AudioFrame(
                            data=audio_data.tobytes(),
                            sample_rate=self.sample_rate,
                            num_channels=self.num_channels,
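                            # valid only because num_channels == 1; for multi-channel
                            # audio this should be len(audio_data) // num_channels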
                            samples_per_channel=len(audio_data),
                        )
                        self.apm.process_reverse_stream(render_frame)
                    except Exception as e:
                        if frames_received <= 10:
                            logger.warning(f"Error processing reverse stream: {e}")

                # Log progress
                if frames_received <= 5:
                    logger.debug(f"Received frame {frames_received} from LiveKit")
                elif frames_received % 100 == 0:
                    logger.debug(f"Received {frames_received} frames from LiveKit")

        except Exception as e:
            logger.error(f"Error receiving audio from LiveKit: {e}")
        finally:
            self.is_receiving = False
            logger.info(f"Stopped receiving audio. Total frames: {frames_received}")
```
t
Thank you so much for sharing your code with me. From what I see, the audio pitch issue might be happening because of mismatched sample rates or how the audio frames are handled during playback or processing. The `BackgroundAudioPlayer` uses the `AudioMixer`, which outputs audio frames with specific settings like sample rate and number of channels. To get the playback right and avoid pitch issues, here are a few things to check:
• Make sure that the `sample_rate` and `num_channels` in your `rtc.AudioStream` match what the `BackgroundAudioPlayer` is using (a quick way to verify this is sketched below).
• Check whether the audio frames are being resampled or changed before playback. If the sample rate doesn't match during playback, it can cause the pitch to shift.
• Double-check that your `audio_output_callback` is processing the audio data without changing its timing or sample rate.

If you can share the sample rate and channel setup that your `BackgroundAudioPlayer` uses, I'd be happy to help you adjust your processing to fit. Please feel free to reach out if you want to go over this together or if there's anything else I can assist with!
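A minimal way to verify the first bullet (an editor's sketch, not from the original thread): iterate the same `rtc.AudioStream` created in `_handle_remote_audio_track` above and log each frame's properties before any other processing touches them:

```python
# Sketch: log the first few frames to confirm the negotiated format
# actually matches the expected 48 kHz mono configuration.
count = 0
async for frame_event in audio_stream:
    f = frame_event.frame
    logger.debug(
        "frame %d: sample_rate=%d num_channels=%d samples_per_channel=%d",
        count, f.sample_rate, f.num_channels, f.samples_per_channel,
    )
    count += 1
    if count >= 5:
        break
```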
a
oh forgot, this is the audio configuration:
```python
        # Audio configuration
        self.sample_rate = 48000
        self.num_channels = 1
        self.apm_frame_size = 480  # APM expects exactly 10ms at 48kHz
```
t
Thank you so much for sharing your audio configuration! The `BackgroundAudioPlayer` in LiveKit uses the same default sample rate of 48000 Hz and 1 channel as your setup, which usually rules out pitch problems that come from mismatched sample rates or channel counts. If you're still noticing pitch issues, it might be worth a closer look at how the audio frames are being handled or processed. Here are a couple of things to double-check:
• Make sure the `audio_output_callback` plays the audio frames without changing their timing or sample rate.
• Verify that `apm.process_reverse_stream` is set up correctly and isn't unintentionally modifying the audio frames.

If the problem continues, please don't hesitate to reach out. I'm here to help, and we can definitely dig deeper together to find a solution.

Sources: livekit-plugins/livekit-plugins-playai/livekit/plugins/playai/tts.py
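For context on the second bullet, a minimal sketch of constructing LiveKit's `AudioProcessingModule` (keyword names here reflect my understanding of the Python SDK and may differ between versions). Note that `process_reverse_stream` expects exactly 10 ms of audio per call, which is why the code above only feeds it frames where `len(audio_data) == apm_frame_size`:

```python
from livekit import rtc

# Sketch: APM with echo cancellation; frames pushed through
# process_reverse_stream() act as the far-end reference signal.
apm = rtc.AudioProcessingModule(
    echo_cancellation=True,
    noise_suppression=True,
    high_pass_filter=True,
    auto_gain_control=True,
)
```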
a
• `audio_output_callback` without the BackgroundAudioPlayer plays agent audio correctly pitched
• `apm.process_reverse_stream` is the one from LiveKit RTC's AudioProcessingModule
t
Thank you so much for confirming the details! Based on what you've shared and your setup, here are some possible causes and troubleshooting steps for the pitch issue:
• Sample Rate Mismatch: Double-check that all audio files or streams used with the `BackgroundAudioPlayer` are pre-converted to 48 kHz, mono format. Mismatched sample rates can cause pitch distortion.
• Custom Audio Processing: If you're using any custom audio processing like echo cancellation or noise suppression, verify that it's not unintentionally changing the pitch. For example, check if the `AudioProcessingModule` might be applying any unexpected changes.
• Buffer Management: Keep an eye out for buffer underruns or overflows in the `BackgroundAudioPlayer`. Increasing the `queue_size_ms` in the `AudioSource` configuration can help keep frame delivery smooth and stable (see the sketch after this list).
• Volume Scaling: The `BackgroundAudioPlayer` applies volume scaling, which could potentially introduce artifacts if not managed carefully. It's worth ensuring this isn't impacting the audio output.

If none of these suggestions fix the issue, could you please let me know if you're using any custom audio processing delegates or third-party pitch modification libraries in your setup? That info would really help us get to the bottom of this together.

Sources: Speech & Audio Processing | livekit/livekit_composite | DeepWiki | Sarvam TTS integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py | realtime/openai-realtime-pitch-shift.py | Sources/LiveKit/Audio/Manager/AudioManager.swift
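A sketch of the buffering suggestion (the `queue_size_ms` value here is illustrative, not a recommendation from the thread):

```python
from livekit import rtc

# Sketch: a larger internal queue smooths out bursty frame delivery
# before the track is published to the room.
source = rtc.AudioSource(sample_rate=48000, num_channels=1, queue_size_ms=1000)
track = rtc.LocalAudioTrack.create_audio_track("background-audio", source)
```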
a
how to check the SampleRateMismatch?
t
To help you check for sample rate mismatches in LiveKit audio streams, here are a few tips:
• Take a look at the `sample_rate` property of your `AudioFrame` objects and see if it matches the expected rate, like 48000 Hz for the `BackgroundAudioPlayer`.
• Try using LiveKit's built-in functions such as `combineAudioFrames` or `mergeFrames`. These throw errors if there's a mismatch between frames.
• Keep an eye on your logs or any exceptions in your processing pipeline; LiveKit validates sample rate consistency and will alert you if something's off.

If you do spot any mismatches, use an audio resampler to convert the audio to the right sample rate before playback or further processing (a resampler sketch follows below). And please don't hesitate to reach out if you'd like help implementing these checks or setting up resampling!

Sources: livekit-rtc/livekit/rtc/utils.py | agents/src/utils.test.ts | tests/test_vad.py | packages/livekit-rtc/src/audio_frame.ts
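A minimal resampling sketch using the Python SDK's `rtc.AudioResampler` (the 24 kHz input rate is just an example of a mismatched source):

```python
from livekit import rtc

# Sketch: convert incoming frames to 48 kHz mono before playback.
resampler = rtc.AudioResampler(input_rate=24000, output_rate=48000, num_channels=1)

def resample(frame: rtc.AudioFrame) -> list[rtc.AudioFrame]:
    # push() buffers internally and returns zero or more output frames
    return resampler.push(frame)

# when the input stream ends, drain whatever the resampler still holds
remaining = resampler.flush()
```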
a
what is the combineAudioFrames?
Received frame with length: 480, sample rate: 48000, num_channels: 1, samples_per_channel: 480
t
I'm happy to help you understand the `combineAudioFrames` function in LiveKit! This utility merges multiple `rtc.AudioFrame` objects into one by concatenating their audio data. It makes sure all input frames share the same `sampleRate` and `numChannels`, and raises an error if something doesn't match up. People often use it for assembling TTS output, short audio captures, or debugging workflows. When working in Python, you might see this function referred to as `merge_frames` or `combine_frames`. It allocates memory for the combined frame once and copies data from each input frame. The final frame keeps the same `sampleRate` and `numChannels`, with `samplesPerChannel` being the total across all input frames. If you're combining audio frames, this function keeps things consistent and simplifies the process (a short usage sketch follows below). Feel free to reach out if you'd like more details!

Sources: packages/livekit-rtc/src/audio_frame.ts | livekit-rtc/livekit/rtc/audio_mixer.py | plugins/openai/src/realtime/realtime_model.ts | livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/vad.py | Capturing the First Few Seconds of a Call for Processing
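A small usage sketch; in the Python SDK the utility lives in livekit-rtc/livekit/rtc/utils.py (cited above) as `rtc.combine_audio_frames`, and `collect_some_frames` below is a hypothetical helper:

```python
from livekit import rtc

# Sketch: concatenate frames of identical format into a single frame;
# the call raises an error if sample rates or channel counts differ.
frames: list[rtc.AudioFrame] = collect_some_frames()  # hypothetical
combined = rtc.combine_audio_frames(frames)
print(combined.sample_rate, combined.num_channels, combined.samples_per_channel)
```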
a
i'm testing different things and i can't seem to be hitting the issue
t
I'm really glad to hear that the issue hasn’t come up again during your tests. If the pitch problem does pop up once more, please don’t hesitate to share any new details or settings with me—I’m here to help you every step of the way. And of course, if there’s anything else you’d like to discuss or need help with, just let me know!
a
no i meant, i can't find the root of the issue
t
I completely understand how tricky this pitch issue can be when the root cause isn't immediately clear. Let's try narrowing it down with a few focused steps:
• Log Audio Frame Details: Add detailed logging for each `AudioFrame` you receive, including `sample_rate`, `num_channels`, and `samples_per_channel`, to confirm everything stays consistent across frames.
• Test Without APM: Temporarily turning off the `AudioProcessingModule` could reveal if it's causing any unexpected changes.
• Isolate BackgroundAudioPlayer: Test the `BackgroundAudioPlayer` on its own by playing a known audio file and checking whether the pitch sounds right.
• Resample Input Audio: If you think sample rates might be mismatched, explicitly resampling all input audio to 48 kHz before processing or playback can help.
• Check Volume Scaling: Double-check that volume adjustments within the `BackgroundAudioPlayer` aren't introducing artifacts.

If you'd like, I'm here to walk you through any of these steps or help you analyze specific parts of your setup. Just let me know how you want to move forward!
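One complementary check not in the list above (an editor's sketch): a stream that sounds pitched down locally while the Room recording is fine usually means the local playback path consumes the 48 kHz samples at a lower device rate. Comparing received audio-time against wall-clock time helps rule the network side in or out:

```python
import time

# Sketch: if frames arrive in real time (no drift) but playback still sounds
# slow and low, the mismatch is in the local output device's sample rate.
start = time.monotonic()
samples = 0
async for frame_event in audio_stream:
    frame = frame_event.frame
    samples += frame.samples_per_channel
    audio_seconds = samples / frame.sample_rate
    wall_seconds = time.monotonic() - start
    if abs(wall_seconds - audio_seconds) > 0.5:
        logger.warning(
            "drift: %.2fs of audio received in %.2fs of wall time",
            audio_seconds,
            wall_seconds,
        )
```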
a
definitely there's something in the BackgroundAudioPlayer that is changing something in the way LK sends the audio stream, because if i disable it on the room, everything sounds correct
t
Thanks so much for sharing your observations! The `BackgroundAudioPlayer` does adjust audio streams in a few ways, which might explain the changes you're noticing:
• Audio Mixing: It uses `rtc.AudioMixer` to blend ambient, thinking, and on-demand audio streams into one output, which is then sent to the room (a minimal mixing sketch follows below).
• Volume Adjustments: The player applies volume scaling to audio frames, so the audio signal might be slightly different before it's sent out.
• Buffering: The `rtc.AudioSource` includes a 400 ms buffer, which can sometimes cause slight delays or timing shifts.
• Agent State Synchronization: It can play sounds like typing noises based on the agent's state, which dynamically changes the audio stream.

All these features aim to improve the audio experience, though they might sometimes affect the stream in unexpected ways. I'd be happy to help you test specific settings or dig deeper to pinpoint the issue. Just let me know how you'd like to move forward!

Sources: examples/drive-thru/drivethru_agent.py | Agent speech and audio | LiveKit Docs | Speech & Audio Processing | livekit/livekit_composite | DeepWiki
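For reference, a rough sketch of the mixing loop described in the first bullet, based on the livekit-rtc/livekit/rtc/audio_mixer.py file cited above (constructor and method names are from memory and may differ between SDK versions; `ambient_frames` and `thinking_frames` are hypothetical async iterators of `rtc.AudioFrame`):

```python
from livekit import rtc

# Sketch: mix two frame streams into one 48 kHz mono output and
# forward the mixed frames to the published audio source.
mixer = rtc.AudioMixer(sample_rate=48000, num_channels=1)
mixer.add_stream(ambient_frames)
mixer.add_stream(thinking_frames)

async for frame in mixer:  # yields fixed-size mixed frames
    await audio_source.capture_frame(frame)
```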
a
which of these could be the potential issue?
t
Thank you for sharing your observations! The pitch issue might be coming from a few possible areas:
• Audio Mixing: The `rtc.AudioMixer` brings together multiple audio streams, and if there's any mismatch in sample rates or timing between them, it could impact the final sound.
• Volume Adjustments: The `BackgroundAudioPlayer` adjusts volume on audio frames, and sometimes that can introduce subtle changes if not perfectly handled.
• Buffering: The 400 ms buffer in `rtc.AudioSource` might cause timing shifts, especially if there are delays during processing or frame delivery.

To help pinpoint what's going on, I suggest trying these steps:
1. Turn off the ambient and thinking sounds in the `BackgroundAudioPlayer` to see if the pitch issue still happens (see the sketch after this list).
2. Log the audio frames sent by the `BackgroundAudioPlayer` to check for any differences in sample rate, channels, or timing.
3. Try a simpler setup, like using just one audio file, to see if the problem is related to mixing or playback.

If you'd like, I'm happy to guide you through any of these tests or help with anything else. Just let me know!
a
hey, is it possible that thinking phrases from the agent are considered as thinking state rather than speaking?
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• at https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code

If you find the answer, please post it here to help others!
a
can the background player's sounds be WAV files? is there a suggested approach for local files? if the sound is 3s, what's the suggested playout timeout?