able-jackal-84140
07/13/2025, 6:48 PM
tall-airline-16756
07/13/2025, 6:48 PM
tall-airline-16756
07/13/2025, 6:49 PM
The log message "livekit.agents - _SegmentSynchronizerImpl.playback_finished called before text/audio input is done" indicates that the playback system tried to mark a segment as finished before the text input, the audio input, or both had been marked as done. This is typically a timing or sequencing issue in your input/output workflow.
• Ensure proper input completion: Before calling the method that marks playback as finished, explicitly end both text and audio inputs by invoking the appropriate methods (such as end_text_input() and end_audio_input()) on the synchronizer implementation.
• Check your AudioOutput flush() implementation: If using a custom AudioOutput, ensure that the flush() method ends the audio input by calling self._synchronizer._impl.end_audio_input() after audio data is processed.
• Correct event scheduling: Make sure that any callbacks (such as on_playback_finished()) are only fired after both text and audio inputs are done, and that proper async barriers are used to avoid race conditions.
• Debug and log state: Consider adding debug logs to track the _text_data.done and _audio_data.done states to identify whether either is not being marked as done when playback is finished.
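The event-scheduling and logging points above can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not LiveKit code: SegmentState, finish_playback() and the logger name are hypothetical stand-ins, and only the method names end_text_input() and end_audio_input() are borrowed from the discussion.

```python
import asyncio
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("segment-sync-sketch")  # hypothetical logger name


class SegmentState:
    """Hypothetical stand-in for per-segment state; not the LiveKit class."""

    def __init__(self) -> None:
        self.text_done = asyncio.Event()
        self.audio_done = asyncio.Event()
        self.playback_finished = False

    def end_text_input(self) -> None:
        logger.debug("text input done")
        self.text_done.set()

    def end_audio_input(self) -> None:
        logger.debug("audio input done")
        self.audio_done.set()

    async def finish_playback(self) -> None:
        # Log both flags first, so a missing "done" is visible in the logs.
        logger.debug(
            "finish requested: text_done=%s audio_done=%s",
            self.text_done.is_set(),
            self.audio_done.is_set(),
        )
        # Barrier: playback is only reported finished once both inputs ended.
        await asyncio.gather(self.text_done.wait(), self.audio_done.wait())
        self.playback_finished = True
        logger.debug("playback finished")


async def demo() -> SegmentState:
    state = SegmentState()
    finisher = asyncio.create_task(state.finish_playback())
    await asyncio.sleep(0)  # let the finisher start and block on the barrier
    assert not state.playback_finished
    state.end_text_input()
    await asyncio.sleep(0)
    assert not state.playback_finished  # audio input is still pending
    state.end_audio_input()
    await finisher
    return state


state = asyncio.run(demo())
```

The point of the barrier is that the finish step waits on both flags instead of assuming an order, so it does not matter whether text or audio ends first.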
Review these areas in your integration to ensure the correct sequencing. Compare your custom output classes against reference implementations such as FakeAudioOutput, and make sure flush() and the playback callbacks run in the expected order. This should resolve the synchronization error.
Sources: livekit-agents/livekit/agents/voice/agent_activity.py
able-jackal-84140
07/13/2025, 6:50 PM
tall-airline-16756
07/13/2025, 6:51 PM
The self._synchronizer._impl.end_audio_input() method is like a little signal that says, "Hey, we're done with this audio segment!" It does a few key things: it marks the audio input for the current segment as done, stops the speaking rate stream, and helps recalculate the speaking rate based on the text and audio duration.
• When to call: Use this in your AudioOutput's flush() method, once you know no more audio frames are coming for the current segment.
• Lifecycle: Think of this as closing the book on the audio part of a segment. It keeps the timing between text and audio playback aligned, which in turn keeps the speaking rate calculations accurate.
• Prevents errors: If you forget to call end_audio_input(), the system may assume more audio is still coming, which can cause timing hiccups or the playback error you saw earlier.
Bottom line: end_audio_input() lets the system know, "We're done with this audio segment, time to move forward!" Calling it at the right point keeps text and audio in sync.
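Here is a rough sketch of the flush() ordering described above. It does not import LiveKit: StubSynchronizerImpl, SketchAudioOutput and mark_playback_finished() are hypothetical stand-ins written for illustration, and the done-flag check is an assumption based on the log message, so compare it against the real classes before relying on it.

```python
import logging

logger = logging.getLogger("audio-output-sketch")  # hypothetical logger name


class StubSynchronizerImpl:
    """Hypothetical stand-in that models only the two done flags."""

    def __init__(self) -> None:
        self.text_done = False
        self.audio_done = False
        self.finished = False

    def end_text_input(self) -> None:
        self.text_done = True

    def end_audio_input(self) -> None:
        self.audio_done = True

    def mark_playback_finished(self) -> bool:
        # Assumed behavior: reject the call until both inputs are done.
        if not (self.text_done and self.audio_done):
            logger.warning(
                "playback finished before text/audio input is done "
                "(text_done=%s, audio_done=%s)",
                self.text_done,
                self.audio_done,
            )
            return False
        self.finished = True
        return True


class SketchAudioOutput:
    """Hypothetical custom output; only the flush() ordering matters here."""

    def __init__(self, impl: StubSynchronizerImpl) -> None:
        self._impl = impl
        self._pushed_frames = 0

    def capture_frame(self, frame: bytes) -> None:
        self._pushed_frames += 1  # a real output would forward the frame

    def flush(self) -> None:
        if self._pushed_frames == 0:
            return  # nothing was pushed for this segment
        self._pushed_frames = 0
        # No more frames are coming for this segment: end the audio input.
        self._impl.end_audio_input()


impl = StubSynchronizerImpl()
out = SketchAudioOutput(impl)
impl.end_text_input()
out.capture_frame(b"\x00\x00")
too_early = impl.mark_playback_finished()  # before flush(): rejected
out.flush()
on_time = impl.mark_playback_finished()  # after flush(): accepted
```

In this sketch the only difference between the rejected and the accepted call is whether flush() ran first, which is the ordering the error message is about.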
Sources: livekit-agents/livekit/agents/voice/audio_recognition.py