The error "no audio frames were pushed for text" in the TTS inference task occurs when the system receives text input but fails to generate any audio output. Below are the steps to troubleshoot and resolve this issue:
1. Check Retry Configuration:
Ensure the TTS system's retry logic is properly configured, increasing the retry attempts or adjusting the timeout as needed. For example: `conn_options = APIConnectOptions(max_retry=3, timeout=30.0)`
2. Monitor Error Events:
Use an error event listener to capture and log both recoverable and non-recoverable errors. Define the handler, `def on_tts_error(error_event): logger.error(f"TTS Error: {error_event.error}, Recoverable: {error_event.recoverable}")`, then register it with `tts.on("error", on_tts_error)`.
3. Validate Input Text:
Ensure the text being pushed to the TTS system is valid and not empty. The error message will specify the problematic text.
4. Check TTS Service Availability:
Verify network connectivity to the TTS service, confirm that the API credentials are valid, and ensure that no service-specific rate limits or quotas are being exceeded.
5. Implement Fallback Mechanisms:
Use fallback TTS adapters to switch between multiple TTS providers if one fails.
6. Monitor Metrics:
Analyze metrics such as `ttfb` (time to first byte), `duration`, and `audio_duration` to identify performance issues.
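The input check in step 3 can be sketched as a small guard that runs before text is pushed to the TTS system. This is only an illustration, not part of LiveKit: the helper names `is_speakable` and `filter_segments` are hypothetical, and the rule that text must contain at least one letter or digit is an assumption about what a provider can turn into audio.

```python
import unicodedata
from typing import List


def is_speakable(text: str) -> bool:
    """Return True if text contains at least one letter or digit.

    Hypothetical helper. The letter-or-digit rule is an assumption;
    adjust it to match what your TTS provider accepts.
    """
    # Unicode general categories starting with "L" are letters, "N" are numbers.
    return any(unicodedata.category(ch)[0] in ("L", "N") for ch in text)


def filter_segments(segments: List[str]) -> List[str]:
    """Drop segments that are empty, whitespace-only, or punctuation-only."""
    return [segment for segment in segments if is_speakable(segment)]
```

Filtering segments this way before synthesis would keep strings such as `" "` or `"?!"` from reaching the provider, which is one plausible cause of a request that yields no audio frames.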
The retry mechanism in the TTS system often resolves this error automatically. However, if the issue persists, it may indicate deeper problems with the TTS service, network, or input validation.
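To illustrate how retries (step 1) and fallback providers (step 5) might interact, here is a generic sketch. It is not LiveKit's implementation: `synthesize_with_fallback`, `NoAudioError`, and the `Synthesizer` callable type are hypothetical names, and the parameters `max_retry`, `timeout`, and `retry_interval` only mirror the idea of the connection options mentioned in step 1.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical provider type: takes text, returns raw audio bytes.
Synthesizer = Callable[[str], Awaitable[bytes]]


class NoAudioError(Exception):
    """Raised when a provider returns no audio for the given text."""


async def synthesize_with_fallback(
    text: str,
    providers: Sequence[Synthesizer],
    max_retry: int = 3,
    timeout: float = 30.0,
    retry_interval: float = 0.5,
) -> bytes:
    """Try each provider in order, retrying each one up to max_retry times."""
    last_error: Optional[Exception] = None
    for provider in providers:
        # One initial attempt plus max_retry retries per provider.
        for _attempt in range(max_retry + 1):
            try:
                audio = await asyncio.wait_for(provider(text), timeout=timeout)
                if not audio:
                    raise NoAudioError(f"no audio produced for text: {text!r}")
                return audio
            except (NoAudioError, asyncio.TimeoutError, ConnectionError) as exc:
                # Treat these as recoverable: remember the error and retry.
                last_error = exc
                await asyncio.sleep(retry_interval)
    # Every provider exhausted its attempts.
    raise RuntimeError("all TTS providers failed") from last_error
```

In this sketch a provider that keeps returning empty audio is retried until its attempts run out, and only then does the next provider get a turn. If every provider fails, the final exception carries the last underlying error as its cause, which helps distinguish input problems from service or network problems.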
Sources:
livekit-agents/livekit/agents/voice/generation.py