There doesn't appear to be a comprehensive leaderboard or systematic latency comparison for common TTS providers readily available. However, the following insights and tools can help:
• Reported Latency Figures and Observations:
◦ PlayAI TTS: 350ms
◦ Google TTS: Described as "fast" but with a "robotic-sounding" voice
◦ Tortoise TTS: Produces "amazing-sounding" audio but takes ~20 seconds to generate a single sentence
• Testing Infrastructure: LiveKit supports a wide range of TTS providers, including AWS Polly, Azure TTS, Google, and ElevenLabs, among others, which allows systematic side-by-side testing and comparison.
• Metrics Collection: LiveKit's framework includes tools to measure key TTS metrics such as time to first byte (TTFB), total processing time, and the duration of the generated audio, enabling detailed performance analysis.
• Practical Comparison Tools: LiveKit provides examples of switching between providers (e.g., Rime, ElevenLabs, Cartesia, PlayAI) for direct comparison testing.
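The metrics mentioned above (TTFB, total processing time, audio duration) can be sketched in a provider-agnostic way. This is not LiveKit's metrics API: it assumes a hypothetical `synthesize` callable that returns an async iterator of raw 16-bit mono PCM chunks at 24 kHz, and uses a stand-in `fake_tts` provider instead of a real one. In practice you would wrap each provider's streaming call in an adapter with this shape. TTFB is the time until the first chunk arrives, and the real-time factor (total time divided by audio duration) shows whether synthesis keeps up with playback.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import AsyncIterator, Callable

# Hypothetical provider interface (not a LiveKit API): takes text and returns
# an async iterator of raw 16-bit mono PCM chunks.
SynthesizeFn = Callable[[str], AsyncIterator[bytes]]


@dataclass
class TTSMeasurement:
    provider: str
    ttfb: float            # seconds until the first audio chunk arrived
    total_time: float      # seconds until the stream was exhausted
    audio_duration: float  # seconds of audio produced

    @property
    def real_time_factor(self) -> float:
        # Below 1.0 means audio is produced faster than it plays back.
        if self.audio_duration == 0:
            return float("inf")
        return self.total_time / self.audio_duration


async def measure(provider: str, synthesize: SynthesizeFn, text: str,
                  sample_rate: int = 24_000, bytes_per_sample: int = 2) -> TTSMeasurement:
    """Time one synthesis request; sample_rate and bytes_per_sample are assumptions."""
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    async for chunk in synthesize(text):
        if ttfb is None:
            ttfb = time.perf_counter() - start  # first audio arrived
        total_bytes += len(chunk)
    total_time = time.perf_counter() - start
    audio_duration = total_bytes / (sample_rate * bytes_per_sample)
    # If no audio arrived at all, fall back to the total time for TTFB.
    return TTSMeasurement(provider, ttfb if ttfb is not None else total_time,
                          total_time, audio_duration)


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    """Stand-in provider: ~50 ms before the first chunk, then ten 100 ms chunks."""
    await asyncio.sleep(0.05)
    for _ in range(10):
        yield b"\x00" * 4_800  # 2,400 samples x 2 bytes = 100 ms at 24 kHz
        await asyncio.sleep(0.01)


if __name__ == "__main__":
    m = asyncio.run(measure("fake", fake_tts, "Hello from the benchmark."))
    print(f"{m.provider}: TTFB={m.ttfb * 1000:.0f} ms, total={m.total_time:.2f} s, "
          f"audio={m.audio_duration:.2f} s, RTF={m.real_time_factor:.2f}")
```

Measuring TTFB separately from total time matters for voice agents: a provider can have a slow total time but still feel responsive if its first audio arrives quickly.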
While a pre-built leaderboard isn't available, LiveKit's infrastructure and metrics collection capabilities can be used to generate detailed latency comparisons tailored to your needs.
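Once per-request timings are collected, building a leaderboard is mostly aggregation. A minimal sketch, assuming the timings have already been gathered as `(provider, ttfb_ms)` pairs (a hypothetical format): it ranks providers by median TTFB and reports an approximate p95, since single runs are noisy. The provider names and numbers in the usage block are placeholders, not measured results.

```python
import statistics
from typing import Iterable


def leaderboard(runs: Iterable[tuple[str, float]]) -> list[tuple[str, float, float, int]]:
    """Rank providers by median TTFB (ms); also return an approximate p95 and run count."""
    by_provider: dict[str, list[float]] = {}
    for provider, ttfb_ms in runs:
        by_provider.setdefault(provider, []).append(ttfb_ms)

    rows = []
    for provider, values in by_provider.items():
        median = statistics.median(values)
        # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
        # "inclusive" interpolates within the observed range, which suits small samples.
        if len(values) > 1:
            p95 = statistics.quantiles(values, n=20, method="inclusive")[-1]
        else:
            p95 = values[0]
        rows.append((provider, median, p95, len(values)))
    return sorted(rows, key=lambda row: row[1])  # lowest median TTFB first


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real provider results.
    runs = [("provider-a", 310.0), ("provider-a", 300.0), ("provider-a", 320.0),
            ("provider-b", 150.0), ("provider-b", 170.0)]
    for provider, median, p95, n in leaderboard(runs):
        print(f"{provider:<12} median={median:.0f} ms  p95~{p95:.0f} ms  runs={n}")
```

Ranking by median rather than mean keeps one slow outlier (for example, a cold start) from dominating the result, while the p95 column still surfaces tail latency.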