LiveKit supports integration with a variety of providers for Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Models (LLMs) through its flexible plugin system:
• Speech-to-Text (STT): Options include Deepgram, Whisper, and others. Deepgram is known for high accuracy and low latency, making it well suited to real-time multilingual transcription, diarization, and translation.
• Text-to-Speech (TTS): Providers like Cartesia and ElevenLabs offer high-quality, natural-sounding speech synthesis with a range of voices and languages.
• Large Language Models (LLMs): LiveKit integrates with LLMs from OpenAI, Anthropic, and similar providers for advanced natural language understanding and generation, useful for chat, content generation, and more.
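As a concrete illustration, the three provider categories above can be wired together in a LiveKit Agents (v1.x) session roughly as follows. This is a hedged sketch: the specific model names are examples, and the plugin packages (`livekit-agents`, `livekit-plugins-deepgram`, etc.) must be installed separately, so the import is guarded.

```python
# Illustrative wiring of STT/LLM/TTS plugins into a LiveKit AgentSession.
# Model names below are examples, not recommendations; the import is guarded
# so the sketch degrades gracefully where the plugin packages are absent.
try:
    from livekit.agents import AgentSession
    from livekit.plugins import cartesia, deepgram, openai, silero

    def build_session() -> "AgentSession":
        # Each argument is an interchangeable plugin; swapping providers
        # means changing a single constructor call.
        return AgentSession(
            stt=deepgram.STT(model="nova-3"),      # speech-to-text
            llm=openai.LLM(model="gpt-4o-mini"),   # language model
            tts=cartesia.TTS(),                    # text-to-speech
            vad=silero.VAD.load(),                 # voice activity detection
        )

    HAVE_LIVEKIT = True
except ImportError:  # plugin packages not installed in this environment
    HAVE_LIVEKIT = False
```

Because every plugin implements the same component interface, replacing Deepgram with another STT provider (or Cartesia with ElevenLabs) touches only the corresponding line.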
The plugin system lets you swap or combine these AI services modularly, so you can tailor a voice AI application to your specific requirements for accuracy, latency, language support, and cost. This makes LiveKit a versatile platform for building sophisticated voice-enabled solutions.
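The modularity described above is essentially dependency inversion: the pipeline depends on small interfaces rather than concrete providers. The following self-contained sketch (not LiveKit's actual API; all class names are hypothetical stand-ins) shows the pattern.

```python
# Minimal, illustrative sketch of the plugin idea (not LiveKit's real API):
# each AI service implements a small interface, so backends can be swapped
# without touching the pipeline that uses them.
from typing import Protocol

class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...

# Toy stand-in implementations for demonstration only.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode()  # pretend the audio bytes are UTF-8 text

class UpperLLM:
    def complete(self, prompt: str) -> str:
        return prompt.upper()  # stand-in "model"

class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode()

class VoicePipeline:
    """Depends only on the interfaces, so any provider can be plugged in."""
    def __init__(self, stt: STT, llm: LLM, tts: TTS) -> None:
        self.stt, self.llm, self.tts = stt, llm, tts

    def respond(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)    # STT stage
        reply = self.llm.complete(text)      # LLM stage
        return self.tts.synthesize(reply)    # TTS stage

pipeline = VoicePipeline(EchoSTT(), UpperLLM(), BytesTTS())
result = pipeline.respond(b"hello world")  # -> b"HELLO WORLD"
```

Swapping a provider means constructing the pipeline with a different implementation of the same interface; the `respond` logic never changes.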
Sources:
Voice AI quickstart | LiveKit Docs
Speech-to-text (STT) integrations | LiveKit Docs
LiveKit Agents integrations | LiveKit Docs
livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py
Plugin System | livekit/livekit_composite | DeepWiki
Working with plugins | LiveKit Docs
LiveKit integration guides | LiveKit Docs
plugins/test/src/index.ts