ripe-arm-36009 (09/12/2025, 6:10 AM)
tall-airline-16756 (09/12/2025, 6:10 AM)
tall-airline-16756 (09/12/2025, 6:11 AM)
ripe-arm-36009 (09/12/2025, 6:20 AM)
tall-airline-16756 (09/12/2025, 6:21 AM)
1. Create the Plugin Class: Start by extending the Plugin
class and registering it globally, so LiveKit can recognize your plugin.
2. Implement the TTS Class: Extend the tts.TTS class to set up your TTS engine, including methods for synthesizing text and for managing configuration details such as API keys and voice settings.
3. Implement the ChunkedStream Class: This class handles the synthesis itself by calling the Coqui TTS API; an AudioEmitter lets you emit the audio data in manageable chunks.
4. Handle Authentication: Make your plugin flexible by supporting API keys passed either as arguments or through environment variables.
5. Ensure Error Handling: Wrap your API calls in try-except blocks and raise clear exceptions such as APITimeoutError or APIStatusError to keep things robust.
6. Test and Use: Once everything is set up, use your plugin within an AgentSession or as a standalone TTS engine.
If you’d like, I’d be happy to walk you through detailed code examples or help with any particular step you find tricky. Just let me know!
Sources: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/__ini... | plugins/resemble/src/index.ts | Working with plugins | LiveKit Docs | Text-to-speech (TTS) integrations | LiveKit Docs
ripe-arm-36009 (09/12/2025, 6:24 AM)
tall-airline-16756 (09/12/2025, 6:24 AM)
Step 1: Create and Register the Plugin Class
from livekit.agents import Plugin
class CoquiTTSPlugin(Plugin):
    def __init__(self):
        super().__init__(__name__, "0.1.0", __package__)

Plugin.register_plugin(CoquiTTSPlugin())
Step 2: Implement the TTS Class
from livekit.agents import tts
from typing import Final
import os

NUM_CHANNELS: Final[int] = 1
SAMPLE_RATE: Final[int] = 24000
MIME_TYPE = "audio/wav"

class CoquiTTS(tts.TTS):
    def __init__(self, *, voice: str = "default", language: str = "en", api_key: str | None = None):
        super().__init__(
            capabilities=tts.TTSCapabilities(streaming=False),
            sample_rate=SAMPLE_RATE,
            num_channels=NUM_CHANNELS,
        )
        # The API key may come from the argument or from the environment
        if not api_key:
            api_key = os.environ.get("COQUI_TTS_API_KEY")
        if not api_key:
            raise ValueError("API key is required via argument or COQUI_TTS_API_KEY env var")
        self._voice = voice
        self._language = language
        self._api_key = api_key

    def synthesize(self, text: str):
        return CoquiChunkedStream(tts=self, input_text=text)
Step 3: Implement the ChunkedStream Class
import aiohttp
import asyncio
from livekit.agents import tts, utils
from livekit.agents import APIConnectionError, APIStatusError, APITimeoutError

# NUM_CHANNELS and MIME_TYPE are the constants defined in Step 2 (same module)
class CoquiChunkedStream(tts.ChunkedStream):
    async def _run(self, output_emitter: tts.AudioEmitter):
        url = "<COQUI_API_URL>"
        headers = {
            "Authorization": f"Bearer {self._tts._api_key}",
            "Content-Type": "application/json",
        }
        payload = {
            "text": self._input_text,
            "voice": self._tts._voice,
            "language": self._tts._language,
        }
        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(url, headers=headers, json=payload) as resp:
                    resp.raise_for_status()
                    output_emitter.initialize(
                        request_id="req_" + utils.shortuuid(),
                        sample_rate=self._tts.sample_rate,
                        num_channels=NUM_CHANNELS,
                        mime_type=MIME_TYPE,
                    )
                    async for data, _ in resp.content.iter_chunks():
                        output_emitter.push(data)
                    output_emitter.flush()
        except asyncio.TimeoutError:
            raise APITimeoutError() from None
        except aiohttp.ClientResponseError as e:
            raise APIStatusError(message=e.message, status_code=e.status, request_id=None, body=None) from e
        except Exception as e:
            raise APIConnectionError() from e
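As a sanity check on the chunked-emission loop in step 3, here is a toy synchronous stand-in. ListEmitter, emit_in_chunks, and CHUNK_SIZE are illustrative names, not LiveKit APIs; the point is only the read-push-flush shape of the loop:

```python
import io

CHUNK_SIZE = 4096  # illustrative chunk size, not a LiveKit constant

class ListEmitter:
    """Toy stand-in for AudioEmitter: just collects pushed chunks."""
    def __init__(self) -> None:
        self.chunks: list[bytes] = []
        self.flushed = False

    def push(self, data: bytes) -> None:
        self.chunks.append(data)

    def flush(self) -> None:
        self.flushed = True

def emit_in_chunks(stream: io.BufferedIOBase, emitter: ListEmitter) -> None:
    # Mirrors the `async for data, _ in resp.content.iter_chunks()` loop:
    # push each chunk as it arrives, then flush once the stream is exhausted.
    while chunk := stream.read(CHUNK_SIZE):
        emitter.push(chunk)
    emitter.flush()
```

Downstream consumers start playback as soon as the first chunk is pushed, which is why the real plugin pushes incrementally instead of buffering the whole response.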
Step 4: Usage
from livekit.agents import AgentSession
from coqui_tts_plugin import CoquiTTS

session = AgentSession(
    tts=CoquiTTS(api_key="your-coqui-api-key", voice="lisa")
)
# Use the session to synthesize text
Just remember to replace <COQUI_API_URL> with the actual API endpoint for Coqui TTS. If anything feels unclear or you’d like me to walk you through any part in more detail, please don’t hesitate to ask. I’m here to help!