# ask-ai
r
how to use coqui tts for TTS
t
Thanks for reaching out, Av. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hi there! I wanted to let you know that LiveKit doesn’t have built-in support for Coqui TTS for text-to-speech just yet. But don’t worry! You can definitely create a custom integration by extending the base TTS class in the LiveKit Agents framework. If you’d like, I’d be happy to guide you through the steps to set this up. Just let me know!
Sources: Text-to-speech (TTS) integrations | LiveKit Docs | agents/src/tts/tts.ts
r
yes i want to integrate Coqui TTS
t
I'm glad you're interested in integrating Coqui TTS with LiveKit! To help you get started, here’s a friendly step-by-step guide to create a custom TTS plugin:

1. Define the Plugin Class: Begin by creating a subclass of the `Plugin` class and register it globally. This step lets LiveKit recognize your plugin smoothly.
2. Implement the TTS Class: Next, extend the `tts.TTS` class to set up your TTS engine. Be sure to include methods for synthesizing text and managing configuration details like API keys and voice settings.
3. Implement the ChunkedStream Class: This class will handle the synthesis process by communicating with the Coqui TTS API. Using an `AudioEmitter` helps you emit audio data in manageable chunks.
4. Handle Authentication: Make your plugin flexible by supporting API keys either through arguments or environment variables.
5. Ensure Error Handling: It’s important to wrap your API calls in try-except blocks and raise clear exceptions like `APITimeoutError` or `APIStatusError` to keep things robust.
6. Test and Use: Finally, once everything is set up, you can use your plugin within an `AgentSession` or as a standalone TTS engine.

If you’d like, I’d be happy to walk you through detailed code examples or help with any particular step you find tricky. Just let me know!
Sources: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/__ini... | plugins/resemble/src/index.ts | Working with plugins | LiveKit Docs | Text-to-speech (TTS) integrations | LiveKit Docs
r
ok give me the code
t
I'm happy to guide you through creating a custom TTS plugin for Coqui TTS in LiveKit with a detailed example:

Step 1: Define the Plugin Class

```python
from livekit.agents import Plugin


class CoquiTTSPlugin(Plugin):
    def __init__(self):
        # Name, version, and package identify the plugin to the framework
        super().__init__(__name__, "0.1.0", __package__)


# Register globally so LiveKit recognizes the plugin
Plugin.register_plugin(CoquiTTSPlugin())
```
Step 2: Implement the TTS Class
```python
import os
from typing import Final

from livekit.agents import tts

NUM_CHANNELS: Final[int] = 1
SAMPLE_RATE: Final[int] = 24000
MIME_TYPE = "audio/wav"


class CoquiTTS(tts.TTS):
    def __init__(self, *, voice: str = "default", language: str = "en", api_key: str | None = None):
        super().__init__(
            capabilities=tts.TTSCapabilities(streaming=False),
            sample_rate=SAMPLE_RATE,
            num_channels=NUM_CHANNELS,
        )

        # Accept the API key as an argument, falling back to the environment
        if not api_key:
            api_key = os.environ.get("COQUI_TTS_API_KEY")
            if not api_key:
                raise ValueError("API key is required via argument or COQUI_TTS_API_KEY env var")

        self._voice = voice
        self._language = language
        self._api_key = api_key

    def synthesize(self, text: str):
        return CoquiChunkedStream(tts=self, input_text=text)
```
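As a quick sanity check on the constants in Step 2: 24 kHz mono audio at 16 bits per sample (a common assumption for PCM payloads; actual WAV headers may differ) works out to a fixed byte rate, which is useful when reasoning about chunk sizes:

```python
SAMPLE_RATE = 24000   # Hz, matching the plugin's constant
NUM_CHANNELS = 1      # mono, matching the plugin's constant
BYTES_PER_SAMPLE = 2  # 16-bit PCM (an assumption for this sketch)

# Raw PCM throughput the emitter will see, in bytes per second
bytes_per_second = SAMPLE_RATE * NUM_CHANNELS * BYTES_PER_SAMPLE
print(bytes_per_second)  # 48000
```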
Step 3: Implement the ChunkedStream Class
```python
import asyncio

import aiohttp

from livekit.agents import APIConnectionError, APIStatusError, APITimeoutError, tts, utils


class CoquiChunkedStream(tts.ChunkedStream):
    async def _run(self, output_emitter: tts.AudioEmitter):
        url = "<COQUI_API_URL>"
        headers = {
            "Authorization": f"Bearer {self._tts._api_key}",
            "Content-Type": "application/json",
        }
        payload = {
            "text": self._input_text,
            "voice": self._tts._voice,
            "language": self._tts._language,
        }

        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(url, headers=headers, json=payload) as resp:
                    resp.raise_for_status()
                    output_emitter.initialize(
                        request_id="req_" + utils.shortuuid(),
                        sample_rate=self._tts.sample_rate,
                        num_channels=NUM_CHANNELS,  # defined alongside CoquiTTS above
                        mime_type=MIME_TYPE,
                    )

                    # Forward the response body to the emitter chunk by chunk
                    async for data, _ in resp.content.iter_chunks():
                        output_emitter.push(data)

                    output_emitter.flush()

        except asyncio.TimeoutError:
            raise APITimeoutError()
        except aiohttp.ClientResponseError as e:
            raise APIStatusError(message=e.message, status_code=e.status, request_id=None, body=None)
        except Exception as e:
            raise APIConnectionError() from e
```
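Conceptually, the emitter loop in Step 3 is just slicing a byte stream into manageable pieces. That slicing can be sketched with a dependency-free helper (illustrative only, not part of the LiveKit or aiohttp APIs):

```python
def chunk_audio(data: bytes, frame_size: int):
    """Yield successive frame_size-byte chunks; the final chunk may be shorter."""
    for i in range(0, len(data), frame_size):
        yield data[i : i + frame_size]
```

In the real plugin, aiohttp already delivers the body in network-sized chunks via `resp.content.iter_chunks()`, and the `AudioEmitter` handles framing internally, so this helper is only meant to make the data flow easier to picture.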
Step 4: Usage
```python
from livekit.agents import AgentSession

from coqui_tts_plugin import CoquiTTS

session = AgentSession(
    tts=CoquiTTS(api_key="your-coqui-api-key", voice="lisa")
)

# Use the session to synthesize text
```
Just remember to replace `<COQUI_API_URL>` with the actual API endpoint for Coqui TTS. If anything feels unclear or you’d like me to walk you through any part in more detail, please don’t hesitate to ask. I’m here to help!