# ask-ai
@millions-hair-24956 What's the fastest STT and TTS for VoicePipelineAgent?
It seems likely that the fastest STT and TTS options for the `VoicePipelineAgent` depend on the specific models and configurations you choose. According to LiveKit documentation, Deepgram is often used for STT because its streaming support reduces latency, and for TTS, Cartesia and ElevenLabs both support streamed synthesis, which also helps keep latency down.

Sources: LiveKit documentation

Would you like more details on how to configure these models for optimal performance?
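One way to check which combination is actually fastest in your environment is to measure time-to-first-audio yourself. A rough sketch, assuming both plugins expose the standard `synthesize()` interface and read their API keys from the environment (`CARTESIA_API_KEY`, `ELEVEN_API_KEY`); the `time_to_first_audio` helper is illustrative, not part of LiveKit:

```python
import asyncio
import time

from livekit.plugins import cartesia, elevenlabs

async def time_to_first_audio(tts, text: str) -> float:
    """Return seconds until the first synthesized chunk arrives."""
    start = time.perf_counter()
    first = None
    async for _audio in tts.synthesize(text):
        if first is None:
            first = time.perf_counter() - start
        # keep draining so the stream finishes cleanly
    return first if first is not None else float("inf")

async def main():
    for name, tts in [("cartesia", cartesia.TTS()), ("elevenlabs", elevenlabs.TTS())]:
        latency = await time_to_first_audio(tts, "Latency test sentence.")
        print(f"{name}: first audio after {latency:.3f}s")

asyncio.run(main())
```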
@millions-hair-24956 Would you like more details on how to configure these models for optimal performance? That would be really helpful.
To optimize the performance of your `VoicePipelineAgent` using Deepgram for STT and Cartesia or ElevenLabs for TTS, you can follow these steps:

1. Deepgram STT configuration:
   - Use Deepgram's streaming method to keep latency low. According to LiveKit documentation, Deepgram's STT supports streaming, which allows for real-time transcription.
   - Adjust the `endpointing_ms` parameter to a suitable value to avoid premature endpointing. A value between 1500 ms and 2000 ms is often recommended to handle natural pauses in speech.
2. Cartesia or ElevenLabs TTS configuration:
   - Both Cartesia and ElevenLabs support streamed TTS, which reduces the time taken to generate audio responses. Make sure you are using the streaming capabilities of these TTS models.
   - For Cartesia, you might want to explore the `ChunkedStream` method, which allows for faster playback by processing audio in chunks.
3. General `VoicePipelineAgent` configuration:
   - Set `allow_interruptions` to `True` so the agent can handle interruptions, which improves responsiveness.
   - Lower `min_endpointing_delay` if you want the agent to respond more quickly after it detects the end of speech, but don't set it so low that it cuts speech off prematurely.

Here's a basic example of how you might set up the `VoicePipelineAgent`:
```python
from livekit.agents import VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, openai, silero

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2-general", endpointing_ms=1500),
    llm=openai.LLM(model="gpt-3.5-turbo"),
    tts=cartesia.TTS(),
    allow_interruptions=True,
    interrupt_speech_duration=0.5,  # seconds of user speech that count as an interruption
    interrupt_min_words=0,
    min_endpointing_delay=0.5,      # respond quickly once end of speech is detected
)
```
This setup uses Deepgram for STT with a 1500 ms endpointing delay and Cartesia for TTS, allows interruptions, and keeps the endpointing delay small so the agent responds quickly.

Sources: LiveKit documentation
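If you'd rather try ElevenLabs, only the `tts` argument changes. A minimal sketch, assuming the `livekit-plugins-elevenlabs` package is installed and `ELEVEN_API_KEY` is set in the environment:

```python
from livekit.plugins import elevenlabs

# Drop-in replacement for cartesia.TTS() in the agent above;
# the plugin streams audio and reads ELEVEN_API_KEY from the environment.
tts = elevenlabs.TTS()
```

Everything else in the `VoicePipelineAgent` setup stays the same.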
@millions-hair-24956 How can I use `ChunkedStream`? Give me an example, please.
It seems likely that to use `ChunkedStream` in LiveKit, you would typically work with a TTS provider that supports chunked HTTP responses. Here's a basic example using the Cartesia TTS plugin, which supports `ChunkedStream`:

1. Initialize the TTS plugin: first, ensure you have the Cartesia TTS plugin installed and configured with your API key.
2. Create a TTS instance: use the Cartesia `TTS` class to create an instance.
3. Synthesize text: call the `synthesize` method with the text you want to convert to speech. This method returns a `ChunkedStream`.
4. Process the stream: iterate over the `ChunkedStream` to process audio frames as they are generated.

Here's a Python example:
```python
import asyncio

from livekit.plugins.cartesia import TTS

async def process_audio():
    # Initialize Cartesia TTS with the necessary options
    tts = TTS(api_key="your_cartesia_api_key")

    # Synthesize text to get a ChunkedStream
    chunked_stream = tts.synthesize("Hello, this is a test of the ChunkedStream.")

    # Process the stream; each item it yields is a SynthesizedAudio object
    async for audio in chunked_stream:
        print(f"Received audio frame with request ID: {audio.request_id}")

# Run the async function
asyncio.run(process_audio())
```
This example demonstrates how to synthesize text using Cartesia's TTS and process the resulting `ChunkedStream` to handle audio frames as they are received.

Sources: ChunkedStream documentation

Would you like more details on any specific part of this process?
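If you want to do more with the frames than print their request IDs, one option is to buffer them and write a WAV file. A minimal sketch, assuming each `SynthesizedAudio` carries a 16-bit PCM `rtc.AudioFrame`; the `synthesize_to_wav` helper is illustrative, not part of the plugin:

```python
import asyncio
import wave

from livekit.plugins.cartesia import TTS

async def synthesize_to_wav(path: str, text: str) -> None:
    tts = TTS(api_key="your_cartesia_api_key")

    # Buffer every frame yielded by the ChunkedStream
    frames = [audio.frame async for audio in tts.synthesize(text)]

    # Write the collected PCM data out as a WAV file
    # (assumes at least one frame and a uniform sample rate / channel count)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(frames[0].num_channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(frames[0].sample_rate)
        for frame in frames:
            wav.writeframes(frame.data)

asyncio.run(synthesize_to_wav("out.wav", "Saved from a ChunkedStream."))
```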
@millions-hair-24956 Do you support smallest.ai TTS?
I'm not able to find any data sources to answer your question. You might want to specify whether you're looking for integration details or support for a specific feature of smallest.ai. If this doesn't help, please reach out to the LiveKit support team for further assistance.