# ask-ai
**User:**
I am getting this error while using Groq STT with the fallback adapter:
```text
ValueError: STTs do not support streaming: livekit.plugins.groq.services.STT. Provide a VAD to enable stt.StreamAdapter automatically or wrap them with stt.StreamAdapter before using this adapter.
```
**Assistant:**
Thanks for reaching out, Harsh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! The error you're seeing happens because the STT service you're using doesn't support streaming audio input out of the box. To fix this, you'll want to use a Voice Activity Detector (VAD) and wrap your STT with a `StreamAdapter` to get streaming working smoothly. Here's a simple way to do it:
```python
from livekit import agents, rtc
from livekit.agents.stt import StreamAdapter
from livekit.plugins import openai, silero


async def process_track(ctx: agents.JobContext, track: rtc.Track):
    # Initialize the Groq STT (served via the OpenAI-compatible plugin)
    groq_stt = openai.STT.with_groq()

    # Initialize the VAD
    vad = silero.VAD.load(
        min_silence_duration=0.5,
        min_speech_duration=0.1,
    )

    # Wrap the STT with StreamAdapter. Pass the VAD instance itself,
    # not vad.stream() -- the adapter manages its own VAD stream.
    stt = StreamAdapter(
        stt=groq_stt,
        vad=vad,
    )

    # Use the wrapped STT for transcription
    stt_stream = stt.stream()
    audio_stream = rtc.AudioStream(track)

    async for audio_event in audio_stream:
        stt_stream.push_frame(audio_event.frame)

    stt_stream.end_input()
```
Just a couple of things to keep in mind:

- **VAD initialization:** the VAD detects when someone is speaking in the audio stream. Silero VAD with settings like `min_silence_duration` and `min_speech_duration` works well.
- **StreamAdapter:** it buffers the audio until the VAD says speech has ended, then sends the utterance to the STT for processing.

If you hit any bumps or want me to walk you through this, just let me know!
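One thing the snippet above does not show is reading results back: pushing frames in is only half the loop, and a companion task normally iterates the same `stt_stream` for transcription events. A minimal, livekit-free sketch of that consumer pattern (the `FakeSpeechEvent` class and `fake_stt_stream` generator are hypothetical stand-ins for the real stream, whose events are checked against `SpeechEventType.FINAL_TRANSCRIPT` in the actual API):

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-in for a LiveKit speech event; real events carry a
# type enum and alternatives with the transcribed text.
@dataclass
class FakeSpeechEvent:
    type: str
    text: str


async def fake_stt_stream():
    # Simulates a stream that emits interim results, then a final one
    yield FakeSpeechEvent("interim_transcript", "hel")
    yield FakeSpeechEvent("final_transcript", "hello world")


async def consume(stream):
    # Collect only the finalized transcripts, as an agent would
    finals = []
    async for event in stream:
        if event.type == "final_transcript":
            finals.append(event.text)
    return finals


print(asyncio.run(consume(fake_stt_stream())))  # ['hello world']
```

In a real agent this consumer runs as a separate `asyncio` task alongside the frame-pushing loop, so transcripts arrive while audio is still flowing in.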
**User:**
This is my agent session code:
```python
session = AgentSession(
    userdata=userdata,
    turn_detection=EnglishModel(),
    user_away_timeout=20.0,
    vad=ctx.proc.userdata["vad"],
    min_endpointing_delay=0.75,
    max_endpointing_delay=3.0,
    # stt=deepgram.STT(
    #     model=config['stt']['model'],
    #     interim_results=config['stt']['interim_results'],
    #     smart_format=config['stt']['smart_format'],
    #     punctuate=config['stt']['punctuate'],
    #     filler_words=config['stt']['filler_words'],
    #     profanity_filter=config['stt']['profanity_filter'],
    #     language=config['stt']['language'],
    #     numerals=config['stt']['numerals'],
    #     api_key=config['DEEPGRAM_API_KEY'],
    # ),
    stt=stt.FallbackAdapter(
        [
            groq.STT(
                model=config['groq_stt']['model'],
                language=config['groq_stt']['language'],
                api_key=config['GROQ_API_KEY'],
            ),
            google.STT(
                model=config['google_stt']['model'],
                credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
                location=config['google_stt']['location'],
            ),
        ]
    ),
    llm=llm.FallbackAdapter(
        [
            google.LLM(
                model=config['primary_llm']['model'],
                temperature=config['primary_llm']['temperature'],
                vertexai=config['primary_llm']['vertexai'],
                location=config['primary_llm']['location'],
            ),
            google.LLM(
                model=config['secondary_llm']['model'],
                temperature=config['secondary_llm']['temperature'],
                vertexai=config['secondary_llm']['vertexai'],
                location=config['secondary_llm']['location'],
            ),
        ]
    ),
    tts=tts.FallbackAdapter(
        [
            elevenlabs.TTS(
                voice_id=config['tts']['voice_id'],
                model=config['tts']['model'],
                api_key=config['ELEVEN_API_KEY'],
            ),
            google.TTS(
                gender=config['google_tts']['gender'],
                voice_name=config['google_tts']['voice_name'],
                credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
                location=config['google_tts']['location'],
            ),
        ]
    ),
)
```
I am providing a VAD.
**Assistant:**
Thanks so much for sharing your code! The hiccup is that the `FallbackAdapter` needs the VAD passed to it explicitly when you're using STTs that don't support streaming out of the box. That way the `FallbackAdapter` can wrap those non-streaming STTs with a `StreamAdapter` automatically. Here's a quick tweak you can make to your setup:
```python
import os

from livekit.agents.stt import FallbackAdapter
from livekit.plugins import google, groq, silero

# Load a VAD
vad = silero.VAD.load(
    min_speech_duration=0.1,
    min_silence_duration=0.5,
)

# Initialize the STTs
stt1 = groq.STT(
    model=config['groq_stt']['model'],
    language=config['groq_stt']['language'],
    api_key=config['GROQ_API_KEY'],
)
stt2 = google.STT(
    model=config['google_stt']['model'],
    credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
    location=config['google_stt']['location'],
)

# Create the FallbackAdapter with the VAD so it can wrap
# non-streaming STTs in a StreamAdapter automatically
fallback_stt = FallbackAdapter(
    stt=[stt1, stt2],
    vad=vad,
)
```
Just a couple of things to keep in mind:

- Make sure to pass the VAD to the `FallbackAdapter` explicitly.
- When you do, the `FallbackAdapter` will automatically wrap any non-streaming STTs with a `StreamAdapter`.

If you run into any other questions or want to chat more about this, I'm here to help!
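If you ever want to check which of your STTs actually needs wrapping, LiveKit STT instances expose a `capabilities` object with a boolean `streaming` flag (that flag is what the original `ValueError` is reporting on). A small duck-typed sketch, using `SimpleNamespace` stubs in place of the real plugin STTs:

```python
from types import SimpleNamespace


def needs_stream_adapter(stt) -> bool:
    # When capabilities.streaming is False, the STT must be wrapped in a
    # StreamAdapter, or a VAD must be supplied so FallbackAdapter can
    # wrap it automatically.
    return not stt.capabilities.streaming


# Stubs standing in for real plugin STTs: Groq's STT reports
# streaming=False, while e.g. Deepgram's reports streaming=True.
groq_like = SimpleNamespace(capabilities=SimpleNamespace(streaming=False))
deepgram_like = SimpleNamespace(capabilities=SimpleNamespace(streaming=True))

print(needs_stream_adapter(groq_like))      # True
print(needs_stream_adapter(deepgram_like))  # False
```

The same check works on real instances, e.g. `groq.STT(...).capabilities.streaming`, which is handy when swapping providers in a fallback chain.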
**User:**
If I am providing the VAD in the STT, do I need to provide the `vad` parameter in AgentSession as well?
**Assistant:**
Hey there! Yes, you'll want to provide the VAD in both the STT configuration and the `AgentSession` when you're using the `FallbackAdapter`. Here's a quick rundown of why that's important:

- **STT configuration:** the `FallbackAdapter` needs the VAD to make streaming work for STTs that don't support it on their own. It helps with buffering and processing audio chunks properly.
- **AgentSession:** the VAD here drives turn detection and interruption handling, which is a separate job from transcription.

Here's a simple example showing how to set up both:
```python
vad = silero.VAD.load()  # load once and reuse

fallback_stt = FallbackAdapter(
    stt=[
        groq.STT(
            model=config['groq_stt']['model'],
            language=config['groq_stt']['language'],
            api_key=config['GROQ_API_KEY'],
        ),
        google.STT(
            model=config['google_stt']['model'],
            credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
            location=config['google_stt']['location'],
        ),
    ],
    vad=vad,
)

session = AgentSession(
    stt=fallback_stt,
    vad=vad,  # the same instance can serve both roles
    # other configuration such as llm and tts
)
```
If anything's unclear or you want to dive deeper, just let me know, I'm here to help!

Sources: tests/test_stt_fallback.py | Turn detection and interruptions | LiveKit Docs | Agents Core Architecture | livekit/livekit_composite | DeepWiki
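Your session code already follows the right pattern by pulling the VAD from `ctx.proc.userdata["vad"]`: the Silero model is loaded once in the worker's prewarm hook and the same instance is reused everywhere, rather than loading the model twice. A livekit-free sketch of that "load once, share everywhere" wiring (`FakeVAD` is a hypothetical stand-in for `silero.VAD`, and plain dicts stand in for `FallbackAdapter` and `AgentSession`):

```python
class FakeVAD:
    loads = 0

    @classmethod
    def load(cls):
        # Count how often the (expensive) model actually loads
        cls.loads += 1
        return cls()


def prewarm(userdata: dict) -> None:
    # Runs once per worker process, before any jobs are accepted
    userdata["vad"] = FakeVAD.load()


def build_session(userdata: dict) -> dict:
    vad = userdata["vad"]
    # Stand-ins for stt.FallbackAdapter(..., vad=vad) and
    # AgentSession(stt=fallback_stt, vad=vad)
    fallback_stt = {"vad": vad}
    return {"stt": fallback_stt, "vad": vad}


userdata = {}
prewarm(userdata)
session = build_session(userdata)
print(FakeVAD.loads)                             # 1
print(session["vad"] is session["stt"]["vad"])   # True
```

Sharing one instance keeps memory usage down and avoids a second model load per job; the VAD is stateless across consumers, so both the adapter and the session can stream from it independently.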