# ask-ai
**User:**
I am getting this error while using Groq STT with the fallback adapter:
```text
ValueError: STTs do not support streaming: livekit.plugins.groq.services.STT. Provide a VAD to enable stt.StreamAdapter automatically or wrap them with stt.StreamAdapter before using this adapter.
```
**Assistant:**
Thanks for reaching out, Harsh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! The error you're seeing happens because the STT service you're using doesn't support streaming audio input out of the box. To fix this, you'll want to use a Voice Activity Detector (VAD) and wrap your STT with a `StreamAdapter` to get streaming working smoothly. Here's a simple way to do it:
```python
from livekit import agents, rtc
from livekit.agents.stt import StreamAdapter
from livekit.plugins import openai, silero


async def process_track(ctx: agents.JobContext, track: rtc.Track):
    # Initialize the Groq STT (served via the OpenAI-compatible plugin)
    groq_stt = openai.STT.with_groq()

    # Initialize the VAD
    vad = silero.VAD.load(
        min_silence_duration=0.5,
        min_speech_duration=0.1,
    )

    # Wrap the STT with StreamAdapter. Pass the VAD instance itself,
    # not vad.stream() -- the adapter manages its own VAD stream.
    stt = StreamAdapter(
        stt=groq_stt,
        vad=vad,
    )

    # Use the wrapped STT for transcription
    stt_stream = stt.stream()
    audio_stream = rtc.AudioStream(track)

    async for audio_event in audio_stream:
        stt_stream.push_frame(audio_event.frame)

    stt_stream.end_input()
```
Just a couple of things to keep in mind:

- **VAD initialization:** the VAD detects when someone is speaking in the audio stream. Silero VAD with settings like `min_silence_duration` and `min_speech_duration` works well.
- **StreamAdapter:** it buffers the audio until the VAD says speech has ended, then sends the utterance to the STT for processing.

If you hit any bumps or want me to walk you through this, just let me know!
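One thing the snippet above does not show is reading results back: pushing frames in is only half the loop, and a companion task normally iterates the same `stt_stream` for transcription events. A minimal, livekit-free sketch of that consumer pattern (the `FakeSpeechEvent` class and `fake_stt_stream` generator are hypothetical stand-ins for the real stream, whose events are checked against `SpeechEventType.FINAL_TRANSCRIPT` in the actual API):

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-in for a LiveKit speech event; real events carry a
# type enum and alternatives with the transcribed text.
@dataclass
class FakeSpeechEvent:
    type: str
    text: str


async def fake_stt_stream():
    # Simulates a stream that emits interim results, then a final one
    yield FakeSpeechEvent("interim_transcript", "hel")
    yield FakeSpeechEvent("final_transcript", "hello world")


async def consume(stream):
    # Collect only the finalized transcripts, as an agent would
    finals = []
    async for event in stream:
        if event.type == "final_transcript":
            finals.append(event.text)
    return finals


print(asyncio.run(consume(fake_stt_stream())))  # ['hello world']
```

In a real agent this consumer runs as a separate `asyncio` task alongside the frame-pushing loop, so transcripts arrive while audio is still flowing in.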
**User:**
This is my agent session code:
```python
session = AgentSession(
    userdata=userdata,
    turn_detection=EnglishModel(),
    user_away_timeout=20.0,
    vad=ctx.proc.userdata["vad"],
    min_endpointing_delay=0.75,
    max_endpointing_delay=3.0,
    # stt=deepgram.STT(
    #     model=config['stt']['model'],
    #     interim_results=config['stt']['interim_results'],
    #     smart_format=config['stt']['smart_format'],
    #     punctuate=config['stt']['punctuate'],
    #     filler_words=config['stt']['filler_words'],
    #     profanity_filter=config['stt']['profanity_filter'],
    #     language=config['stt']['language'],
    #     numerals=config['stt']['numerals'],
    #     api_key=config['DEEPGRAM_API_KEY'],
    # ),
    stt=stt.FallbackAdapter(
        [
            groq.STT(
                model=config['groq_stt']['model'],
                language=config['groq_stt']['language'],
                api_key=config['GROQ_API_KEY'],
            ),
            google.STT(
                model=config['google_stt']['model'],
                credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
                location=config['google_stt']['location'],
            ),
        ]
    ),
    llm=llm.FallbackAdapter(
        [
            google.LLM(
                model=config['primary_llm']['model'],
                temperature=config['primary_llm']['temperature'],
                vertexai=config['primary_llm']['vertexai'],
                location=config['primary_llm']['location'],
            ),
            google.LLM(
                model=config['secondary_llm']['model'],
                temperature=config['secondary_llm']['temperature'],
                vertexai=config['secondary_llm']['vertexai'],
                location=config['secondary_llm']['location'],
            ),
        ]
    ),
    tts=tts.FallbackAdapter(
        [
            elevenlabs.TTS(
                voice_id=config['tts']['voice_id'],
                model=config['tts']['model'],
                api_key=config['ELEVEN_API_KEY'],
            ),
            google.TTS(
                gender=config['google_tts']['gender'],
                voice_name=config['google_tts']['voice_name'],
                credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
                location=config['google_tts']['location'],
            ),
        ]
    ),
)
```
I am providing a VAD.
**Assistant:**
Thanks so much for sharing your code! The hiccup is that the `FallbackAdapter` needs the VAD passed to it explicitly when you're using STTs that don't support streaming out of the box. That way the `FallbackAdapter` can wrap those non-streaming STTs with a `StreamAdapter` automatically. Here's a quick tweak you can make to your setup:
```python
import os

from livekit.agents.stt import FallbackAdapter
from livekit.plugins import google, groq, silero

# Load a VAD
vad = silero.VAD.load(
    min_speech_duration=0.1,
    min_silence_duration=0.5,
)

# Initialize the STTs
stt1 = groq.STT(
    model=config['groq_stt']['model'],
    language=config['groq_stt']['language'],
    api_key=config['GROQ_API_KEY'],
)
stt2 = google.STT(
    model=config['google_stt']['model'],
    credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
    location=config['google_stt']['location'],
)

# Create the FallbackAdapter with the VAD so it can wrap
# non-streaming STTs in a StreamAdapter automatically
fallback_stt = FallbackAdapter(
    stt=[stt1, stt2],
    vad=vad,
)
```
Just a couple of things to keep in mind:

- Make sure to pass the VAD to the `FallbackAdapter` explicitly.
- When you do, the `FallbackAdapter` will automatically wrap any non-streaming STTs with a `StreamAdapter`.

If you run into any other questions or want to chat more about this, I'm here to help!
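If you ever want to check which of your STTs actually needs wrapping, LiveKit STT instances expose a `capabilities` object with a boolean `streaming` flag (that flag is what the original `ValueError` is reporting on). A small duck-typed sketch, using `SimpleNamespace` stubs in place of the real plugin STTs:

```python
from types import SimpleNamespace


def needs_stream_adapter(stt) -> bool:
    # When capabilities.streaming is False, the STT must be wrapped in a
    # StreamAdapter, or a VAD must be supplied so FallbackAdapter can
    # wrap it automatically.
    return not stt.capabilities.streaming


# Stubs standing in for real plugin STTs: Groq's STT reports
# streaming=False, while e.g. Deepgram's reports streaming=True.
groq_like = SimpleNamespace(capabilities=SimpleNamespace(streaming=False))
deepgram_like = SimpleNamespace(capabilities=SimpleNamespace(streaming=True))

print(needs_stream_adapter(groq_like))      # True
print(needs_stream_adapter(deepgram_like))  # False
```

The same check works on real instances, e.g. `groq.STT(...).capabilities.streaming`, which is handy when swapping providers in a fallback chain.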
**User:**
If I am providing the VAD in the STT, do I need to provide the `vad` parameter in AgentSession as well?
**Assistant:**
Hey there! Yes, you'll want to provide the VAD in both the STT configuration and the `AgentSession` when you're using the `FallbackAdapter`. Here's a quick rundown of why that's important:

- **STT configuration:** the `FallbackAdapter` needs the VAD to make streaming work for STTs that don't support it on their own. It helps with buffering and processing audio chunks properly.
- **AgentSession:** the VAD here drives turn detection and interruption handling, which is a separate job from transcription.

Here's a simple example showing how to set up both:
```python
vad = silero.VAD.load()  # load once and reuse

fallback_stt = FallbackAdapter(
    stt=[
        groq.STT(
            model=config['groq_stt']['model'],
            language=config['groq_stt']['language'],
            api_key=config['GROQ_API_KEY'],
        ),
        google.STT(
            model=config['google_stt']['model'],
            credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
            location=config['google_stt']['location'],
        ),
    ],
    vad=vad,
)

session = AgentSession(
    stt=fallback_stt,
    vad=vad,  # the same instance can serve both roles
    # other configuration such as llm and tts
)
```
If anything's unclear or you want to dive deeper, just let me know, I'm here to help!

Sources: tests/test_stt_fallback.py | Turn detection and interruptions | LiveKit Docs | Agents Core Architecture | livekit/livekit_composite | DeepWiki
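Your session code already follows the right pattern by pulling the VAD from `ctx.proc.userdata["vad"]`: the Silero model is loaded once in the worker's prewarm hook and the same instance is reused everywhere, rather than loading the model twice. A livekit-free sketch of that "load once, share everywhere" wiring (`FakeVAD` is a hypothetical stand-in for `silero.VAD`, and plain dicts stand in for `FallbackAdapter` and `AgentSession`):

```python
class FakeVAD:
    loads = 0

    @classmethod
    def load(cls):
        # Count how often the (expensive) model actually loads
        cls.loads += 1
        return cls()


def prewarm(userdata: dict) -> None:
    # Runs once per worker process, before any jobs are accepted
    userdata["vad"] = FakeVAD.load()


def build_session(userdata: dict) -> dict:
    vad = userdata["vad"]
    # Stand-ins for stt.FallbackAdapter(..., vad=vad) and
    # AgentSession(stt=fallback_stt, vad=vad)
    fallback_stt = {"vad": vad}
    return {"stt": fallback_stt, "vad": vad}


userdata = {}
prewarm(userdata)
session = build_session(userdata)
print(FakeVAD.loads)                             # 1
print(session["vad"] is session["stt"]["vad"])   # True
```

Sharing one instance keeps memory usage down and avoids a second model load per job; the VAD is stateless across consumers, so both the adapter and the session can stream from it independently.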