billions-lawyer-8588
08/11/2025, 2:26 PM
ValueError: STTs do not support streaming: livekit.plugins.groq.services.STT. Provide a VAD to enable stt.StreamAdapter automatically or wrap them with stt.StreamAdapter before using this adapter.
tall-airline-16756
08/11/2025, 2:29 PM
Since the Groq STT doesn't support streaming on its own, you can wrap it with a StreamAdapter to get that streaming functionality working smoothly.
Here’s a simple way to do it:
from livekit import agents, rtc
from livekit.plugins import openai, silero
from livekit.agents.stt import StreamAdapter

async def process_track(ctx: agents.JobContext, track: rtc.Track):
    # Initialize the Groq STT (served through the OpenAI-compatible plugin)
    groq_stt = openai.STT.with_groq()

    # Initialize the VAD
    vad = silero.VAD.load(
        min_silence_duration=0.5,
        min_speech_duration=0.1,
    )

    # Wrap the non-streaming STT with StreamAdapter
    stt = StreamAdapter(
        stt=groq_stt,
        vad=vad,
    )

    # Use the wrapped STT for transcription
    stt_stream = stt.stream()
    audio_stream = rtc.AudioStream(track)
    async for audio_event in audio_stream:
        stt_stream.push_frame(audio_event.frame)
    stt_stream.end_input()
Just a couple of things to keep in mind:
• VAD Initialization: This helps detect when someone is speaking in the audio stream. Using Silero VAD with settings like min_silence_duration and min_speech_duration works well.
• StreamAdapter: It basically holds onto the audio until the VAD says speech has ended, then sends the buffered segment off to the STT for processing (see the sketch below).
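If you want to read the transcripts back out of that stream, here's a minimal sketch of the consuming side. It reuses stt_stream from the example above; the SpeechEventType names come from livekit.agents.stt and may vary slightly between livekit-agents versions:
from livekit.agents import stt as agents_stt

# Consume the events emitted by the wrapped STT stream
async for event in stt_stream:
    if event.type == agents_stt.SpeechEventType.FINAL_TRANSCRIPT:
        # One final transcript is emitted per VAD-detected speech segment
        print(event.alternatives[0].text)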
If you hit any bumps or want me to walk you through this, just let me know. I'm here to help!
billions-lawyer-8588
08/11/2025, 2:36 PM
session = AgentSession(
    userdata=userdata,
    turn_detection=EnglishModel(),
    user_away_timeout=20.0,
    vad=ctx.proc.userdata["vad"],
    min_endpointing_delay=0.75,
    max_endpointing_delay=3.0,
    # stt=deepgram.STT(model=config['stt']['model'],
    #                  interim_results=config['stt']['interim_results'],
    #                  smart_format=config['stt']['smart_format'],
    #                  punctuate=config['stt']['punctuate'],
    #                  filler_words=config['stt']['filler_words'],
    #                  profanity_filter=config['stt']['profanity_filter'],
    #                  language=config['stt']['language'],
    #                  numerals=config['stt']['numerals'],
    #                  api_key=config['DEEPGRAM_API_KEY']),
    stt=stt.FallbackAdapter(
        [
            groq.STT(
                model=config['groq_stt']['model'],
                language=config['groq_stt']['language'],
                api_key=config['GROQ_API_KEY'],
            ),
            google.STT(
                model=config['google_stt']['model'],
                credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
                location=config['google_stt']['location'],
            ),
        ]
    ),
    llm=llm.FallbackAdapter(
        [
            google.LLM(
                model=config['primary_llm']['model'],
                temperature=config['primary_llm']['temperature'],
                vertexai=config['primary_llm']['vertexai'],
                location=config['primary_llm']['location'],
            ),
            google.LLM(
                model=config['secondary_llm']['model'],
                temperature=config['secondary_llm']['temperature'],
                vertexai=config['secondary_llm']['vertexai'],
                location=config['secondary_llm']['location'],
            ),
        ]
    ),
    tts=tts.FallbackAdapter(
        [
            elevenlabs.TTS(
                voice_id=config['tts']['voice_id'],
                model=config['tts']['model'],
                api_key=config['ELEVEN_API_KEY'],
            ),
            google.TTS(
                gender=config['google_tts']['gender'],
                voice_name=config['google_tts']['voice_name'],
                credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
                location=config['google_tts']['location'],
            ),
        ]
    ),
)
I am providing vad
tall-airline-16756
08/11/2025, 2:37 PM
The FallbackAdapter needs the VAD to be passed explicitly when you're using STTs that don't support streaming out of the box. This way, the FallbackAdapter can wrap those non-streaming STTs with a StreamAdapter smoothly.
Here’s a quick tweak you can make to your setup:
import os

from livekit.agents.stt import FallbackAdapter
from livekit.plugins import google, groq, silero

# Load a VAD
vad = silero.VAD.load(
    min_speech_duration=0.1,
    min_silence_duration=0.5,
)

# Initialize STTs
stt1 = groq.STT(
    model=config['groq_stt']['model'],
    language=config['groq_stt']['language'],
    api_key=config['GROQ_API_KEY'],
)
stt2 = google.STT(
    model=config['google_stt']['model'],
    credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
    location=config['google_stt']['location'],
)

# Create FallbackAdapter with VAD so the non-streaming STTs can be wrapped
fallback_stt = FallbackAdapter(
    stt=[stt1, stt2],
    vad=vad,
)
Just a couple of things to keep in mind:
• Make sure to explicitly pass the VAD to the FallbackAdapter.
• When you do, the FallbackAdapter will automatically wrap any non-streaming STTs with a StreamAdapter (the manual equivalent is sketched below).
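As the original error message suggests, you can also do the wrapping yourself instead of letting the FallbackAdapter handle it. A minimal sketch, reusing stt1, stt2, and vad from the example above:
from livekit.agents.stt import FallbackAdapter, StreamAdapter

# Manually wrap each non-streaming STT; equivalent to passing vad= to the
# FallbackAdapter and letting it wrap them for you.
fallback_stt = FallbackAdapter(
    stt=[
        StreamAdapter(stt=stt1, vad=vad),
        StreamAdapter(stt=stt2, vad=vad),
    ],
)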
If you run into any other questions or want to chat more about this, I'm here to help!
billions-lawyer-8588
08/11/2025, 2:50 PM
tall-airline-16756
08/11/2025, 2:51 PM
You'll want to provide the VAD both in the FallbackAdapter and in the AgentSession when you're using the FallbackAdapter. Here's a quick rundown of why that's important:
• STT Configuration: The VAD is needed in the FallbackAdapter to make streaming work smoothly for STTs that don't support streaming on their own. It helps with buffering and processing audio chunks properly.
• AgentSession: The VAD here helps with detecting turns and handling interruptions, which is a bit different from the transcription side of things.
Here’s a simple example to show you how to set up both:
import os

from livekit.agents.stt import FallbackAdapter
from livekit.plugins import google, groq, silero

# One VAD instance can be shared between the adapter and the session
vad = silero.VAD.load()

fallback_stt = FallbackAdapter(
    stt=[
        groq.STT(
            model=config['groq_stt']['model'],
            language=config['groq_stt']['language'],
            api_key=config['GROQ_API_KEY'],
        ),
        google.STT(
            model=config['google_stt']['model'],
            credentials_file=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
            location=config['google_stt']['location'],
        ),
    ],
    vad=vad,  # enables streaming for the non-streaming STTs
)

session = AgentSession(
    stt=fallback_stt,
    vad=vad,  # used here for turn detection and interruption handling
    # Other configurations like llm and tts
)
If anything's unclear or you want to dive deeper, just let me know. I'm here to help!
Sources: tests/test_stt_fallback.py | Turn detection and interruptions | LiveKit Docs | Agents Core Architecture | livekit/livekit_composite | DeepWiki