Is it possible to improve Realtime Model VAD in Gemini?
# ask-ai
f
Is it possible to improve Realtime Model VAD in gemini?
t
Thanks for reaching out, Baymurat. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To improve the Realtime Model VAD in Gemini, you can configure it to use LiveKit's turn detection model for more accurate results. Here's how:
1. Disable Server-Side VAD: Set `automatic_activity_detection` to `disabled` in the `RealtimeInputConfig` to turn off Gemini's default VAD.
2. Configure LiveKit's Turn Detection: Use a compatible turn detection model, such as `MultilingualModel`, by setting it in the `turn_detection` parameter of `AgentSession`.
3. Provide STT for Transcription: Since Gemini doesn't provide interim transcripts, use a separate STT plugin like Deepgram to feed transcription data to the turn detection model.
Here's an example configuration:
```python
from google.genai import types
from livekit.agents import AgentSession
from livekit.plugins import deepgram, google
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    # LiveKit's turn detection model decides end of turn
    turn_detection=MultilingualModel(),
    llm=google.beta.realtime.RealtimeModel(
        realtime_input_config=types.RealtimeInputConfig(
            # Turn off Gemini's server-side VAD
            automatic_activity_detection=types.AutomaticActivityDetection(
                disabled=True,
            ),
        ),
        input_audio_transcription=None,
    ),
    # Deepgram provides the transcripts the turn detector needs
    stt=deepgram.STT(),
)
```
This setup disables Gemini's VAD and enables LiveKit's turn detection model to handle end-of-turn detection using the provided STT transcription.
f
I wonder how many languages this supports? MultilingualModel? I will be using it for the Thai language.
t
Hey there! The `MultilingualModel` in LiveKit is pretty awesome: it supports English plus 13 other languages, including Chinese, Dutch, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish. That said, it doesn't support Thai just yet. If you're looking to work with Thai, you might want to check out specific STT plugins or other models that are designed for Thai transcription (there's a sketch of one fallback just below). I'm here if you want help exploring those options! Sources: LiveKit turn detector plugin | LiveKit Docs
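If Thai is a hard requirement, here's a rough fallback sketch (my suggestion, not something from the docs above): skip the turn-detector model entirely and let VAD silence detection end the turn. The string `"vad"` mode and the Silero plugin wiring are assumptions based on current livekit-agents versions, so double-check against your installed version:
```python
from livekit.agents import AgentSession
from livekit.plugins import silero

# Rough sketch: for a language the turn-detector model doesn't cover (like Thai),
# fall back to VAD silence detection for end-of-turn instead of MultilingualModel.
session = AgentSession(
    turn_detection="vad",  # assumption: string mode supported by your livekit-agents version
    vad=silero.VAD.load(),
    # llm / stt would be configured the same way as in the Gemini example above
)
```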
f
what STT do you recommend?
t
Hey there! Thai language ("th") is indeed listed as a supported language for STT in LiveKit. That said, the exact STT plugins that specifically support Thai transcription aren’t clearly detailed. I’d suggest checking out different STT providers that work with LiveKit to see which ones handle Thai best. If you need any help with that, just let me know! Sources: livekit-plugins/livekit-plugins-spitch/livekit/plugins/spitch/stt.py
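As one hedged sketch (again my suggestion, not something the sources above spell out): OpenAI's Whisper-based STT is known to transcribe Thai, so the OpenAI STT plugin with a language hint could be a starting point. The `language` keyword and the overall wiring are assumptions to verify against the plugin version you install:
```python
from livekit.agents import AgentSession
from livekit.plugins import openai, silero

# Sketch only: Whisper models are known to handle Thai, so the OpenAI STT plugin
# with a language hint is one option to evaluate. The "language" keyword is an
# assumption; check the installed plugin's signature.
session = AgentSession(
    stt=openai.STT(language="th"),
    vad=silero.VAD.load(),
    turn_detection="vad",  # Thai isn't covered by MultilingualModel (see above)
    # llm would be the Gemini RealtimeModel from earlier in this thread
)
```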
f
what available models do we have for gemini realtime
t
Here are the models available for Gemini Realtime in LiveKit:
1. `gemini-2.0-flash-exp`
2. `gemini-2.0-flash-live-001`
3. `gemini-live-2.5-flash-preview`
4. `gemini-2.5-flash-preview-native-audio-dialog`
5. `gemini-2.5-flash-exp-native-audio-thinking-dialog`
Just a heads-up: the 2.5 models are still in preview, so they might have a few quirks, like some function calling issues. If you want to dive deeper into any specific model or how to use them, feel free to ask—I’m here to help! Sources: Gemini Live API integration guide | LiveKit Docs
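For reference, here's a minimal sketch of how one of these model IDs gets selected; `model`, `voice`, and `temperature` are the same parameters used elsewhere in this thread, and the voice name "Puck" is just an illustrative value:
```python
from livekit.plugins import google

# Sketch: choose one of the model IDs listed above via the `model` argument.
# The voice name "Puck" is illustrative; check which voices your chosen model supports.
llm = google.beta.realtime.RealtimeModel(
    model="gemini-2.0-flash-live-001",
    voice="Puck",
    temperature=0.8,
)
```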
f
how is this one `gemini-2.5-flash-exp-native-audio-thinking-dialog` better than `gemini-2.5-flash-exp-native-audio-dialog`?
t
Both `gemini-2.5-flash-exp-native-audio-thinking-dialog` and `gemini-2.5-flash-exp-native-audio-dialog` are available models, but there aren't specific details about how they differ. If you have a particular use case in mind, I'd suggest giving both a try to see which one works best for you (see the sketch below). I'm here to help if you need any guidance along the way! Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py
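If you do want to A/B them, a tiny sketch (the environment variable name `GEMINI_REALTIME_MODEL` is made up for this example) is to switch the model ID from config and keep everything else identical:
```python
import os

from livekit.plugins import google

# Sketch for trying both variants: switch the model ID via an environment variable
# (GEMINI_REALTIME_MODEL is a made-up name) and keep the rest of the session
# identical, then compare latency and answer quality between runs.
model_id = os.environ.get(
    "GEMINI_REALTIME_MODEL",
    "gemini-2.5-flash-exp-native-audio-dialog",
)
llm = google.beta.realtime.RealtimeModel(model=model_id)
```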
f
Can I disable interruption in gemini realtime?
t
Absolutely, you can disable interruption in Gemini Realtime by using the `allow_interruptions` parameter. Here's a quick rundown on how to do it:
1. In JavaScript/TypeScript: Just set `allowInterruptions` to `false` in `VPAOptions` when you're setting up the voice pipeline agent.
2. In Python: When you initialize `AgentSession`, configure it with `allow_interruptions=False` (see the sketch just after this answer).
3. Manual Control: If you ever need to stop the agent's current response on the fly, you can use `session.interrupt()`.
Just a heads-up: if you're working with a RealtimeModel that has built-in turn detection, the `allow_interruptions` parameter might not have any effect. It's a good idea to double-check your model's features to make sure everything works smoothly for your needs. Sources: VPAOptions | LiveKit Agents | livekit-agents/livekit/agents/voice/agent_session.py | Configuring turn detection and interruptions | LiveKit Docs | Turn detection and interruptions | LiveKit Docs
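Here's a minimal Python sketch of steps 2 and 3 together, assuming a pipeline-style session with Silero VAD; the `stop_current_reply` helper is made up for illustration:
```python
from livekit.agents import AgentSession
from livekit.plugins import silero

# Sketch: with allow_interruptions=False the agent's replies aren't cut off
# when the user starts speaking.
session = AgentSession(
    allow_interruptions=False,
    vad=silero.VAD.load(),
    # stt / llm / tts configured as usual for a pipeline agent
)


async def stop_current_reply() -> None:
    # Manual control (step 3): stop whatever the agent is currently saying.
    session.interrupt()
```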
f
```python
return AgentSession(
    llm=google.beta.realtime.RealtimeModel(
        model=lang_config.llm.model or "gemini-2.5-flash-preview-native-audio-dialog",
        voice=voice_name,
        temperature=0.8,
        instructions=full_instructions,
        modalities=["AUDIO"],
        language=stt_language,
        enable_affective_dialog=True,
        proactivity=True,
        # Disable server-side turn detection
    ),
    allow_interruptions=False,
    # No vad, stt, or tts needed for Gemini realtime
)
```
But here I am having this error:
```
2025-08-06 11:40:16 [ERROR] livekit.agents: Error in _update_activity_task
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.12/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.12/site-packages/livekit/agents/voice/agent_session.py", line 683, in _update_activity_task
    self._next_activity = AgentActivity(task, self)
  File "/opt/anaconda3/lib/python3.12/site-packages/livekit/agents/voice/agent_activity.py", line 103, in __init__
    raise ValueError(
ValueError: the RealtimeModel uses a server-side turn detection, allow_interruptions cannot be False, disable turn_detection in the RealtimeModel and use VAD on the AgentSession instead

2025-08-06 11:40:16 [DEBUG] livekit.agents: start reading stream {"participant": "user-anonymous", "source": "SOURCE_MICROPHONE"}
2025-08-06 11:40:16 [ERROR] __mp_main__: Agent initialization failed: the RealtimeModel uses a server-side turn detection, allow_interruptions cannot be False, disable turn_detection in the RealtimeModel and use VAD on the AgentSession instead

2025-08-06 11:40:16 [ERROR] livekit.agents: unhandled exception while running the job task
Traceback (most recent call last):
  File "/Users/baymuratabdumuratov/Documents/aicoach-backend/livekit-service/livekit-agent.py", line 892, in entrypoint
    await agent.run(ctx)
  File "/Users/baymuratabdumuratov/Documents/aicoach-backend/livekit-service/livekit-agent.py", line 578, in run
    await self._start_session(session, ctx, agent)
  File "/Users/baymuratabdumuratov/Documents/aicoach-backend/livekit-service/livekit-agent.py", line 844, in _start_session
    await session.start(
  File "/opt/anaconda3/lib/python3.12/site-packages/livekit/agents/voice/agent_session.py", line 424, in start
    await asyncio.gather(*tasks)
  File "/opt/anaconda3/lib/python3.12/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.12/site-packages/livekit/agents/voice/agent_session.py", line 683, in _update_activity_task
    self._next_activity = AgentActivity(task, self)
  File "/opt/anaconda3/lib/python3.12/site-packages/livekit/agents/voice/agent_activity.py", line 103, in __init__
    raise ValueError(
ValueError: the RealtimeModel uses a server-side turn detection, allow_interruptions cannot be False, disable turn_detection in the RealtimeModel and use VAD on the AgentSession instead
```
t
Hey there! The error you're seeing happens because the RealtimeModel uses server-side turn detection, which doesn't play well with setting `allow_interruptions=False`. No worries though, we can fix this by turning off server-side turn detection and setting up VAD for your AgentSession. Here's a simple way to do it:
1. Turn Off Turn Detection in the RealtimeModel: Disable Gemini's automatic activity detection in your RealtimeModel configuration, like this:
```python
from google.genai import types
from livekit.plugins import google

llm = google.beta.realtime.RealtimeModel(
    realtime_input_config=types.RealtimeInputConfig(
        # Disable Gemini's server-side VAD / turn detection
        automatic_activity_detection=types.AutomaticActivityDetection(
            disabled=True,
        ),
    ),
    input_audio_transcription=None,
)
```
2. Switch On VAD in the AgentSession: Use a VAD plugin such as Silero and set `allow_interruptions=False`. Here's an example:
```python
from livekit.agents import AgentSession
from livekit.plugins import silero

session = AgentSession(
    allow_interruptions=False,
    llm=llm,  # the RealtimeModel above, with server-side VAD disabled
    vad=silero.VAD.load(),
)
```
3. Optional - Add a Turn Detection Model: If you want more advanced turn detection, you can use LiveKit's turn detection model like this:
```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    allow_interruptions=False,
    turn_detection=MultilingualModel(),
    vad=silero.VAD.load(),
    stt=deepgram.STT(),  # transcripts feed the turn detector
    llm=llm,
)
```
This setup moves turn detection to the client side using VAD, which clears up the conflict and lets you disable interruptions smoothly. If you have any questions or want to chat more about this, just let me know! Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real... | Turn detection and interruptions | LiveKit Docs
f
I tried this, but it did not work out:
```python
llm=google.beta.realtime.RealtimeModel(
    model=lang_config.llm.model or "gemini-2.5-flash-preview-native-audio-dialog",
    voice=voice_name,
    temperature=0.8,
    instructions=full_instructions,
    modalities=["AUDIO"],
    language=stt_language,
    enable_affective_dialog=True,
    proactivity=True,
    # Disable server-side turn detection
    realtime_input_config=types.RealtimeInputConfig(
        automatic_activity_detection=types.AutomaticActivityDetection(
            disabled=True,
        ),
    ),
),
vad=silero.VAD.load(),
allow_interruptions=False,
# No vad, stt, or tts needed for Gemini realtime
)
```
t
Hey there! It looks like the issue is happening because the Gemini RealtimeModel still has some rules it follows, even when server-side turn detection is turned off. Specifically, setting `allow_interruptions=False` doesn't quite fit with what the model needs. No worries though, here's a simple way to fix it:
1. Set `allow_interruptions=True`: If you're using Gemini Live with client-side turn detection, you'll want to allow interruptions so the client-side VAD and STT can handle things smoothly.
2. Make sure VAD and STT are set up right: Use client-side VAD and STT to manage interruptions and turn detection. For example:
```python
from google.genai import types
from livekit.agents import AgentSession
from livekit.plugins import deepgram, google, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    turn_detection=MultilingualModel(),
    vad=silero.VAD.load(),
    llm=google.beta.realtime.RealtimeModel(
        realtime_input_config=types.RealtimeInputConfig(
            automatic_activity_detection=types.AutomaticActivityDetection(
                disabled=True,
            ),
        ),
        input_audio_transcription=None,
    ),
    stt=deepgram.STT(),
)
```
This setup turns off server-side turn detection and relies on client-side VAD and STT to handle it instead. If anything’s unclear or you want a hand with this, just let me know—I’m here to help! Sources: Gemini Live API integration guide | LiveKit Docs | livekit-agents/livekit/agents/voice/agent.py | Turn detection and interruptions | LiveKit Docs | Turn detection | LiveKit Docs