# ask-ai
a
We are using Gemini 2.0 Flash Live API via Vertex AI on LiveKit. We have tested the following settings on the model, but they do not seem to work the way they are supposed to. Can you (1) verify whether LiveKit supports these and (2) explain how each of them should work?
• agent_false_interruption_timeout
• min_endpointing_delay
• max_endpointing_delay
• min_consecutive_speech_delay
• min_interruption_duration
t
Thanks for reaching out, Boram. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! Just wanted to share some info about the settings LiveKit offers for conversation flow and turn detection:

• `agent_false_interruption_timeout`: The time (default 4.0 seconds) after which an event fires if the user starts talking but no transcript comes through. Setting it to `None` turns this feature off.
• `min_endpointing_delay`: The minimum wait time (default 0.4–0.5 seconds) after a possible end-of-utterance before the user's turn ends.
• `max_endpointing_delay`: The maximum wait time (default 6.0 seconds) before ending the user's turn when confidence is low.
• `min_consecutive_speech_delay`: The shortest gap (default 0.0 seconds) allowed between back-to-back speech segments.
• `min_interruption_duration`: The minimum speech length (default 0.5 seconds) needed to count as an intentional interruption.

You can tweak all of these through the `AgentSession` constructor, and they shape how turn detection works locally.

Now, if you're using the Gemini 2.0 Flash Live API with Vertex AI, LiveKit hooks in via the `google.beta.realtime.RealtimeModel` class. But heads up: Gemini has its own turn detection built right in. If you want to use LiveKit's turn detection settings instead, you'll need to turn off Gemini's automatic activity detection and set up LiveKit's turn detection model on its own. Here's how:

1. Set `automatic_activity_detection.disabled = True` in `realtime_input_config`.
2. Provide a separate STT model (like Deepgram) inside your `AgentSession`.
3. Configure `turn_detection` to use a LiveKit turn detection model, such as `MultilingualModel()`.
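For reference, here is a rough sketch of what that setup could look like. The class and setting names follow the docs cited below, but the exact keyword arguments, the `vertexai=True` flag, and the import paths are assumptions that can vary between SDK versions, so treat this as an illustration rather than drop-in code:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, google
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# Gemini 2.0 Flash Live over Vertex AI. Disabling Gemini's automatic activity
# detection (step 1 above) happens on the realtime model side; check the Gemini
# Live integration guide for the exact option in your plugin version.
llm = google.beta.realtime.RealtimeModel(vertexai=True)

session = AgentSession(
    llm=llm,
    stt=deepgram.STT(),                    # separate STT so LiveKit sees transcripts (step 2)
    turn_detection=MultilingualModel(),    # LiveKit's local turn detection model (step 3)
    allow_interruptions=True,
    min_endpointing_delay=0.5,             # wait at least 0.5 s after a likely end of turn
    max_endpointing_delay=6.0,             # stop waiting after 6 s at low confidence
    min_consecutive_speech_delay=0.0,      # minimum gap between back-to-back speech segments
    min_interruption_duration=0.5,         # speech shorter than 0.5 s is not an interruption
    agent_false_interruption_timeout=4.0,  # fire an event if speech produces no transcript
)
```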
If you want, I'm here to help you walk through setting these up or integrating Gemini with LiveKit. Just let me know!

Sources:
• Turn detection and interruptions | LiveKit Docs
• LiveKit turn detector plugin | LiveKit Docs
• Gemini Live API integration guide | LiveKit Docs
• tests/test_agent_session.py
a
Explain to me in detail how it would work if I enable both of the following settings:
• min_interruption_duration: 1.2
• min_interruption_words: 2
Would they apply as AND or OR? And in what order?
t
Hey there! Just wanted to explain how the settings `min_interruption_duration` and `min_interruption_words` work together: they combine as an AND. That means both conditions need to be true for an interruption to count. Here's the scoop:

1. `min_interruption_duration`: First, the system checks whether the speech lasts at least 1.2 seconds. If it's shorter, it skips the interruption.
2. `min_interruption_words`: If the duration check passes, it then checks whether at least 2 words were spoken. If not, the interruption is ignored.

So the two checks happen one after the other, and the interruption goes through only if both are met. If you want me to walk you through it more or have any questions, just let me know, I'm here to help!

Sources:
• livekit-agents/livekit/agents/voice/speech_handle.py
• Turn detection and interruptions | LiveKit Docs
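To make the order concrete, here's a simplified sketch of that gate. It is not LiveKit's actual implementation, just the two checks described above applied in sequence:

```python
def should_interrupt(speech_duration: float, transcript: str,
                     min_interruption_duration: float = 1.2,
                     min_interruption_words: int = 2) -> bool:
    # Check 1: the speech must be long enough.
    if speech_duration < min_interruption_duration:
        return False  # too short, not treated as an interruption
    # Check 2: the transcript must contain enough words.
    if len(transcript.split()) < min_interruption_words:
        return False  # long enough, but too few words, so still ignored
    # Both checks passed (logical AND), so the agent gets interrupted.
    return True

print(should_interrupt(1.5, "hey"))      # False: 1.5 s but only one word
print(should_interrupt(1.5, "hold on"))  # True: long enough and two words
```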
a
Can you explain agent_false_interruption_timeout in more detail? Exactly what happens if we turn it on?
t
The `agent_false_interruption_timeout` setting in LiveKit is there to help when someone starts talking but no transcript comes through. Here's a quick rundown of how it works:

• When a user begins speaking, the system starts a timer based on the `agent_false_interruption_timeout` value (4.0 seconds by default).
• If no transcript shows up within that time (maybe because of background noise or because the speech wasn't clear), the system emits an `agent_false_interruption` event.
• You can use this event to do things like ask the user to repeat themselves or keep track of these interruptions for later review.
• If you set this to `None`, the feature is turned off, so no timeout-based interruption handling will happen.

Feel free to ask if you want me to explain anything further or help you set this up!
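As an illustration, handling the event could look roughly like this. The event name comes from the docs above, but the decorator-style registration and the payload shape are assumptions that depend on your livekit-agents version, so double-check before relying on it:

```python
from livekit.agents import AgentSession

session = AgentSession(
    agent_false_interruption_timeout=4.0,  # fire the event after 4 s without a transcript
)

@session.on("agent_false_interruption")
def on_false_interruption(ev):
    # The user started speaking but no transcript arrived within the timeout,
    # e.g. due to background noise. React however you like: log it, emit a
    # metric, or have the agent prompt the user to repeat themselves.
    print("false interruption detected:", ev)
```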
a
So if we turn agent_false_interruption_timeout on but do not do anything with the event, then there is no real impact? We need to use the event to do something on our end?
t
That's right! If you turn on `agent_false_interruption_timeout` but don't handle the `agent_false_interruption` event, it won't really change how the system works. The event is just a heads-up for your app, so you'll need to add some logic yourself if you want to do something specific, like nudging the user or keeping a log. Feel free to reach out if you'd like a hand with handling this event smoothly!
a
min_endpointing_delay: We set it to 5 to see if it would work as intended, but it didn't wait 5 seconds. What could be the problem?
t
Could you share a bit more about your setup? For example:

• Are you using LiveKit's turn detection model, or is Gemini's built-in turn detection turned on?
• Have you set `min_endpointing_delay` in the `AgentSession` settings?
• Are there any other turn detection settings you've tweaked that might be affecting this?

Knowing these details will really help us figure out why the setting didn't work as expected.
a
We're using Gemini's built-in turn detection and set `min_endpointing_delay` in the `AgentSession` settings. Following is how I tested this:

```
agent_false_interruption_timeout    null
allow_interruptions                 true
discard_audio_if_uninterruptible    null
max_endpointing_delay               8
max_tool_steps                      null
min_consecutive_speech_delay        null
min_endpointing_delay               5
min_interruption_duration           null
min_interruption_words              null
preemptive_generation               null
turn_detection                      "realtime_llm"
use_tts_aligned_transcript          null
user_away_timeout                   30
```
t
Thanks so much for sharing those details! It looks like the issue comes from using Gemini's built-in turn detection (`turn_detection: "realtime_llm"`). When that's turned on, settings like `min_endpointing_delay` in the `AgentSession` don't get applied, because Gemini's own turn detection logic takes over and overrides LiveKit's local settings.

If you want `min_endpointing_delay` to work the way you expect, you'll need to disable Gemini's built-in turn detection by setting `automatic_activity_detection.disabled = True` in the `realtime_input_config`. After that, you can set up LiveKit's turn detection model (like `MultilingualModel`) to handle turn detection locally, which will let `min_endpointing_delay` take effect.

Feel free to reach out if you want a hand adjusting these settings or if you'd like me to explain anything in more detail. I'm here to help!
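Building on the sketch shared earlier in this thread, the relevant change for your test would look something like the snippet below. The Deepgram STT choice and the exact keyword names are assumptions and may differ by SDK version:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    stt=deepgram.STT(),                  # LiveKit needs its own transcripts for endpointing
    turn_detection=MultilingualModel(),  # local turn detection instead of "realtime_llm"
    min_endpointing_delay=5.0,           # now actually waits ~5 s after a likely end of turn
    max_endpointing_delay=8.0,           # matches the value from your config dump
)
```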
a
Does any of the following work when using Gemini's built-in detection?
• agent_false_interruption_timeout
• min_endpointing_delay
• max_endpointing_delay
• min_consecutive_speech_delay
• min_interruption_duration
• min_interruption_words
t
Hey there! Just a quick heads-up: when you're using Gemini's built-in turn detection, a few settings from LiveKit's `AgentSession` won't come into play, because Gemini's detection takes the lead. These settings are:

• agent_false_interruption_timeout
• min_endpointing_delay
• max_endpointing_delay
• min_consecutive_speech_delay
• min_interruption_duration
• min_interruption_words

These belong to LiveKit's local turn detection logic, which gets skipped when Gemini's built-in detection is turned on. If you want to use these settings, you'd just need to turn off Gemini's automatic activity detection and switch over to LiveKit's turn detection model. Feel free to reach out if you want a hand with switching to LiveKit's turn detection or setting these up. I'm here to help!
a
For turn_detection, what options do we have? How do we test it?
t
The `turn_detection` setting in LiveKit controls how turns are taken during conversations. Here's a quick rundown of your options:

• `"realtime_llm"`: Uses Gemini's built-in turn detection logic; this is the default when you're integrating with Gemini through Vertex AI.
• A LiveKit turn detection model instance, such as `MultilingualModel()`: runs LiveKit's local turn detection, supports multiple languages, and lets you fine-tune things with settings like `min_endpointing_delay` and `max_endpointing_delay`.

Want to test out turn detection? Here's how you can do it:

1. Pick the `turn_detection` option you want in your `AgentSession` setup.
2. Run a conversation session and watch how turns are picked up and handled.
3. If you're using LiveKit's local model, try tweaking parameters like `min_endpointing_delay` or `min_interruption_duration` to get it just right.
4. For Gemini's detection, make sure `automatic_activity_detection` is turned on in the `realtime_input_config`.

If you'd like a hand setting this up or testing it out, just let me know. I'm here to help!
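A simple way to compare the two options is to toggle the `turn_detection` argument and watch how quickly turns end in each case. This is only a sketch under the same assumptions as the earlier snippets (kwarg names and imports may vary by version), and the Gemini-side behavior still comes from the realtime model itself:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# Option A: let the realtime model (Gemini) decide when the user's turn ends.
session_a = AgentSession(turn_detection="realtime_llm")

# Option B: LiveKit's local model, which honors the endpointing delays.
session_b = AgentSession(
    stt=deepgram.STT(),
    turn_detection=MultilingualModel(),
    min_endpointing_delay=0.5,
    max_endpointing_delay=6.0,
)
```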
a
@modern-rain-52453
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code

If you find the answer, please post it here to help others!
a
Can we use the Gemini 2.0 Flash Live model and MultilingualModel at the same time? Or do we need to change the LLM?