# ask-ai
a
We are using Gemini 2.0 Flash Live API via Vertex AI on LiveKit. We have tested the following settings on the model, but they do not seem to work the way they are supposed to. Can you (1) verify whether LiveKit supports these and (2) explain how each of them should work?
• agent_false_interruption_timeout
• min_endpointing_delay
• max_endpointing_delay
• min_consecutive_speech_delay
• min_interruption_duration
t
Thanks for reaching out, Boram. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! Just wanted to share some info about the settings LiveKit offers for conversation flow and turn detection:

• `agent_false_interruption_timeout`: The time (default 4.0 seconds) after which an event fires if the user starts talking but no transcript comes through. Setting it to `None` turns this feature off.
• `min_endpointing_delay`: The minimum wait time (default 0.4–0.5 seconds) after a possible end-of-utterance before the user's turn ends.
• `max_endpointing_delay`: The maximum wait time (default 6.0 seconds) before ending the user's turn when confidence is low.
• `min_consecutive_speech_delay`: The shortest gap (default 0.0 seconds) allowed between back-to-back speech segments.
• `min_interruption_duration`: The minimum speech length (default 0.5 seconds) needed to count as an intentional interruption.

You can tweak all of these through the `AgentSession` constructor, and they shape how turn detection works locally.

Now, if you're using the Gemini 2.0 Flash Live API with Vertex AI, LiveKit hooks in via the `google.beta.realtime.RealtimeModel` class. But heads up: Gemini has its own turn detection built right in. If you want to use LiveKit's turn detection settings instead, you'll need to turn off Gemini's automatic activity detection and set up LiveKit's turn detection model on its own. Here's how:

1. Set `automatic_activity_detection.disabled = True` in `realtime_input_config`.
2. Provide a separate STT model (like Deepgram) inside your `AgentSession`.
3. Configure `turn_detection` to use a LiveKit turn detection model, such as `MultilingualModel()`.
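For reference, here is a rough sketch of what that setup could look like. The class and setting names follow the docs cited below, but the exact keyword arguments, the `vertexai=True` flag, and the import paths are assumptions that can vary between SDK versions, so treat this as an illustration rather than drop-in code:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, google
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# Gemini 2.0 Flash Live over Vertex AI. Disabling Gemini's automatic activity
# detection (step 1 above) happens on the realtime model side; check the Gemini
# Live integration guide for the exact option in your plugin version.
llm = google.beta.realtime.RealtimeModel(vertexai=True)

session = AgentSession(
    llm=llm,
    stt=deepgram.STT(),                    # separate STT so LiveKit sees transcripts (step 2)
    turn_detection=MultilingualModel(),    # LiveKit's local turn detection model (step 3)
    allow_interruptions=True,
    min_endpointing_delay=0.5,             # wait at least 0.5 s after a likely end of turn
    max_endpointing_delay=6.0,             # stop waiting after 6 s at low confidence
    min_consecutive_speech_delay=0.0,      # minimum gap between back-to-back speech segments
    min_interruption_duration=0.5,         # speech shorter than 0.5 s is not an interruption
    agent_false_interruption_timeout=4.0,  # fire an event if speech produces no transcript
)
```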
If you want, I'm here to help you walk through setting these up or integrating Gemini with LiveKit. Just let me know!

Sources:
• Turn detection and interruptions | LiveKit Docs
• LiveKit turn detector plugin | LiveKit Docs
• Gemini Live API integration guide | LiveKit Docs
• tests/test_agent_session.py
a
Explain to me in detail how it would work if I enable both of the following settings:
• min_interruption_duration: 1.2
• min_interruption_words: 2
Would they apply as AND or OR? And in what order?
t
Hey there! Just wanted to explain how the settings `min_interruption_duration` and `min_interruption_words` work together: they combine as an AND. That means both conditions need to be true for an interruption to count. Here's the scoop:

1. `min_interruption_duration`: First, the system checks whether the speech lasts at least 1.2 seconds. If it's shorter, it skips the interruption.
2. `min_interruption_words`: If the duration check passes, it then checks whether at least 2 words were spoken. If not, the interruption is ignored.

So the two checks happen one after the other, and the interruption goes through only if both are met. If you want me to walk you through it more or have any questions, just let me know, I'm here to help!

Sources:
• livekit-agents/livekit/agents/voice/speech_handle.py
• Turn detection and interruptions | LiveKit Docs
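To make the order concrete, here's a simplified sketch of that gate. It is not LiveKit's actual implementation, just the two checks described above applied in sequence:

```python
def should_interrupt(speech_duration: float, transcript: str,
                     min_interruption_duration: float = 1.2,
                     min_interruption_words: int = 2) -> bool:
    # Check 1: the speech must be long enough.
    if speech_duration < min_interruption_duration:
        return False  # too short, not treated as an interruption
    # Check 2: the transcript must contain enough words.
    if len(transcript.split()) < min_interruption_words:
        return False  # long enough, but too few words, so still ignored
    # Both checks passed (logical AND), so the agent gets interrupted.
    return True

print(should_interrupt(1.5, "hey"))      # False: 1.5 s but only one word
print(should_interrupt(1.5, "hold on"))  # True: long enough and two words
```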
a
Can you explain agent_false_interruption_timeout in more detail? Exactly what happens if we turn it on?
t
The `agent_false_interruption_timeout` setting in LiveKit is there to help when someone starts talking but no transcript comes through. Here's a quick rundown of how it works:

• When a user begins speaking, the system starts a timer based on the `agent_false_interruption_timeout` value (4.0 seconds by default).
• If no transcript shows up within that time (maybe because of background noise or because the speech wasn't clear), the system emits an `agent_false_interruption` event.
• You can use this event to do things like ask the user to repeat themselves or keep track of these interruptions for later review.
• If you set this to `None`, the feature is turned off, so no timeout-based interruption handling will happen.

Feel free to ask if you want me to explain anything further or help you set this up!
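As an illustration, handling the event could look roughly like this. The event name comes from the docs above, but the decorator-style registration and the payload shape are assumptions that depend on your livekit-agents version, so double-check before relying on it:

```python
from livekit.agents import AgentSession

session = AgentSession(
    agent_false_interruption_timeout=4.0,  # fire the event after 4 s without a transcript
)

@session.on("agent_false_interruption")
def on_false_interruption(ev):
    # The user started speaking but no transcript arrived within the timeout,
    # e.g. due to background noise. React however you like: log it, emit a
    # metric, or have the agent prompt the user to repeat themselves.
    print("false interruption detected:", ev)
```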
a
So if we turn agent_false_interruption_timeout on but do not do anything with the event, then there is no real impact? We need to use the event to do something on our end?
t
That's right! If you turn on `agent_false_interruption_timeout` but don't handle the `agent_false_interruption` event, it won't really change how the system works. The event is just a heads-up for your app, so you'll need to add some logic yourself if you want to do something specific, like nudging the user or keeping a log. Feel free to reach out if you'd like a hand with handling this event smoothly!
a
min_endpointing_delay: We set it to 5 to see if it would work as intended, but it didn't wait 5 seconds. What could be the problem?
t
Could you share a bit more about your setup? For example:

• Are you using LiveKit's turn detection model, or is Gemini's built-in turn detection turned on?
• Have you set `min_endpointing_delay` in the `AgentSession` settings?
• Are there any other turn detection settings you've tweaked that might be affecting this?

Knowing these details will really help us figure out why the setting didn't work as expected.
a
We're using Gemini's built-in turn detection and set `min_endpointing_delay` in the `AgentSession` settings. Following is how I tested this:

```
agent_false_interruption_timeout    null
allow_interruptions                 true
discard_audio_if_uninterruptible    null
max_endpointing_delay               8
max_tool_steps                      null
min_consecutive_speech_delay        null
min_endpointing_delay               5
min_interruption_duration           null
min_interruption_words              null
preemptive_generation               null
turn_detection                      "realtime_llm"
use_tts_aligned_transcript          null
user_away_timeout                   30
```
t
Thanks so much for sharing those details! It looks like the issue comes from using Gemini's built-in turn detection (`turn_detection: "realtime_llm"`). When that's turned on, settings like `min_endpointing_delay` in the `AgentSession` don't get applied, because Gemini's own turn detection logic takes over and overrides LiveKit's local settings.

If you want `min_endpointing_delay` to work the way you expect, you'll need to disable Gemini's built-in turn detection by setting `automatic_activity_detection.disabled = True` in the `realtime_input_config`. After that, you can set up LiveKit's turn detection model (like `MultilingualModel`) to handle turn detection locally, which will let `min_endpointing_delay` take effect.

Feel free to reach out if you want a hand adjusting these settings or if you'd like me to explain anything in more detail. I'm here to help!
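Building on the sketch shared earlier in this thread, the relevant change for your test would look something like the snippet below. The Deepgram STT choice and the exact keyword names are assumptions and may differ by SDK version:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    stt=deepgram.STT(),                  # LiveKit needs its own transcripts for endpointing
    turn_detection=MultilingualModel(),  # local turn detection instead of "realtime_llm"
    min_endpointing_delay=5.0,           # now actually waits ~5 s after a likely end of turn
    max_endpointing_delay=8.0,           # matches the value from your config dump
)
```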
a
Does any of the following work when using Gemini's built-in detection?
• agent_false_interruption_timeout
• min_endpointing_delay
• max_endpointing_delay
• min_consecutive_speech_delay
• min_interruption_duration
• min_interruption_words
t
Hey there! Just a quick heads-up: when you're using Gemini's built-in turn detection, a few settings from LiveKit's `AgentSession` won't come into play, because Gemini's detection takes the lead. These settings are:

• agent_false_interruption_timeout
• min_endpointing_delay
• max_endpointing_delay
• min_consecutive_speech_delay
• min_interruption_duration
• min_interruption_words

These belong to LiveKit's local turn detection logic, which gets skipped when Gemini's built-in detection is turned on. If you want to use these settings, you'd just need to turn off Gemini's automatic activity detection and switch over to LiveKit's turn detection model. Feel free to reach out if you want a hand with switching to LiveKit's turn detection or setting these up. I'm here to help!
a
For turn_detection, what options do we have? How do we test it?
t
The `turn_detection` setting in LiveKit controls how turns are taken during conversations. Here's a quick rundown of your options:

• `"realtime_llm"`: Uses Gemini's built-in turn detection logic; this is the default when you're integrating with Gemini through Vertex AI.
• A LiveKit turn detection model instance, such as `MultilingualModel()`: runs LiveKit's local turn detection, supports multiple languages, and lets you fine-tune things with settings like `min_endpointing_delay` and `max_endpointing_delay`.

Want to test out turn detection? Here's how you can do it:

1. Pick the `turn_detection` option you want in your `AgentSession` setup.
2. Run a conversation session and watch how turns are picked up and handled.
3. If you're using LiveKit's local model, try tweaking parameters like `min_endpointing_delay` or `min_interruption_duration` to get it just right.
4. For Gemini's detection, make sure `automatic_activity_detection` is turned on in the `realtime_input_config`.

If you'd like a hand setting this up or testing it out, just let me know. I'm here to help!
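A simple way to compare the two options is to toggle the `turn_detection` argument and watch how quickly turns end in each case. This is only a sketch under the same assumptions as the earlier snippets (kwarg names and imports may vary by version), and the Gemini-side behavior still comes from the realtime model itself:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# Option A: let the realtime model (Gemini) decide when the user's turn ends.
session_a = AgentSession(turn_detection="realtime_llm")

# Option B: LiveKit's local model, which honors the endpointing delays.
session_b = AgentSession(
    stt=deepgram.STT(),
    turn_detection=MultilingualModel(),
    min_endpointing_delay=0.5,
    max_endpointing_delay=6.0,
)
```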
a
@modern-rain-52453
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code

If you find the answer, please post it here to help others!
a
Can we use the Gemini 2.0 Flash Live model and MultilingualModel at the same time? Or do we need to change the LLM?