# ask-ai
m
Why, with gpt-realtime, does the model hear itself while speaking, produce transcripts of its own speech, and treat that as a new turn?
t
Thanks for reaching out, Max. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I understand how confusing it can be when the model appears to "hear itself" and produces transcripts like that. What you're seeing is an internal mechanism designed to prevent deadlocks during real-time interactions: the system introduces an artificial turn to keep tool responses and audio generation moving, which can make it look like the model is responding to itself. In addition, the way turns are tracked means certain events, such as speculative text or tool output, can be treated as the start of the assistant's turn even if the previous speech hasn't fully finished. This is a deliberate design choice to keep real-time conversations responsive. If you'd like, I can clarify anything further or help you work through it.

Sources:
- Realtime model integrations | LiveKit Docs
- LiveKit turn detector plugin | LiveKit Docs
- Turn detection and interruptions | LiveKit Docs
- livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/realtime...
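For reference, here is a minimal sketch of the kind of agent session this applies to, assuming the LiveKit Agents Python SDK with the OpenAI realtime plugin (the entrypoint name and instructions are illustrative, not taken from your setup):

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai

async def entrypoint(ctx: agents.JobContext):
    # The realtime model handles speech in, reasoning, and speech out in one
    # session; the turn tracking described above happens inside this session.
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(model="gpt-realtime"),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```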
m
I'm using the gpt-realtime model, and Krisp does not help.
t
I'm here to help you work through the self-hearing and transcript issues you're experiencing with the gpt-realtime model and Krisp. Let's try these steps together.

For the self-hearing issue:

- Make sure `echoCancellation=true` is enabled in your audio capture settings. This helps prevent echo and avoids those frustrating feedback loops.
- Check that the AI agent isn't subscribing to its own audio track. It should only pick up audio tracks with a `MICROPHONE` source from other participants, which keeps things clear.

For the transcript issue:

- Consider adding a dedicated STT plugin such as Google Cloud STT alongside the gpt-realtime model. This gives you low-latency, streaming transcriptions, since the gpt-realtime model's own transcription tends to be delayed and may not work well for real-time display (sketched below).
- Try adjusting turn detection settings, such as `silence_duration_ms`, to make the system more responsive and reduce false triggers (also sketched below).

If you'd like, I'm happy to walk you through implementing these solutions or answer any questions you have. Just let me know!

Sources:
- AI voice agents | LiveKit Docs
- Development Tools | livekit/livekit_composite | DeepWiki
- Realtime model integrations | LiveKit Docs
- Live Conversations with AI using ChatGPT and WebRTC
- livekit-agents/livekit/agents/llm/realtime.py
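For the transcript point, here is a sketch of pairing a dedicated STT plugin with the realtime model. It assumes `livekit-plugins-google` is installed with Google credentials configured, and that your `livekit-agents` version uses the `stt` plugin for user transcripts when a realtime `llm` is set; treat that last part as an assumption to verify for your version:

```python
from livekit.agents import AgentSession
from livekit.plugins import google, openai

session = AgentSession(
    # Realtime speech-to-speech model drives the actual conversation.
    llm=openai.realtime.RealtimeModel(model="gpt-realtime"),
    # Separate low-latency streaming STT used for transcripts, instead of
    # waiting on the realtime model's delayed transcription.
    stt=google.STT(),
)
```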
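And for the turn detection tuning, a hedged sketch that raises `silence_duration_ms` so short pauses (or residual self-audio) are less likely to end the user's turn. It assumes the OpenAI realtime plugin accepts a `TurnDetection` object from the `openai` Python SDK; the import path and defaults can differ between versions, and the values here are only starting points:

```python
from livekit.plugins import openai
from openai.types.beta.realtime.session import TurnDetection

llm = openai.realtime.RealtimeModel(
    model="gpt-realtime",
    turn_detection=TurnDetection(
        type="server_vad",
        threshold=0.5,            # VAD activation threshold (higher = less sensitive)
        prefix_padding_ms=300,    # audio kept from just before detected speech
        silence_duration_ms=700,  # require longer silence before ending the turn
        create_response=True,
        interrupt_response=True,
    ),
)
```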