mysterious-agent-76276
09/10/2025, 8:40 AM
I want to configure my AgentSession to achieve the following:
1. Ultra-low transcription latency. I want to detect transcriptions as quickly as possible using hooks.
2. STT accuracy. It must fully support English, with the best accuracy possible.
3. Do not replace the realtime model.
Here is my current configuration:

from livekit.agents import AgentSession
from livekit.plugins import openai, silero

# CustomTTS and InputAudioTranscription are defined/imported elsewhere in my code.
session: AgentSession = AgentSession(
    llm=openai.realtime.RealtimeModel(
        model="gpt-4o-realtime-preview",
        modalities=["text"],
        input_audio_transcription=InputAudioTranscription(
            model="whisper-1",
            language="en",
        ),
    ),
    tts=CustomTTS(),
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        activation_threshold=0.5,
        sample_rate=16000,
    ),
    min_interruption_duration=0.3,
    agent_false_interruption_timeout=2.0,
    preemptive_generation=False,
    allow_interruptions=True,
)
tall-airline-16756
09/10/2025, 8:41 AM
Here are a few suggestions for lowering transcription latency while keeping your realtime model and the rest of your AgentSession configuration:
• Use a high-performance STT provider: Consider swapping your current STT model for a low-latency option like Cartesia's Ink-Whisper model, which is specially optimized for real-time transcription. For example: stt=cartesia.STT(model="ink-whisper", language="en")
• Enable preemptive generation: Try setting preemptive_generation=True in your session. This overlaps transcription with response generation, which reduces perceived latency. (The combined sketch after this list shows these settings together.)
• Optimize turn detection: Using server-side VAD with the realtime model can speed up speech detection. Here’s an example:
  turn_detection=SessionTurnDetection(
      type="server_vad",
      silence_duration_ms=350,
      threshold=0.5,
  )
• Adjust voice options: For faster responses, keep the endpointing delay low, for example:
  voice_options=VoiceOptions(
      min_endpointing_delay=0.4,
      max_endpointing_delay=6.0,
      allow_interruptions=True,
  )
• Deploy close to services: Placing your deployment near the LLM, STT, and TTS services can really help cut down network latency.
With these tweaks, you should see lower latency and high accuracy while keeping your realtime model intact. If you’d like, I’m here to help you with implementation or testing, just let me know!
Sources: Live Conversations with AI using ChatGPT and WebRTC | Cartesia STT integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-speechmatics/livekit/plugins/speechmatics/s... | examples/src/cartersia_tts.ts