# ask-ai
Can you tell me what I should change in my `AgentSession` to achieve the following:

1. Ultra-low latency in transcription. I want to detect the transcription as quickly as possible using hooks.
2. STT accuracy. It should fully support English, with accuracy as good as possible.
3. Do not replace the realtime model.
```python
session: AgentSession = AgentSession(
    llm=openai.realtime.RealtimeModel(
        model="gpt-4o-realtime-preview",
        modalities=["text"],
        input_audio_transcription=InputAudioTranscription(
            model="whisper-1",
            language="en",
        ),
    ),
    tts=CustomTTS(),
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        activation_threshold=0.5,
        sample_rate=16000,
    ),
    min_interruption_duration=0.3,
    agent_false_interruption_timeout=2.0,
    preemptive_generation=False,
    allow_interruptions=True,
)
```
Thanks for reaching out, Anas. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To achieve ultra-low latency in transcription and high STT accuracy for English while keeping the realtime model in place, here are some suggestions for adjusting your `AgentSession` configuration:

• Use a high-performance STT provider: Consider a low-latency option like Cartesia's Ink-Whisper model, which is optimized for real-time transcription. For example:

```python
stt=cartesia.STT(model="ink-whisper", language="en")
```

• Enable preemptive generation: Setting `preemptive_generation=True` in your session reduces perceived latency by overlapping transcription and response generation.

• Optimize turn detection: Using server-side VAD with the realtime model can speed up speech detection. Here's an example:
```python
turn_detection=SessionTurnDetection(
    type="server_vad",
    silence_duration_ms=350,
    threshold=0.5,
)
```
• Adjust voice options: To get faster response times, minimize the endpointing delay, like this:
```python
voice_options=VoiceOptions(
    min_endpointing_delay=0.4,
    max_endpointing_delay=6.0,
    allow_interruptions=True,
)
```
• Deploy close to services: Placing your deployment near the LLM, STT, and TTS services can significantly cut network latency.

With these tweaks, you should see lower latency and high accuracy while keeping your realtime model intact. If you'd like help with implementation or testing, just let me know!

Sources:
• Live Conversations with AI using ChatGPT and WebRTC
• Cartesia STT integration guide | LiveKit Docs
• livekit-plugins/livekit-plugins-speechmatics/livekit/plugins/speechmatics/s...
• examples/src/cartersia_tts.ts
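For intuition, the `server_vad` behavior suggested above (a speech-probability threshold plus a silence window) can be sketched as a toy, self-contained endpointer. This is illustrative only, not the provider's actual implementation:

```python
class SilenceEndpointer:
    """Toy server-VAD-style endpointer: declares end-of-turn after
    `silence_duration_ms` of consecutive frames below `threshold`.

    Illustrative sketch only -- real server VAD runs provider-side.
    """

    def __init__(self, threshold: float = 0.5,
                 silence_duration_ms: int = 350,
                 frame_ms: int = 10) -> None:
        self.threshold = threshold
        # Number of consecutive quiet frames that ends the turn.
        self.needed = silence_duration_ms // frame_ms
        self.quiet = 0

    def feed(self, level: float) -> bool:
        """Feed one frame's speech probability; True => end of turn."""
        if level >= self.threshold:
            self.quiet = 0  # speech resets the silence counter
            return False
        self.quiet += 1
        return self.quiet >= self.needed


ep = SilenceEndpointer(threshold=0.5, silence_duration_ms=350, frame_ms=10)
frames = [0.9] * 20 + [0.1] * 40  # 200 ms of speech, then 400 ms of silence
fired = [ep.feed(p) for p in frames]
print(fired.index(True))  # prints 54: fires 350 ms into the silence
```

Lowering `silence_duration_ms` makes end-of-turn fire sooner (lower latency) at the cost of cutting users off mid-pause, which is the core trade-off behind the 350 ms value above.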