# ask-ai
Can you tell me what I should change in my `AgentSession` to achieve the following:

1. Ultra-low latency in transcription. I want to detect the transcription as quickly as possible using hooks.
2. STT accuracy. It should fully support English, with accuracy as good as possible.
3. Do not replace the realtime model.
```python
session: AgentSession = AgentSession(
    llm=openai.realtime.RealtimeModel(
        model="gpt-4o-realtime-preview",
        modalities=["text"],
        input_audio_transcription=InputAudioTranscription(
            model="whisper-1",
            language="en",
        ),
    ),
    tts=CustomTTS(),
    vad=silero.VAD.load(
        min_speech_duration=0.1,
        activation_threshold=0.5,
        sample_rate=16000,
    ),
    min_interruption_duration=0.3,
    agent_false_interruption_timeout=2.0,
    preemptive_generation=False,
    allow_interruptions=True,
)
```
Thanks for reaching out, Anas. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To achieve ultra-low latency in transcription and high STT accuracy for English while keeping the realtime model in place, here are some suggestions for adjusting your `AgentSession` configuration:

• Use a high-performance STT provider: Consider a low-latency option like Cartesia's Ink-Whisper model, which is optimized for real-time transcription. For example:

```python
stt=cartesia.STT(model="ink-whisper", language="en")
```

• Enable preemptive generation: Setting `preemptive_generation=True` in your session reduces perceived latency by overlapping transcription and response generation.

• Optimize turn detection: Using server-side VAD with the realtime model can speed up speech detection. Here's an example:
```python
turn_detection=SessionTurnDetection(
    type="server_vad",
    silence_duration_ms=350,
    threshold=0.5,
)
```
• Adjust voice options: To get faster response times, minimize the endpointing delay, like this:
```python
voice_options=VoiceOptions(
    min_endpointing_delay=0.4,
    max_endpointing_delay=6.0,
    allow_interruptions=True,
)
```
• Deploy close to services: Placing your deployment near the LLM, STT, and TTS services can significantly cut network latency.

With these tweaks, you should see lower latency and high accuracy while keeping your realtime model intact. If you'd like help with implementation or testing, just let me know!

Sources:
• Live Conversations with AI using ChatGPT and WebRTC
• Cartesia STT integration guide | LiveKit Docs
• livekit-plugins/livekit-plugins-speechmatics/livekit/plugins/speechmatics/s...
• examples/src/cartersia_tts.ts
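For intuition, the `server_vad` behavior suggested above (a speech-probability threshold plus a silence window) can be sketched as a toy, self-contained endpointer. This is illustrative only, not the provider's actual implementation:

```python
class SilenceEndpointer:
    """Toy server-VAD-style endpointer: declares end-of-turn after
    `silence_duration_ms` of consecutive frames below `threshold`.

    Illustrative sketch only -- real server VAD runs provider-side.
    """

    def __init__(self, threshold: float = 0.5,
                 silence_duration_ms: int = 350,
                 frame_ms: int = 10) -> None:
        self.threshold = threshold
        # Number of consecutive quiet frames that ends the turn.
        self.needed = silence_duration_ms // frame_ms
        self.quiet = 0

    def feed(self, level: float) -> bool:
        """Feed one frame's speech probability; True => end of turn."""
        if level >= self.threshold:
            self.quiet = 0  # speech resets the silence counter
            return False
        self.quiet += 1
        return self.quiet >= self.needed


ep = SilenceEndpointer(threshold=0.5, silence_duration_ms=350, frame_ms=10)
frames = [0.9] * 20 + [0.1] * 40  # 200 ms of speech, then 400 ms of silence
fired = [ep.feed(p) for p in frames]
print(fired.index(True))  # prints 54: fires 350 ms into the silence
```

Lowering `silence_duration_ms` makes end-of-turn fire sooner (lower latency) at the cost of cutting users off mid-pause, which is the core trade-off behind the 350 ms value above.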