# ask-ai
m
I am using OpenAI's realtime model for STT and LLM. When I hardcode the language to English by using
Copy code
input_audio_transcription=openai.realtime.realtime_model.InputAudioTranscription(
    language="en",
I get the following error and only see the agent's transcript. The LLM is getting the audio and responding, but we don't see the user's input.
Copy code
phone-agent-uat-1  | 2025-08-10 09:38:33,406 - ERROR livekit.plugins.openai - failed to handle event
phone-agent-uat-1  | Traceback (most recent call last):
phone-agent-uat-1  |   File "/app/.venv/lib/python3.13/site-packages/livekit/plugins/openai/realtime/realtime_model.py", line 766, in _recv_task
phone-agent-uat-1  |     self._handle_response_audio_delta(
phone-agent-uat-1  |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
phone-agent-uat-1  |         ResponseAudioDeltaEvent.construct(**event)
phone-agent-uat-1  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
phone-agent-uat-1  |     )
phone-agent-uat-1  |     ^
phone-agent-uat-1  |   File "/app/.venv/lib/python3.13/site-packages/livekit/plugins/openai/realtime/realtime_model.py", line 1348, in _handle_response_audio_delta
phone-agent-uat-1  |     item_generation.audio_ch.send_nowait(
phone-agent-uat-1  |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
phone-agent-uat-1  |         rtc.AudioFrame(
phone-agent-uat-1  |         ^^^^^^^^^^^^^^^
phone-agent-uat-1  |     ...<4 lines>...
phone-agent-uat-1  |         )
phone-agent-uat-1  |         ^
phone-agent-uat-1  |     )
phone-agent-uat-1  |     ^
phone-agent-uat-1  |   File "/app/.venv/lib/python3.13/site-packages/livekit/agents/utils/aio/channel.py", line 95, in send_nowait
phone-agent-uat-1  |     raise ChanClosed
phone-agent-uat-1  | livekit.agents.utils.aio.channel.ChanClosed {"event": {"type": "response.audio.delta", "event_id": "event_C2wnZzSBi7nQqA7yWEeiy", "response_id": "resp_C2wnYU9h2zQxN2yDE29CC", "item_id": "item_C2wnYvoMdvIBCxAYsiLOe", "output_index": 0, "content_index": 0, "delta": "CAk0B04HKg..."}, "pid": 41, "job_id": "AJ_RJgs9syhCB2p"}
t
Thanks for reaching out, Matt. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! The `ChanClosed` error usually pops up when an asyncio channel gets closed while trying to send or receive data. This often happens because of hiccups with the Realtime API connection or unexpected session closures. No worries though, here are some friendly steps to help you sort it out:
1. Model Configuration: Double-check that your model name, language, and settings line up with your OpenAI setup. Also, make sure your API key and base URL are spot on.
2. Connection Issues: Take a quick look at your network and firewall settings. Confirm you're using the right base URL for your OpenAI deployment.
3. Session Management: It's a good idea to have proper session handling in place so your app can reconnect smoothly if the session closes unexpectedly (a sketch of this follows after the logging snippet below).
One more tip: turning on debug logging can give you more clues about what's going on behind the scenes:
Copy code
import logging
logging.basicConfig(level=logging.DEBUG)
If you're still stuck after trying these, feel free to share more details about your setup and the exact error message. I'm here to help you get this sorted!
Sources: livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/stt.py
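Picking up point 3 above, a minimal sketch of session-level event handling, assuming LiveKit Agents 1.x and its AgentSession "error"/"close" events (names may differ in your version; `llm` stands for the RealtimeModel you build):
Copy code
from livekit.agents import AgentSession

session = AgentSession(llm=llm)  # `llm` is your RealtimeModel instance

@session.on("error")
def _on_error(ev):
    # fires when a plugin or the realtime connection reports an error
    print("session error:", ev)

@session.on("close")
def _on_close(ev):
    # fires when the session shuts down, expectedly or not
    print("session closed:", ev)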
m
There don't seem to be connection issues, as the agent is working. I just don't have the user's transcripts.
t
Hey there! To make sure your user transcripts show up when using OpenAI's realtime model for STT and LLM, let's double-check a few things together:
1. STT Plugin Configuration: Make sure you've got an STT plugin set up that feeds live STT results to the turn detector. This step is key for getting those user transcripts.
2. Language Setting: Double-check that the STT plugin is set to the right language (English, in your case).
3. Turn Detection: Take a look at the turn detection settings in the OpenAI realtime model. Since it uses server-side voice activity detection (VAD), if these settings aren't quite right, transcripts might not show up.
Just a heads-up: the OpenAI realtime model gives you user transcripts after the turn ends, not bit by bit during speech. So, if you want live updates, the STT plugin needs to be properly connected. If you're still running into trouble, feel free to share your STT plugin setup and turn detection settings with me. I'm here to help!
Sources: LiveKit turn detector plugin | LiveKit Docs | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real...
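A quick way to verify whether any user transcripts are produced at all: recent LiveKit Agents versions emit a `user_input_transcribed` event on the AgentSession (field names may differ slightly by version; `llm` below stands for your RealtimeModel):
Copy code
from livekit.agents import AgentSession

session = AgentSession(llm=llm)  # `llm` is the RealtimeModel from your setup

@session.on("user_input_transcribed")
def _on_user_transcript(ev):
    # if this never fires, the realtime model is not emitting user transcripts at all
    print("user transcript:", ev.transcript, "final:", ev.is_final)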
m
I have it configured like this
Copy code
llm_model = "gpt-4o-realtime-preview"
llm = openai.realtime.RealtimeModel(
    input_audio_transcription=openai.realtime.realtime_model.InputAudioTranscription(
        language="en",
        prompt="The following is a phone call from a customer/client calling to request a quote or a service.",
    ),
    model=llm_model,
    modalities=["text"],  # Text-only to use separate TTS
    temperature=0.8,
    voice="alloy",  # Voice setting (though we're using text mode)
    turn_detection=TurnDetection(
        type="semantic_vad",
        eagerness="auto",
        create_response=True,
        interrupt_response=True,
    ),
)
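For reference, a text-only realtime model like this is normally paired with a separate TTS on the AgentSession. A rough wiring sketch, assuming LiveKit Agents 1.x (the `openai.TTS()` and the instructions string below are placeholders, not taken from this thread):
Copy code
from livekit import agents
from livekit.plugins import openai

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = agents.AgentSession(
        llm=llm,           # the text-only RealtimeModel configured above
        tts=openai.TTS(),  # separate TTS, since the realtime model returns text only
    )
    await session.start(
        room=ctx.room,
        agent=agents.Agent(instructions="You are a phone agent handling quote requests."),
    )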
t
Hey there! Just to clarify, your setup with the OpenAI RealtimeModel using semantic VAD and text-only mode shouldn't be causing any issues with seeing user transcripts. These settings mainly affect how the model listens and responds, but they don't hide the transcripts. Since you're not seeing the transcripts, could you please double-check a few things for me?
1. STT Plugin: Make sure the STT plugin is properly connected and set up to give live transcription results.
2. Turn Detection: Check that the turn detection is working as it should. Semantic VAD should be splitting the audio into clear utterances.
3. Logs: Take a look at the logs to see if there are any errors or warnings about transcription or turn detection (see the logging snippet below).
If everything looks good but you're still having trouble, just give me a shout and we'll dig into it together!
Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py | OpenAI Realtime API integration guide | LiveKit Docs | web/src/components/code-viewer.tsx
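For the log check in point 3, one concrete step is to raise the log level on the loggers that appear in your traceback (plain Python logging, nothing LiveKit-specific):
Copy code
import logging

# The traceback earlier came from the "livekit.plugins.openai" logger; DEBUG level
# gives much more detail from the realtime plugin, which helps confirm whether user
# input transcription events ever arrive from the API.
logging.basicConfig(level=logging.INFO)
logging.getLogger("livekit.plugins.openai").setLevel(logging.DEBUG)
logging.getLogger("livekit.agents").setLevel(logging.DEBUG)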
m
The only error I see is the one linked at the start
t
Hey there! The `ChanClosed` error usually points to some hiccups with internal communication channels or how sessions are managed. Since your agent is up and running but user transcripts are missing, let's walk through some steps that might help:
1. WebSocket Connection Management: Make sure your RealtimeModel keeps those WebSocket connections steady. It's a good idea to add connection timeout and retry logic if you haven't already.
2. Session Duration: Check that your session isn't running longer than it should, which can cause it to close unexpectedly. Setting a max session duration might do the trick.
3. Error Handling: Set up solid error handling for session events. For example, listen for reconnection attempts and handle recoverable errors smoothly.
4. Semantic VAD Configuration: Tweak the `eagerness` setting in your turn detection to make sure user speech is segmented properly. Something balanced like `medium` or `auto` usually works best.
5. Debug Logging: Turn on debug logging by setting the `LK_OPENAI_DEBUG` environment variable to `1`. This will give you more detailed info about what's happening with the WebSocket communication (points 4 and 5 are sketched below).
If you're still running into trouble, feel free to share any extra logs or details about your session lifecycle. I'm here to help you get this sorted out!
Sources: livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realti... | OpenAI Realtime API integration guide | LiveKit Docs
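A small sketch combining points 4 and 5 (the `LK_OPENAI_DEBUG` variable is the one mentioned above, so double-check it against your plugin version; the `TurnDetection` import path is the one used in the LiveKit realtime examples):
Copy code
import os

# Point 5: extra realtime-plugin debug output, set before the worker starts
# (verify the variable name for your plugin version).
os.environ["LK_OPENAI_DEBUG"] = "1"

# Point 4: a less eager semantic VAD, in case user turns are being segmented oddly.
from openai.types.beta.realtime.session import TurnDetection

turn_detection = TurnDetection(
    type="semantic_vad",
    eagerness="medium",
    create_response=True,
    interrupt_response=True,
)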
m
Can I configure input_audio_transcription in text mode?
t
Absolutely! You can set up `input_audio_transcription` in text mode when using the OpenAI RealtimeModel in LiveKit. Just choose a transcription model that's text-based, like `whisper-1`, and set the `modalities` to `["text"]` to keep things text-only. Here's a quick example to get you started:
Copy code
llm = openai.realtime.RealtimeModel(
    input_audio_transcription=openai.realtime.realtime_model.InputAudioTranscription(
        model="whisper-1",
        language="en"
    ),
    modalities=["text"],
    turn_detection=TurnDetection(
        type="semantic_vad",
        eagerness="auto",
        create_response=True,
        interrupt_response=True
    )
)
If you run into any questions or want a hand setting this up, just give me a shout. I'm here to help!
Sources: agents/src/multimodal/multimodal_agent.ts | examples/voice_agents/realtime_turn_detector.py
m
still getting that same error
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code
If you find the answer, please post it here to help others!