# ask-ai
m
I am using OpenAI's realtime model for STT and LLM. When I hardcode the language to English by using
Copy code
input_audio_transcription=openai.realtime.realtime_model.InputAudioTranscription(
    language="en",
I get the following error and only see the agent's transcript. The LLM is getting the audio and responding, but we don't see the user's input.
Copy code
phone-agent-uat-1  | 2025-08-10 09:38:33,406 - ERROR livekit.plugins.openai - failed to handle event
phone-agent-uat-1  | Traceback (most recent call last):
phone-agent-uat-1  |   File "/app/.venv/lib/python3.13/site-packages/livekit/plugins/openai/realtime/realtime_model.py", line 766, in _recv_task
phone-agent-uat-1  |     self._handle_response_audio_delta(
phone-agent-uat-1  |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
phone-agent-uat-1  |         ResponseAudioDeltaEvent.construct(**event)
phone-agent-uat-1  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
phone-agent-uat-1  |     )
phone-agent-uat-1  |     ^
phone-agent-uat-1  |   File "/app/.venv/lib/python3.13/site-packages/livekit/plugins/openai/realtime/realtime_model.py", line 1348, in _handle_response_audio_delta
phone-agent-uat-1  |     item_generation.audio_ch.send_nowait(
phone-agent-uat-1  |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
phone-agent-uat-1  |         rtc.AudioFrame(
phone-agent-uat-1  |         ^^^^^^^^^^^^^^^
phone-agent-uat-1  |     ...<4 lines>...
phone-agent-uat-1  |         )
phone-agent-uat-1  |         ^
phone-agent-uat-1  |     )
phone-agent-uat-1  |     ^
phone-agent-uat-1  |   File "/app/.venv/lib/python3.13/site-packages/livekit/agents/utils/aio/channel.py", line 95, in send_nowait
phone-agent-uat-1  |     raise ChanClosed
phone-agent-uat-1  | livekit.agents.utils.aio.channel.ChanClosed {"event": {"type": "response.audio.delta", "event_id": "event_C2wnZzSBi7nQqA7yWEeiy", "response_id": "resp_C2wnYU9h2zQxN2yDE29CC", "item_id": "item_C2wnYvoMdvIBCxAYsiLOe", "output_index": 0, "content_index": 0, "delta": "CAk0B04HKg..."}, "pid": 41, "job_id": "AJ_RJgs9syhCB2p"}
t
Thanks for reaching out, Matt. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! The `ChanClosed` error usually pops up when an asyncio channel gets closed while trying to send or receive data. This often happens because of hiccups with the Realtime API connection or unexpected session closures. No worries though, here are some friendly steps to help you sort it out:
1. Model Configuration: Double-check that your model name, language, and settings line up with your OpenAI setup. Also, make sure your API key and base URL are spot on.
2. Connection Issues: Take a quick look at your network and firewall settings. Confirm you're using the right base URL for your OpenAI deployment.
3. Session Management: It's a good idea to have proper session handling in place so your app can reconnect smoothly if the session closes unexpectedly (a sketch of this follows after the logging snippet below).
One more tip: turning on debug logging can give you more clues about what's going on behind the scenes:
Copy code
import logging
logging.basicConfig(level=logging.DEBUG)
If you're still stuck after trying these, feel free to share more details about your setup and the exact error message. I'm here to help you get this sorted!
Sources: livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/stt.py
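Picking up point 3 above, a minimal sketch of session-level event handling, assuming LiveKit Agents 1.x and its AgentSession "error"/"close" events (names may differ in your version; `llm` stands for the RealtimeModel you build):
Copy code
from livekit.agents import AgentSession

session = AgentSession(llm=llm)  # `llm` is your RealtimeModel instance

@session.on("error")
def _on_error(ev):
    # fires when a plugin or the realtime connection reports an error
    print("session error:", ev)

@session.on("close")
def _on_close(ev):
    # fires when the session shuts down, expectedly or not
    print("session closed:", ev)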
m
There don't seem to be connection issues, as the agent is working. I just don't have the user's transcripts.
t
Hey there! To make sure your user transcripts show up when using OpenAI's realtime model for STT and LLM, let's double-check a few things together:
1. STT Plugin Configuration: Make sure you've got an STT plugin set up that feeds live STT results to the turn detector. This step is key for getting those user transcripts.
2. Language Setting: Double-check that the STT plugin is set to the right language (English, in your case).
3. Turn Detection: Take a look at the turn detection settings in the OpenAI realtime model. Since it uses server-side voice activity detection (VAD), if these settings aren't quite right, transcripts might not show up.
Just a heads-up: the OpenAI realtime model gives you user transcripts after the turn ends, not bit by bit during speech. So, if you want live updates, the STT plugin needs to be properly connected. If you're still running into trouble, feel free to share your STT plugin setup and turn detection settings with me. I'm here to help!
Sources: LiveKit turn detector plugin | LiveKit Docs | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real...
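A quick way to verify whether any user transcripts are produced at all: recent LiveKit Agents versions emit a `user_input_transcribed` event on the AgentSession (field names may differ slightly by version; `llm` below stands for your RealtimeModel):
Copy code
from livekit.agents import AgentSession

session = AgentSession(llm=llm)  # `llm` is the RealtimeModel from your setup

@session.on("user_input_transcribed")
def _on_user_transcript(ev):
    # if this never fires, the realtime model is not emitting user transcripts at all
    print("user transcript:", ev.transcript, "final:", ev.is_final)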
m
I have it configured like this
Copy code
llm_model = "gpt-4o-realtime-preview"
llm = openai.realtime.RealtimeModel(
    input_audio_transcription=openai.realtime.realtime_model.InputAudioTranscription(
        language="en",
        prompt="The following is a phone call from a customer/client calling to request a quote or a service.",
    ),
    model=llm_model,
    modalities=["text"],  # Text-only to use separate TTS
    temperature=0.8,
    voice="alloy",  # Voice setting (though we're using text mode)
    turn_detection=TurnDetection(
        type="semantic_vad",
        eagerness="auto",
        create_response=True,
        interrupt_response=True,
    ),
)
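For reference, a text-only realtime model like this is normally paired with a separate TTS on the AgentSession. A rough wiring sketch, assuming LiveKit Agents 1.x (the `openai.TTS()` and the instructions string below are placeholders, not taken from this thread):
Copy code
from livekit import agents
from livekit.plugins import openai

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = agents.AgentSession(
        llm=llm,           # the text-only RealtimeModel configured above
        tts=openai.TTS(),  # separate TTS, since the realtime model returns text only
    )
    await session.start(
        room=ctx.room,
        agent=agents.Agent(instructions="You are a phone agent handling quote requests."),
    )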
t
Hey there! Just to clarify, your setup with the OpenAI RealtimeModel using semantic VAD and text-only mode shouldn't be causing any issues with seeing user transcripts. These settings mainly affect how the model listens and responds, but they don't hide the transcripts. Since you're not seeing the transcripts, could you please double-check a few things for me?
1. STT Plugin: Make sure the STT plugin is properly connected and set up to give live transcription results.
2. Turn Detection: Check that the turn detection is working as it should. Semantic VAD should be splitting the audio into clear utterances.
3. Logs: Take a look at the logs to see if there are any errors or warnings about transcription or turn detection (see the logging snippet below).
If everything looks good but you're still having trouble, just give me a shout and we'll dig into it together!
Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py | OpenAI Realtime API integration guide | LiveKit Docs | web/src/components/code-viewer.tsx
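For the log check in point 3, one concrete step is to raise the log level on the loggers that appear in your traceback (plain Python logging, nothing LiveKit-specific):
Copy code
import logging

# The traceback earlier came from the "livekit.plugins.openai" logger; DEBUG level
# gives much more detail from the realtime plugin, which helps confirm whether user
# input transcription events ever arrive from the API.
logging.basicConfig(level=logging.INFO)
logging.getLogger("livekit.plugins.openai").setLevel(logging.DEBUG)
logging.getLogger("livekit.agents").setLevel(logging.DEBUG)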
m
The only error I see is the one linked at the start
t
Hey there! The `ChanClosed` error usually points to some hiccups with internal communication channels or how sessions are managed. Since your agent is up and running but user transcripts are missing, let's walk through some steps that might help:
1. WebSocket Connection Management: Make sure your RealtimeModel keeps those WebSocket connections steady. It's a good idea to add connection timeout and retry logic if you haven't already.
2. Session Duration: Check that your session isn't running longer than it should, which can cause it to close unexpectedly. Setting a max session duration might do the trick.
3. Error Handling: Set up solid error handling for session events. For example, listen for reconnection attempts and handle recoverable errors smoothly.
4. Semantic VAD Configuration: Tweak the `eagerness` setting in your turn detection to make sure user speech is segmented properly. Something balanced like `medium` or `auto` usually works best.
5. Debug Logging: Turn on debug logging by setting the `LK_OPENAI_DEBUG` environment variable to `1`. This will give you more detailed info about what's happening with the WebSocket communication (points 4 and 5 are sketched below).
If you're still running into trouble, feel free to share any extra logs or details about your session lifecycle. I'm here to help you get this sorted out!
Sources: livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realti... | OpenAI Realtime API integration guide | LiveKit Docs
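A small sketch combining points 4 and 5 (the `LK_OPENAI_DEBUG` variable is the one mentioned above, so double-check it against your plugin version; the `TurnDetection` import path is the one used in the LiveKit realtime examples):
Copy code
import os

# Point 5: extra realtime-plugin debug output, set before the worker starts
# (verify the variable name for your plugin version).
os.environ["LK_OPENAI_DEBUG"] = "1"

# Point 4: a less eager semantic VAD, in case user turns are being segmented oddly.
from openai.types.beta.realtime.session import TurnDetection

turn_detection = TurnDetection(
    type="semantic_vad",
    eagerness="medium",
    create_response=True,
    interrupt_response=True,
)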
m
Can I configure input_audio_transcription in text mode?
t
Absolutely! You can set up `input_audio_transcription` in text mode when using the OpenAI RealtimeModel in LiveKit. Just choose a transcription model that's text-based, like `whisper-1`, and set the `modalities` to `["text"]` to keep things text-only. Here's a quick example to get you started:
Copy code
llm = openai.realtime.RealtimeModel(
    input_audio_transcription=openai.realtime.realtime_model.InputAudioTranscription(
        model="whisper-1",
        language="en"
    ),
    modalities=["text"],
    turn_detection=TurnDetection(
        type="semantic_vad",
        eagerness="auto",
        create_response=True,
        interrupt_response=True
    )
)
If you run into any questions or want a hand setting this up, just give me a shout. I'm here to help!
Sources: agents/src/multimodal/multimodal_agent.ts | examples/voice_agents/realtime_turn_detector.py
m
still getting that same error
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code
If you find the answer, please post it here to help others!