OpenAI Realtime Model is suddenly returning weird ...
# ask-ai
m
OpenAI Realtime Model is suddenly returning weird responses in the middle of a conversation. This is an actual response that the LLM returned:
```
'Here's the summary of the changes made in the HTML files:

### File `src/main/webapp/cases/case0041-4.html`
- **Added**:
  - A `div` with class `info` to display an item description.
  - A `div` with class `error` to display an error message.
- **Updated**:
  - The `div` with class `output` now uses `innerText` to display the output instead of `textContent`.
  - Minor formatting changes in the JavaScript block for better readability.

### File `src/main/webapp/cases/case0041-5.html`
- **Added**:
  - A `div` with class `info` to display an item description.
  - A `div` with class `error` to display an error message.
- **Updated**:
  - The `div` with class `output` now uses `innerText` to display the output instead of `textContent`.
  - Minor formatting changes in the JavaScript block for better readability.

In both files:
- The `info` div was added to provide a description of what the test case is doing.
- The `error` div was added to display any errors in a clear and consistent manner.
- The output display now uses `innerText` instead of `textContent` for consistency and to properly handle whitespace.
- Minor formatting improvements and indentation cleanup for better code readability.

Let me know if you need further clarifications or any more changes'
```
t
Thanks for reaching out, Anas. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! The unexpected response from the OpenAI Realtime Model might be happening for a few reasons:

- **Configuration issues**: If the model is set up for audio but you're getting text instead, it could be due to how the conversation history loads or a mismatch in the mode.
- **Response status**: If the replies seem incomplete or off-topic, the model might be hitting token limits, content filters, or running into system hiccups.
- **System limitations**: Turn detection and transcription delays can throw off the context, which might cause unexpected results.

Here are a few things you can try to troubleshoot:

1. Take a look at the response status and any error messages you see.
2. Keep the loaded conversation history minimal to avoid defaulting to text-only responses.
3. If you want consistent audio replies, consider using a separate TTS plugin (see the sketch below).

If you'd like, I'm here to help with more detailed guidance; just let me know!

Sources: plugins/openai/src/realtime/realtime_model.ts | OpenAI Realtime API integration guide | LiveKit Docs | Azure OpenAI Realtime API and LiveKit | LiveKit Docs | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real... | Realtime model integrations | LiveKit Docs
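For point 3, here is a minimal sketch of pairing the realtime model with a dedicated TTS plugin so spoken replies keep working even when the model answers in text. It assumes the `openai.realtime.RealtimeModel` and `openai.TTS` plugin classes and a text-only `modalities` option on the realtime model; that option is an assumption and may be named differently (or unavailable) in your plugin version.

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# Let the realtime model produce text and hand speech synthesis to a separate
# TTS plugin. NOTE: `modalities=["text"]` is an assumption; verify the option
# exists in your version of livekit-plugins-openai before relying on it.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    tts=openai.TTS(),
)
```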
m
is there a way to "shorten" the previous context/conversations when passing a new one?
t
Absolutely, you can shorten or summarize previous conversations when starting a new one. Here are a few ways to do that:

- **Truncate the context**: Use a callback like `before_llm_cb` to keep the context manageable, say by holding onto just the last 15 messages (see the sketch below).
- **Automatic truncation**: Some setups do this for you automatically, trimming the context once it goes over a certain number of messages (like 40).
- **Selective history loading**: Load only the most relevant parts of the chat history when you start a session.
- **Conversation item truncation**: Use the APIs to shorten specific conversation items instead of cutting out entire messages.

If you'd like, I'd be happy to help you figure out how to put any of these into practice!
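For the first option, here is a minimal truncation sketch. It assumes the older `VoicePipelineAgent`-style `before_llm_cb` hook and that the chat context stores its history in a mutable list; the attribute name (`messages` vs. `items`) and the exact callback signature vary between livekit-agents versions, so treat it as a starting point.

```python
from livekit.agents import llm

MAX_MESSAGES = 15  # how many recent turns to keep ahead of each LLM call


def truncate_context(agent, chat_ctx: llm.ChatContext) -> None:
    """Hypothetical before_llm_cb: keep the system prompt plus the last N turns.

    Assumes the history lives in `chat_ctx.messages`; swap in `items` (or your
    version's equivalent) if needed.
    """
    history = chat_ctx.messages
    system_msgs = [m for m in history if m.role == "system"]
    recent = [m for m in history if m.role != "system"][-MAX_MESSAGES:]
    history[:] = system_msgs + recent


# Hypothetical wiring with the 0.x pipeline agent:
# agent = VoicePipelineAgent(..., before_llm_cb=truncate_context)
```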
m
I decided to go with a non-realtime LLM. How can I make sure the openai.llm.LLM() always returns a json?
t
To make sure `openai.llm.LLM()` always gives you a JSON response, you can use the `response_format` parameter when you call the `chat()` method. Here's a quick example:
```python
from livekit.plugins import openai
from livekit.agents.llm import ChatContext, ChatMessage

llm = openai.LLM(model="gpt-4o-mini")

ctx = ChatContext([
    ChatMessage(
        type="message",
        role="system", 
        content=["You are a helpful assistant that responds in JSON format."]
    ),
    ChatMessage(
        type="message",
        role="user", 
        content=["Generate a user profile"]
    )
])

# Force JSON object response
stream = llm.chat(
    chat_ctx=ctx,
    response_format={"type": "json_object"}
)
```
This way, you'll always get the response in JSON format. If you have any questions or want me to walk you through it, just let me know; I'm here to help!

Sources: complex-agents/role-playing/generators/item_generator.py | tools/server/README.md | livekit-agents/livekit/agents/llm/_provider_format/openai.py | meditation_assistant/assistant_functions.py
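If it helps, here is a hedged sketch of consuming that stream and parsing the result into a Python dict. It assumes each streamed chunk exposes its text as `chunk.delta.content`; the exact attribute path differs between livekit-agents versions, so adjust it to match yours.

```python
import json


async def collect_json(stream) -> dict:
    """Accumulate a streamed LLM completion and parse it as JSON.

    Assumes each chunk carries its text at `chunk.delta.content` (adjust for
    your livekit-agents version).
    """
    parts: list[str] = []
    async with stream:
        async for chunk in stream:
            if chunk.delta and chunk.delta.content:
                parts.append(chunk.delta.content)
    return json.loads("".join(parts))


# usage:
# profile = await collect_json(
#     llm.chat(chat_ctx=ctx, response_format={"type": "json_object"})
# )
```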
m
currently I'm calling session.generate_reply() in the on_enter method. Do I need to manually call llm.chat() every time I want the agent to generate a response? I thought livekit is supposed to call that on its own
t
No worries, you don't have to manually call `llm.chat()` every time. When you use `session.generate_reply()` in the `on_enter` method, LiveKit takes care of calling `llm.chat()` for you to generate a response. This is the default setup, so no extra manual calls are needed unless you want to tweak how the LLM interacts.

If you ever want to change how the LLM is called or handle its output differently, you can override the `llm_node` method in your agent class. Just let me know if you'd like some help customizing this; I'm happy to guide you!

Sources: Modifying LLM output before TTS | LiveKit Docs | main.py | README.md | telephony/make_call/calling_agent.py
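As a point of reference, here is a minimal sketch of that default setup, assuming the 1.x `Agent`/`AgentSession` API (the `instructions` parameter and the `self.session` accessor may differ slightly in other versions):

```python
from livekit.agents import Agent


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")

    async def on_enter(self) -> None:
        # generate_reply() drives the pipeline for you: LiveKit invokes the
        # agent's llm_node (which calls llm.chat() internally) to produce the reply.
        self.session.generate_reply(instructions="Greet the user.")
```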
m
can you walk me through how to override the llm_node in my class and call llm.chat() so that I can force it to return a json instead
t
Hey there! If you want to override `llm_node` in your custom agent class to make sure the responses are always in JSON format, here's a step-by-step guide:

1. **Create a custom agent class**: Make a subclass of `Agent` and override the `llm_node` method.
2. **Modify the chat context**: Add a system message that tells the LLM to respond with valid JSON only.
3. **Call `llm.chat()`**: Use the updated context to call `llm.chat()` and handle the response.

Here's a simple example to get you started:
```python
from livekit.agents import Agent, llm
from livekit.agents.voice.agent import ModelSettings
from typing import AsyncIterable

class JSONAgent(Agent):
    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[llm.FunctionTool | llm.RawFunctionTool],
        model_settings: ModelSettings,
    ) -> AsyncIterable[str]:
        activity = self._get_activity_or_raise()
        assert activity.llm is not None, "llm_node called but no LLM node is available"
        
        # Add JSON enforcement system message
        json_instruction = llm.ChatMessage(
            role="system",
            content="You must respond with valid JSON only. Do not include any text outside of the JSON structure."
        )
        
        # Modify the chat context
        modified_ctx = chat_ctx.copy()
        modified_ctx.messages.insert(0, json_instruction)
        
        # Call llm.chat()
        async with activity.llm.chat(
            chat_ctx=modified_ctx,
            tools=tools,
            tool_choice=model_settings.tool_choice if model_settings else None,
            conn_options=activity.session.conn_options.llm_conn_options
        ) as stream:
            async for chunk in stream:
                yield chunk
```
And that's it! This way, your LLM will always stick to JSON responses. If you want a hand with adding this to your project, just give me a shout.

Sources: livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py | complex-agents/role-playing/generators/item_generator.py | Voice pipeline nodes | LiveKit Docs | Pipeline nodes and hooks | LiveKit Docs
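To put the pieces together, here is a hedged sketch of running `JSONAgent` in a session with the non-realtime LLM from earlier. It assumes the 1.x worker and `AgentSession` API (`JobContext`, `WorkerOptions`, `cli.run_app`); adapt the wiring to your own entrypoint, and add STT/TTS/VAD plugins if you need a full voice pipeline.

```python
from livekit import agents
from livekit.agents import AgentSession
from livekit.plugins import openai


async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()

    session = AgentSession(
        llm=openai.LLM(model="gpt-4o-mini"),  # non-realtime LLM, as decided above
    )

    # JSONAgent is the subclass defined above; its llm_node override runs
    # automatically whenever the session generates a reply.
    await session.start(
        agent=JSONAgent(instructions="You must respond with valid JSON only."),
        room=ctx.room,
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```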