OpenAI Realtime Model is suddenly returning weird ...
# ask-ai
m
OpenAI Realtime Model is suddenly returning weird responses in the middle of a conversation. This is an actual response that the LLM returned:
```
'Here's the summary of the changes made in the HTML files:

### File `src/main/webapp/cases/case0041-4.html`
- **Added**:
  - A `div` with class `info` to display an item description.
  - A `div` with class `error` to display an error message.
- **Updated**:
  - The `div` with class `output` now uses `innerText` to display the output instead of `textContent`.
  - Minor formatting changes in the JavaScript block for better readability.

### File `src/main/webapp/cases/case0041-5.html`
- **Added**:
  - A `div` with class `info` to display an item description.
  - A `div` with class `error` to display an error message.
- **Updated**:
  - The `div` with class `output` now uses `innerText` to display the output instead of `textContent`.
  - Minor formatting changes in the JavaScript block for better readability.

In both files:
- The `info` div was added to provide a description of what the test case is doing.
- The `error` div was added to display any errors in a clear and consistent manner.
- The output display now uses `innerText` instead of `textContent` for consistency and to properly handle whitespace.
- Minor formatting improvements and indentation cleanup for better code readability.

Let me know if you need further clarifications or any more changes'
```
t
Thanks for reaching out, Anas. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! The unexpected response from the OpenAI Realtime Model might be happening for a few reasons:

- **Configuration issues**: If the model is set up for audio but you're getting text instead, it could be due to how the conversation history loads or a mismatch in the mode.
- **Response status**: If the replies seem incomplete or off-topic, the model might be hitting token limits, content filters, or running into system hiccups.
- **System limitations**: Turn detection and transcription delays can throw off the context, which might cause unexpected results.

Here are a few things you can try to troubleshoot:

1. Take a look at the response status and any error messages you see.
2. Keep the loaded conversation history minimal to avoid defaulting to text-only responses.
3. If you want consistent audio replies, consider using a separate TTS plugin (see the sketch below).

If you'd like, I'm here to help with more detailed guidance; just let me know!

Sources: plugins/openai/src/realtime/realtime_model.ts | OpenAI Realtime API integration guide | LiveKit Docs | Azure OpenAI Realtime API and LiveKit | LiveKit Docs | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real... | Realtime model integrations | LiveKit Docs
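For point 3, here is a minimal sketch of pairing the realtime model with a dedicated TTS plugin so spoken replies keep working even when the model answers in text. It assumes the `openai.realtime.RealtimeModel` and `openai.TTS` plugin classes and a text-only `modalities` option on the realtime model; that option is an assumption and may be named differently (or unavailable) in your plugin version.

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# Let the realtime model produce text and hand speech synthesis to a separate
# TTS plugin. NOTE: `modalities=["text"]` is an assumption; verify the option
# exists in your version of livekit-plugins-openai before relying on it.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    tts=openai.TTS(),
)
```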
m
is there a way to "shorten" the previous context/conversations when passing a new one?
t
Absolutely, you can shorten or summarize previous conversations when starting a new one. Here are a few ways to do that:

- **Truncate the context**: Use a callback like `before_llm_cb` to keep the context manageable, say by holding onto just the last 15 messages (see the sketch below).
- **Automatic truncation**: Some setups do this for you automatically, trimming the context once it goes over a certain number of messages (like 40).
- **Selective history loading**: Load only the most relevant parts of the chat history when you start a session.
- **Conversation item truncation**: Use the APIs to shorten specific conversation items instead of cutting out entire messages.

If you'd like, I'd be happy to help you figure out how to put any of these into practice!
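For the first option, here is a minimal truncation sketch. It assumes the older `VoicePipelineAgent`-style `before_llm_cb` hook and that the chat context stores its history in a mutable list; the attribute name (`messages` vs. `items`) and the exact callback signature vary between livekit-agents versions, so treat it as a starting point.

```python
from livekit.agents import llm

MAX_MESSAGES = 15  # how many recent turns to keep ahead of each LLM call


def truncate_context(agent, chat_ctx: llm.ChatContext) -> None:
    """Hypothetical before_llm_cb: keep the system prompt plus the last N turns.

    Assumes the history lives in `chat_ctx.messages`; swap in `items` (or your
    version's equivalent) if needed.
    """
    history = chat_ctx.messages
    system_msgs = [m for m in history if m.role == "system"]
    recent = [m for m in history if m.role != "system"][-MAX_MESSAGES:]
    history[:] = system_msgs + recent


# Hypothetical wiring with the 0.x pipeline agent:
# agent = VoicePipelineAgent(..., before_llm_cb=truncate_context)
```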
m
I decided to go with a non-realtime LLM. How can I make sure the openai.llm.LLM() always returns a json?
t
To make sure `openai.llm.LLM()` always gives you a JSON response, you can use the `response_format` parameter when you call the `chat()` method. Here's a quick example:
```python
from livekit.plugins import openai
from livekit.agents.llm import ChatContext, ChatMessage

llm = openai.LLM(model="gpt-4o-mini")

ctx = ChatContext([
    ChatMessage(
        type="message",
        role="system", 
        content=["You are a helpful assistant that responds in JSON format."]
    ),
    ChatMessage(
        type="message",
        role="user", 
        content=["Generate a user profile"]
    )
])

# Force JSON object response
stream = llm.chat(
    chat_ctx=ctx,
    response_format={"type": "json_object"}
)
```
This way, you'll always get the response in JSON format. If you have any questions or want me to walk you through it, just let me know; I'm here to help!

Sources: complex-agents/role-playing/generators/item_generator.py | tools/server/README.md | livekit-agents/livekit/agents/llm/_provider_format/openai.py | meditation_assistant/assistant_functions.py
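If it helps, here is a hedged sketch of consuming that stream and parsing the result into a Python dict. It assumes each streamed chunk exposes its text as `chunk.delta.content`; the exact attribute path differs between livekit-agents versions, so adjust it to match yours.

```python
import json


async def collect_json(stream) -> dict:
    """Accumulate a streamed LLM completion and parse it as JSON.

    Assumes each chunk carries its text at `chunk.delta.content` (adjust for
    your livekit-agents version).
    """
    parts: list[str] = []
    async with stream:
        async for chunk in stream:
            if chunk.delta and chunk.delta.content:
                parts.append(chunk.delta.content)
    return json.loads("".join(parts))


# usage:
# profile = await collect_json(
#     llm.chat(chat_ctx=ctx, response_format={"type": "json_object"})
# )
```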
m
currently I'm calling session.generate_reply() in the on_enter method. Do I need to manually call llm.chat() every time I want the agent to generate a response? I thought livekit is supposed to call that on its own
t
No worries, you don't have to manually call `llm.chat()` every time. When you use `session.generate_reply()` in the `on_enter` method, LiveKit takes care of calling `llm.chat()` for you to generate a response. This is the default setup, so no extra manual calls are needed unless you want to tweak how the LLM interacts.

If you ever want to change how the LLM is called or handle its output differently, you can override the `llm_node` method in your agent class. Just let me know if you'd like some help customizing this; I'm happy to guide you!

Sources: Modifying LLM output before TTS | LiveKit Docs | main.py | README.md | telephony/make_call/calling_agent.py
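As a point of reference, here is a minimal sketch of that default setup, assuming the 1.x `Agent`/`AgentSession` API (the `instructions` parameter and the `self.session` accessor may differ slightly in other versions):

```python
from livekit.agents import Agent


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")

    async def on_enter(self) -> None:
        # generate_reply() drives the pipeline for you: LiveKit invokes the
        # agent's llm_node (which calls llm.chat() internally) to produce the reply.
        self.session.generate_reply(instructions="Greet the user.")
```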
m
can you walk me through how to override the llm_node in my class and call llm.chat() so that I can force it to return a json instead
t
Hey there! If you want to override `llm_node` in your custom agent class to make sure the responses are always in JSON format, here's a step-by-step guide:

1. **Create a custom agent class**: Make a subclass of `Agent` and override the `llm_node` method.
2. **Modify the chat context**: Add a system message that tells the LLM to respond with valid JSON only.
3. **Call `llm.chat()`**: Use the updated context to call `llm.chat()` and handle the response.

Here's a simple example to get you started:
```python
from livekit.agents import Agent, llm
from livekit.agents.voice.agent import ModelSettings
from typing import AsyncIterable

class JSONAgent(Agent):
    async def llm_node(
        self,
        chat_ctx: llm.ChatContext,
        tools: list[llm.FunctionTool | llm.RawFunctionTool],
        model_settings: ModelSettings,
    ) -> AsyncIterable[str]:
        activity = self._get_activity_or_raise()
        assert activity.llm is not None, "llm_node called but no LLM node is available"
        
        # Add JSON enforcement system message
        json_instruction = llm.ChatMessage(
            role="system",
            content="You must respond with valid JSON only. Do not include any text outside of the JSON structure."
        )
        
        # Modify the chat context
        modified_ctx = chat_ctx.copy()
        modified_ctx.messages.insert(0, json_instruction)
        
        # Call llm.chat()
        async with activity.llm.chat(
            chat_ctx=modified_ctx,
            tools=tools,
            tool_choice=model_settings.tool_choice if model_settings else None,
            conn_options=activity.session.conn_options.llm_conn_options
        ) as stream:
            async for chunk in stream:
                yield chunk
```
And that's it! This way, your LLM will always stick to JSON responses. If you want a hand with adding this to your project, just give me a shout.

Sources: livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py | complex-agents/role-playing/generators/item_generator.py | Voice pipeline nodes | LiveKit Docs | Pipeline nodes and hooks | LiveKit Docs
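To put the pieces together, here is a hedged sketch of running `JSONAgent` in a session with the non-realtime LLM from earlier. It assumes the 1.x worker and `AgentSession` API (`JobContext`, `WorkerOptions`, `cli.run_app`); adapt the wiring to your own entrypoint, and add STT/TTS/VAD plugins if you need a full voice pipeline.

```python
from livekit import agents
from livekit.agents import AgentSession
from livekit.plugins import openai


async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()

    session = AgentSession(
        llm=openai.LLM(model="gpt-4o-mini"),  # non-realtime LLM, as decided above
    )

    # JSONAgent is the subclass defined above; its llm_node override runs
    # automatically whenever the session generates a reply.
    await session.start(
        agent=JSONAgent(instructions="You must respond with valid JSON only."),
        room=ctx.room,
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```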