# ask-ai
I’m using the standard LLM integration (not Realtime) with `gpt-4o` via `AgentSession`, but images via `ImageContent` still aren’t reaching OpenAI when they are attached at end of turn. Here’s my setup:
```python
import logging

from livekit.agents import AgentSession
from livekit.agents.llm import ChatContext, ChatMessage, ImageContent
from livekit.plugins import deepgram, openai

logger = logging.getLogger("agent")

# Using AgentSession with standard LLM (not Realtime)
session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o"),  # Standard LLM, not Realtime
    tts=openai.TTS(),
)

# In my Agent subclass:
async def on_user_turn_completed(self, ctx: ChatContext, msg: ChatMessage):
    # msg contains ImageContent items
    logger.info(f"Message has {len([c for c in msg.content if isinstance(c, ImageContent)])} images")
    # Logs show: "Message has 1 images"

    # But OpenAI debug logs show no image_url in the actual API request
```
With `LK_OPENAI_DEBUG=1`, I can see the chat context being sent, but it contains only text content, with no `image_url` fields. The images were never included in the request to OpenAI, even though the logs showed them being attached at end of turn, as intended. I then made only one change, setting `preemptive_generation=False`, and images started making it to OpenAI; the agent was able to describe what the device’s camera was showing during a call. It appears there’s a race when `preemptive_generation` is enabled, where the message with the image attached at end of turn never supersedes whatever was already sent to OpenAI for preemptive generation. Am I doing something wrong in my implementation that causes vision to fail completely when `preemptive_generation` is enabled, or is this a bug somewhere in the LiveKit stack?
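To illustrate the suspected race without any LiveKit dependencies, here is a minimal asyncio sketch (all names are illustrative, not LiveKit APIs): a speculative request snapshots the chat context as soon as the transcript arrives, so content appended at end of turn is present in the final context but never in what was sent to the provider.

```python
import asyncio

async def llm_request(context):
    """Stand-in for the provider call; it only ever sees the snapshot it was given."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return list(context)

async def main():
    chat_context = ["user: what do you see?"]  # transcript arrives first

    # Preemptive generation starts immediately with a copy of the context.
    speculative = asyncio.create_task(llm_request(list(chat_context)))

    # The end-of-turn hook runs afterwards and attaches the image.
    chat_context.append("image: <frame from device camera>")

    sent = await speculative
    print("image in speculative request:", any(m.startswith("image:") for m in sent))
    print("image in final context:      ", any(m.startswith("image:") for m in chat_context))

asyncio.run(main())
# Prints False for the speculative request and True for the final context:
# the image exists in the chat context but was never sent to the provider.
```

This matches the observed symptom: the attachment log fires (the image really is in the context), yet the provider request contains only text.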
Thanks for reaching out, Dan. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I understand the challenge you're facing with `ImageContent` not reaching OpenAI when `preemptive_generation` is enabled, and I’m here to help. This happens because when `preemptive_generation` is turned on, the system starts generating a response as soon as it receives the user transcript, using whatever chat context is available at that moment. If `ImageContent` is added later (such as at the end of the turn), it won’t be part of that initial speculative request. That said, the preemptive response should be invalidated and replaced with a new one that includes the full, updated chat context, including your `ImageContent`. If that replacement isn’t happening, it may point to a synchronization issue. To make sure your images are included, you could update the chat context with `ImageContent` before speculative generation kicks off, or turn off `preemptive_generation` if syncing everything immediately isn’t possible. Please feel free to reach out if you want me to walk through this setup or explore other options together.
Sources:
- livekit-agents/livekit/agents/voice/speech_handle.py
- Agent speech and audio | LiveKit Docs
- agent/main.py
- complex-agents/vision/agent.py