glamorous-architect-23034
09/10/2025, 4:19 PM
I'm using `gpt-4o` via `AgentSession`, but images attached via `ImageContent` still aren't reaching OpenAI when they are added at the end of a turn. Here's my setup:
```python
# Using AgentSession with a standard LLM (not Realtime)
import logging

from livekit.agents import AgentSession
from livekit.agents.llm import ChatContext, ChatMessage, ImageContent
from livekit.plugins import deepgram, openai

logger = logging.getLogger(__name__)

session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o"),  # Standard LLM, not Realtime
    tts=openai.TTS(),
)

# In on_user_turn_completed:
async def on_user_turn_completed(self, ctx: ChatContext, msg: ChatMessage) -> None:
    # msg contains ImageContent items
    image_count = len([c for c in msg.content if isinstance(c, ImageContent)])
    logger.info(f"Message has {image_count} images")
    # Logs show: "Message has 1 images"
    # But OpenAI debug logs show no image_url in the actual API request
```
With `LK_OPENAI_DEBUG=1` I can see the chat context being sent, but it only contains text content, with no `image_url` fields. The images were never included in the request to OpenAI, even though the logs showed them being attached at the end of the turn as intended.
I then made only one change, setting `preemptive_generation=False`, and images started making it to OpenAI: the agent was able to describe what the device's camera was showing during a call.
It looks like there's a race when `preemptive_generation` is enabled, where the message with the image attached at the end of the turn never supersedes whatever was already sent to OpenAI for the preemptive generation.
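For reference, a minimal sketch of that one-line change (assuming `preemptive_generation` is accepted directly by the `AgentSession` constructor in this version of livekit-agents):

```python
# The only change from the setup above: disable preemptive generation so the
# response is generated only after on_user_turn_completed has attached the image.
session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(model="gpt-4o"),
    tts=openai.TTS(),
    preemptive_generation=False,
)
```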
Am I doing something wrong in my implementation that causes vision to fail completely when `preemptive_generation` is enabled, or is this a bug somewhere in the LK stack?

tall-airline-16756
09/10/2025, 4:21 PM
I understand how frustrating it is to see `ImageContent` not reaching OpenAI when `preemptive_generation` is enabled, and I'm here to help you through it. This happens because when `preemptive_generation` is turned on, the system starts generating a response as soon as it receives the user transcript, using whatever chat context is available at that moment. If `ImageContent` is added later (for example at the end of the turn), it won't be part of that initial speculative request.
That said, the preemptive response should ideally be invalidated and replaced with a new one that includes the full, updated chat context, including your `ImageContent`. If this replacement isn't happening as it should, it might point to a synchronization issue in your implementation.
To make sure your images are included, you could try updating the chat context with `ImageContent` before the speculative generation kicks off, or alternatively consider turning off `preemptive_generation` if syncing everything immediately isn't possible right now. Please feel free to reach out if you want me to walk you through this setup or explore other options together; I'm happy to assist!
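As an illustration of the first option, here is a minimal sketch of attaching the most recent camera frame to the chat context ahead of generation rather than only in `on_user_turn_completed`. The `_latest_frame` buffer and `attach_latest_frame` helper are hypothetical, and the exact `add_message`/`update_chat_ctx` signatures should be verified against your livekit-agents version:

```python
from livekit.agents import Agent
from livekit.agents.llm import ImageContent


class VisionAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You can see through the user's camera.")
        # Hypothetical buffer, updated elsewhere from the room's video track.
        self._latest_frame = None

    async def attach_latest_frame(self) -> None:
        # Call this from your own logic before a response is generated, so the
        # image is already in the chat context that a speculative (preemptive)
        # generation would use.
        if self._latest_frame is None:
            return
        chat_ctx = self.chat_ctx.copy()
        chat_ctx.add_message(role="user", content=[ImageContent(image=self._latest_frame)])
        await self.update_chat_ctx(chat_ctx)
```

Whether this runs early enough depends on where in your pipeline you call it, so the simpler workaround remains disabling `preemptive_generation`, as you found.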
Sources: livekit-agents/livekit/agents/voice/speech_handle.py | Agent speech and audio | LiveKit Docs | Agent speech and audio | LiveKit Docs | agent/main.py | complex-agents/vision/agent.py