# ask-ai
Thanks for reaching out, Anigilaje. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To add vision capabilities to your LiveKit setup, you need to enable video input in the `RoomInputOptions`. Here's how you can modify your code:
1. Set `video_enabled=True` in the `RoomInputOptions` when starting the session. This enables video input for the room.
2. Optionally, configure noise cancellation using LiveKit's plugins like `noise_cancellation.BVC()`.
Here's an example of how to update your code:
```python
from livekit.agents import RoomInputOptions, RoomOutputOptions

# inside the agent entrypoint, where `ctx` is the JobContext and
# `session`/`agent` are your AgentSession and Agent
await session.start(
    room=ctx.room,
    agent=agent,
    room_input_options=RoomInputOptions(video_enabled=True, text_enabled=True, audio_enabled=True),
    room_output_options=RoomOutputOptions(transcription_enabled=True, audio_enabled=True),
)
```
If you need further assistance or have specific requirements for vision capabilities, let me know! Sources: Vision | LiveKit Docs | complex-agents/vision/agent.py | Web and mobile frontends | LiveKit Docs
Hey there! Enabling `video_enabled=True` in `RoomInputOptions` lets the agent receive video frames from the camera, which is a great start. But to actually have the agent respond to what it "sees" (like answering questions based on the camera input), you'll want to add a vision model, such as Gemini Live, into your setup. Just turning on video input isn't quite enough for the agent to understand and react to visual info. You'll need to connect it with a vision-capable model that can analyze those video frames and come up with responses based on the visuals. If you'd like, I'm here to help walk you through integrating a vision model, just give me a shout!
Hey there! Just a quick heads-up: the GPT-4.1-mini model you're using doesn’t support vision features for handling video input. If you want the agent to understand visual stuff, you’ll need to switch to a vision-capable model like Gemini Live or GPT-4-vision-preview. If you’d like a hand setting that up or want to explore some other options, just give me a shout—I’m here to help! Sources: Large language model (LLM) integrations | LiveKit Docs | tools/mtmd/legacy-models/minicpmv-convert-image-encoder-to-gguf.py | Vision | LiveKit Docs | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py | ChatModels | LiveKit Agents
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code
If you find the answer, please post it here to help others!