# ask-ai
**User:**
can we send files to a llm?
**Assistant:**
Thanks for reaching out, Aditya. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
LiveKit supports sending files to LLMs (large language models) through several mechanisms, but the approach depends on the integration details and the capabilities of your chosen LLM provider:

• The LiveKit Agents framework offers an interface for interacting with various LLMs, and some providers support advanced inputs such as vision or structured data.
• LiveKit enables transferring files, images, or other binary data via byte streams, which can be incorporated into your workflow depending on your LLM's supported features.

To send files to an LLM, ensure your provider supports file or vision-based input. Review the provider's documentation for specific implementation requirements: LiveKit provides the transport, but interpretation and processing depend on the LLM's API and capabilities.

If you can share which LLM provider you're looking to integrate with, or more details about your use case, I can provide more tailored guidance.

Sources: LLM Integrations (livekit/livekit_composite, DeepWiki) · Large language model (LLM) integrations (LiveKit Docs) · Realtime data (LiveKit Docs)
**User:**
okay let's suppose we have to send a file at the start of the conversation when the voice agent joins the room
**Assistant:**
Hey there! Want to send a file to an LLM when a voice agent joins a LiveKit room? Here's how to make that happen, step by step:

• Initial context loading: Introduce your file (an image, for example) to the agent right from the start, using either a base64 data URL or a plain link.
• Entrypoint implementation: When your agent session kicks off, add your file(s) to the `ChatContext` using `add_message`, then start the session with that context:

```python
from livekit.agents import Agent, AgentSession, ChatContext, JobContext
from livekit.agents.llm import ImageContent

async def entrypoint(ctx: JobContext):
    # Build the initial context before the session starts
    initial_ctx = ChatContext()
    initial_ctx.add_message(
        role="user",
        content=[
            "Here is a picture of me",
            ImageContent(image="https://example.com/image.jpg"),
        ],
    )

    session = AgentSession(...)  # configure with your STT/LLM/TTS of choice
    await session.start(
        room=ctx.room,
        agent=Agent(chat_ctx=initial_ctx),
    )
```
Pro tip: If you have a local file, you can convert it to a base64 data URL instead of using a web link.

• Provider compatibility: Double-check that your LLM provider supports the type of file you're sending (such as images).

And if you're expecting files to arrive after the agent joins, no stress! You can also set up a byte stream handler to process files as they come into the room.

Sources: Vision (LiveKit Docs) · examples/voice_agents/multi_agent.py · examples/data-streams/index.ts
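To illustrate the local-file tip above, here's a minimal sketch of converting a file into a base64 data URL using only the Python standard library. The helper name is mine, not a LiveKit API; the resulting string can then be passed wherever an image URL is accepted (e.g., `ImageContent(image=...)`), assuming your provider accepts data URLs:

```python
import base64
import mimetypes

def file_to_data_url(path: str) -> str:
    """Encode a local file as a data URL (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"  # fallback for unknown extensions
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The data URL format itself is standard; just be aware that base64 inflates the payload by roughly a third, so very large files are better sent as byte streams.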
**User:**
what about a pdf
**Assistant:**
Hey there! Let me walk you through how you can send a PDF to an LLM when a voice agent joins a LiveKit room - it's actually pretty straightforward.

• Direct file transfer: Use LiveKit's `sendFile` method in the frontend to upload your PDF when the session starts. On the agent side, register a byte stream handler to receive and process the PDF data.
• PDF processing: Once you've got the PDF, you have a couple of options:
  ◦ Convert to text (works well for most LLMs): Use a PDF parsing library like PDF.js to extract the text, then add that text to the LLM's context.
  ◦ Convert to images (for vision-capable LLMs): Convert each PDF page to an image (such as PNG), then add those images as `ImageContent` to the LLM's context.

Pro tip: Text conversion usually works best for standard LLMs, while vision LLMs are great at processing page images. Keep in mind there's a 500 MB file size limit for uploads, and it's always smart to double-check your LLM provider's documentation for its specific input requirements.

Need a specific code example? Just let me know what workflow or LLM provider you're working with, and I'll be happy to help!

Sources: Sending files & bytes (LiveKit Docs) · Vision (LiveKit Docs) · tools/server/webui/src/components/useChatExtraContext.tsx · livekit-agents/livekit/agents/llm/chat_context.py
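For the convert-to-text route, once you've extracted the PDF's text (for example with a parsing library such as pypdf on the Python side), you still need to fit it into the chat context. Here's a minimal sketch of that step; the function name and the per-message character cap are illustrative assumptions, not LiveKit or provider requirements:

```python
def pdf_text_to_messages(pdf_text: str, max_chars: int = 4000) -> list[dict]:
    """Split extracted PDF text into user-role message dicts.

    max_chars is an illustrative cap; tune it to your model's
    context window. Each dict maps onto one add_message() call.
    """
    chunks = [pdf_text[i : i + max_chars] for i in range(0, len(pdf_text), max_chars)]
    return [
        {"role": "user", "content": f"[PDF excerpt {n + 1}/{len(chunks)}]\n{chunk}"}
        for n, chunk in enumerate(chunks)
    ]
```

Each returned dict can then be fed into the `ChatContext` via `add_message(role=..., content=...)` before the session starts, as in the image example earlier.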