glamorous-byte-32596
02/15/2025, 3:28 PM
gray-monkey-83525
02/15/2025, 3:28 PM
1. Install Dependencies: Install the LiveKit Agents framework and the OpenAI plugin:
pnpm install @livekit/agents @livekit/agents-plugin-openai
2. Create a MultimodalAgent: Implement the MultimodalAgent class to handle audio and text input. You can extend the agent for vision tasks by subscribing to the user's video track and passing frames to a vision model (see the sketch after this list).
3. Integrate a Vision Model: Use a JavaScript-compatible vision model to analyze the video frames. The model's output can then be used to adjust the assistant's responses based on visual cues.
4. Develop the Frontend: Build a frontend, for example with Next.js, to handle user interactions and display the video and audio streams.
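Here is a rough sketch of what steps 1–2 might look like in code. It is an assumption-heavy outline modeled on the agents-js examples rather than a verified implementation: names such as defineAgent, multimodal.MultimodalAgent, and openai.realtime.RealtimeModel should be checked against the versions of @livekit/agents and @livekit/agents-plugin-openai you actually install.
import { fileURLToPath } from 'node:url';
import { defineAgent, multimodal, cli, WorkerOptions } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';

export default defineAgent({
  entry: async (ctx) => {
    // Connect the agent to the room so it can receive the user's audio/video.
    await ctx.connect();

    // A MultimodalAgent driven by a realtime OpenAI model (step 2).
    const agent = new multimodal.MultimodalAgent({
      model: new openai.realtime.RealtimeModel({
        instructions: 'You are a helpful assistant with vision capabilities.',
      }),
    });
    await agent.start(ctx.room);
  },
});

// Run the worker; the vision integration (step 3) would hook into the
// subscribed video track, as discussed later in this thread.
cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));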
For more detailed guidance, you can refer to the LiveKit MultimodalAgent documentation and explore examples in the LiveKit GitHub repository.
Would you like more detailed steps or code examples?
Sources:
- LiveKit MultimodalAgent documentation
glamorous-byte-32596
02/15/2025, 3:31 PM
gray-monkey-83525
02/15/2025, 3:31 PM
1. Connect to the Room: Create a Room object and connect to it using the connect method with the appropriate WebSocket URL and access token.
2. Listen for Track Events: Use the RoomEvent.TrackSubscribed event to handle when a video track is subscribed. This event provides you with the RemoteTrack, RemoteTrackPublication, and RemoteParticipant objects.
3. Attach the Video Track: Once you have the video track, you can attach it to an HTML video element for rendering.
Here's a basic example:
import { Room, RoomEvent, Track } from 'livekit-client';

async function subscribeToVideoTrack(wsUrl, token) {
  const room = new Room();

  // Register the handler before connecting so no subscriptions are missed.
  room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
    if (track.kind === Track.Kind.Video) {
      const videoElement = track.attach();
      document.body.appendChild(videoElement);
    }
  });

  await room.connect(wsUrl, token);
  return room;
}
This code connects to a LiveKit room and listens for video tracks being subscribed. When a video track is subscribed, it attaches the track to a new video element and appends it to the document body.
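For completeness, you would usually also clean up rendered elements when a track goes away. A minimal companion handler on the same room instance, using the client SDK's detach API:
room.on(RoomEvent.TrackUnsubscribed, (track, publication, participant) => {
  // detach() returns every element the track was attached to; remove them.
  track.detach().forEach((element) => element.remove());
});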
Would you like more detailed information or assistance with a specific part of the process?
Sources:
- LiveKit JS Client SDK
glamorous-byte-32596
02/15/2025, 3:33 PM
gray-monkey-83525
02/15/2025, 3:33 PM
1. Set Up the Agent: Install the LiveKit Agents framework for Node.js, @livekit/agents.
2. Connect the Agent to the Room: Use the JobContext to connect the agent to a LiveKit room, configuring it to automatically subscribe to video tracks.
3. Handle Track Subscription: Implement a handler for the TrackSubscribed event to process the video track. This can involve attaching the track to a video processing pipeline or a vision model.
Here's a basic example in Node.js:
const { AutoSubscribe, cli } = require('@livekit/agents');

async function entrypoint(ctx) {
  // Connect and automatically subscribe to video tracks only.
  await ctx.connect({ auto_subscribe: AutoSubscribe.VIDEO_ONLY });

  ctx.room.on('trackSubscribed', (track, publication, participant) => {
    if (track.kind === 'video') {
      console.log(`Subscribed to video track: ${track.sid}`);
      // Process the video track here
    }
  });
}

cli.runApp({ entrypoint });
This code sets up an agent that connects to a room and subscribes to video tracks. When a video track is subscribed, it logs the track SID and can be extended to process the video data.
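If you want the actual frames, one option is to wrap the subscribed track in a frame stream. This is a hedged sketch that assumes @livekit/rtc-node exposes a VideoStream async iterable mirroring the Python SDK's rtc.VideoStream; verify the exact API against your installed version.
const { VideoStream } = require('@livekit/rtc-node');

async function processVideoTrack(track) {
  // Assumed: VideoStream yields frame events, similar to the Python SDK.
  for await (const event of new VideoStream(track)) {
    const frame = event.frame;
    // Hand the frame to your vision pipeline here; runVisionModel is a
    // hypothetical helper, not part of the LiveKit SDK.
    // await runVisionModel(frame);
  }
}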
Would you like more detailed steps or assistance with a specific part of the implementation?
Sources:
- https://docs.livekit.io/guides/room/receive
glamorous-byte-32596
02/15/2025, 3:35 PM
gray-monkey-83525
02/15/2025, 3:35 PM
The issue is that track.kind is of type TrackKind | undefined, while you're comparing it to the string 'video'. To resolve this, you should use the TrackKind enum provided by LiveKit.
Here's how you can fix the comparison:
1. Import the TrackKind Enum: Ensure you import the TrackKind enum from the LiveKit SDK.
2. Use the Enum for Comparison: Compare track.kind with TrackKind.Video instead of the string 'video'.
Here's an updated example:
import { TrackKind } from 'livekit-client';

ctx.room.on('trackSubscribed', (track, publication, participant) => {
  if (track.kind === TrackKind.Video) {
    console.log(`Subscribed to video track: ${track.sid}`);
    // Process the video track here
  }
});
This change ensures that the comparison is type-safe and aligns with TypeScript's type checking.
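One caveat: if the track object comes from the server-side @livekit/rtc-node SDK (which the agents framework uses) rather than livekit-client, the enum is exported from that package instead, and its members may be named differently. A hedged variant, to be checked against your installed typings:
// Assumption: @livekit/rtc-node exports a TrackKind enum whose video member is
// named KIND_VIDEO; confirm the exact member names in your version.
const { TrackKind } = require('@livekit/rtc-node');

ctx.room.on('trackSubscribed', (track, publication, participant) => {
  if (track.kind === TrackKind.KIND_VIDEO) {
    console.log(`Subscribed to video track: ${track.sid}`);
  }
});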
Would you like more detailed information or assistance with another part of your implementation?
Sources:
- https://docs.livekit.io/agents/build/tracks/
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1730233417.503479
glamorous-byte-32596
02/15/2025, 3:43 PM
gray-monkey-83525
02/15/2025, 3:43 PM
To use a before_llm_cb callback in JavaScript with LiveKit's VoicePipelineAgent, you can define this callback to modify the chat context or perform actions before the LLM (Large Language Model) is invoked. Here's a simple example to illustrate how you might set this up:
1. Define the Callback: The before_llm_cb is a function that takes the agent and chat context as parameters. You can use it to modify the chat context or decide whether to proceed with the LLM call.
2. Implement the Agent: Set up the VoicePipelineAgent with the before_llm_cb callback.
Here's a basic example:
const { VoicePipelineAgent } = require('@livekit/agents');

function beforeLLMCallback(agent, chatCtx) {
  // Example: Log the last message from the user
  const lastMessage = chatCtx.messages[chatCtx.messages.length - 1];
  console.log('Last user message:', lastMessage.content);

  // Example: Modify the chat context or decide to proceed
  if (lastMessage.content.includes('special keyword')) {
    // Modify context or perform actions
    return true; // Continue with LLM processing
  }
  return false; // Skip LLM processing
}

const agent = new VoicePipelineAgent({
  beforeLLMCallback: beforeLLMCallback,
  // Other agent configurations
});

// Start the agent in a room
agent.start(room, participant);
In this example, the beforeLLMCallback function logs the last user message and checks for a specific keyword. If the keyword is present, it allows the LLM processing to continue; otherwise, it skips it.
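Another common use of the callback is to enrich the context before the model runs, for example by injecting retrieved data. A minimal sketch under the same assumptions (a chat context exposing a plain messages array), with hypothetical example content:
function beforeLLMCallbackWithRetrieval(agent, chatCtx) {
  const lastMessage = chatCtx.messages[chatCtx.messages.length - 1];
  if (lastMessage && lastMessage.content.includes('order status')) {
    // Inject extra grounding for the LLM; the content here stands in for the
    // result of your own retrieval step (database query, HTTP call, etc.).
    chatCtx.messages.push({
      role: 'system',
      content: 'Context for the assistant: the order shipped yesterday.',
    });
  }
  return true; // Proceed with LLM processing
}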
Would you like more detailed steps or assistance with another part of your implementation?
Sources:
- https://docs.livekit.io/python/livekit/agents/voice_assistant/index.html#args
- https://docs.livekit.io/agents/voice-agent/voice-pipeline/
glamorous-byte-32596
02/15/2025, 3:46 PM
gray-monkey-83525
02/15/2025, 3:46 PM
With the MultimodalAgent, a callback equivalent to before_llm_cb in a VoicePipelineAgent is not directly available. The MultimodalAgent is designed to handle both audio and text inputs using a single primary model, and it does not have a built-in callback like before_llm_cb for modifying the chat context before LLM processing.
However, you can achieve similar functionality by managing the chat context manually: modify it or perform any necessary actions before sending the input to the model.
Here's a basic approach:
1. Modify Chat Context: Before you send a message to the MultimodalAgent, modify the chat context as needed.
2. Invoke the Model: Use the model's methods to process the input with the modified context.
Here's a conceptual example:
const { MultimodalAgent } = require('@livekit/agents');

async function processInput(agent, input) {
  // Modify the chat context as needed
  const chatCtx = agent.chatCtx;
  chatCtx.messages.push({ role: 'user', content: input });

  // Process the input with the model
  const response = await agent.model.process(chatCtx);
  console.log('Model response:', response);
}

// Initialize the MultimodalAgent
const agent = new MultimodalAgent({
  model: yourModelInstance,
  chatCtx: initialChatContext,
});

// Example usage
processInput(agent, 'Hello, how are you?');
In this example, you manually modify the chat context before processing the input with the model. This approach allows you to implement custom logic similar to before_llm_cb.
Would you like more detailed steps or assistance with another part of your implementation?
Sources:
- LiveKit Slack Discussion