# ask-ai
@millions-hair-24956 how to build an AI-powered assistant that engages in real-time conversations with access to vision in JavaScript / Node, NOT Python
To build an AI-powered assistant that engages in real-time conversations with vision capabilities using JavaScript/Node.js, you can utilize LiveKit's MultimodalAgent class. This class allows for the integration of audio and text inputs, and you can extend it to include vision by processing video frames. Here's a high-level approach:

1. Set Up LiveKit: Use LiveKit's Node.js SDK to create a real-time communication environment. Install the necessary packages using `pnpm install @livekit/agents @livekit/agents-plugin-openai`.
2. Create a MultimodalAgent: Implement the MultimodalAgent class to handle audio and text inputs. You can modify the agent to process video frames for vision tasks by subscribing to the user's video track and passing frames to a vision model.
3. Integrate Vision Model: Use a JavaScript-compatible vision model to analyze video frames. The model's output can be used to adjust the assistant's responses based on visual cues.
4. Develop Frontend: Build a frontend using a framework like Next.js to handle user interactions and display video/audio streams.

For more detailed guidance, you can refer to the LiveKit MultimodalAgent documentation and explore the examples in the LiveKit GitHub repository (a minimal setup sketch follows below). Would you like more detailed steps or code examples?

Sources:
- LiveKit MultimodalAgent documentation
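A minimal sketch of steps 1–2, based on the patterns used in the LiveKit agents-js examples. The entrypoint shape (`defineAgent`, `WorkerOptions`, `cli.runApp`) and the class names `multimodal.MultimodalAgent`, `openai.realtime.RealtimeModel`, and `ctx.waitForParticipant` are assumptions to verify against your installed SDK version:

```javascript
import { WorkerOptions, cli, defineAgent, multimodal } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx) => {
    // Connect the agent to the room and wait for a user to join.
    await ctx.connect();
    const participant = await ctx.waitForParticipant();

    // The OpenAI Realtime model drives the speech-to-speech conversation.
    const model = new openai.realtime.RealtimeModel({
      instructions: 'You are a helpful voice assistant with vision.',
    });

    // The MultimodalAgent wires the model to the room's audio;
    // vision handling (frame capture) would be layered on top of this.
    const agent = new multimodal.MultimodalAgent({ model });
    await agent.start(ctx.room, participant);
  },
});

// Run the worker; this file registers itself as the agent entrypoint.
cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```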
how to subscribe to the user's video track @millions-hair-24956
To subscribe to a user's video track in LiveKit using JavaScript, you can follow these steps:

1. Connect to a Room: First, ensure you are connected to a LiveKit room using the `Room` object. This involves using the `connect` method with the appropriate WebSocket URL and access token.
2. Listen for Track Events: Use the `RoomEvent.TrackSubscribed` event to handle when a video track is subscribed. This event provides you with the `RemoteTrack`, `RemoteTrackPublication`, and `RemoteParticipant` objects.
3. Attach the Video Track: Once you have the video track, you can attach it to an HTML video element for rendering.

Here's a basic example:
```javascript
import { Room, RoomEvent, Track } from 'livekit-client';

async function subscribeToVideoTrack(wsUrl, token) {
  const room = new Room();
  await room.connect(wsUrl, token);

  room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
    if (track.kind === Track.Kind.Video) {
      // Attach the remote video track to a new <video> element.
      const videoElement = track.attach();
      document.body.appendChild(videoElement);
    }
  });
}
```
This code connects to a LiveKit room and listens for video tracks being subscribed. When a video track is subscribed, it attaches the track to a new video element and appends it to the document body. Would you like more detailed information or assistance with a specific part of the process?

Sources:
- LiveKit JS Client SDK
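As a usage note, you will usually also want to clean up rendered elements when tracks go away. A small sketch of the complementary handler, assuming the same `room` instance as in the example above:

```javascript
import { RoomEvent } from 'livekit-client';

// Remove any attached <video>/<audio> elements when the remote track goes away.
room.on(RoomEvent.TrackUnsubscribed, (track) => {
  track.detach().forEach((element) => element.remove());
});
```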
@millions-hair-24956 it's actually an AI agent that should subscribe to the user's video track
To have an AI agent subscribe to a user's video track in LiveKit using Node.js, you can utilize the LiveKit Agents framework. Here's a concise guide on how to achieve this:

1. Set Up the Agent: Use the LiveKit Agents framework to create an agent that can join a room and subscribe to tracks. Ensure you have the necessary packages installed, such as `@livekit/agents`.
2. Connect the Agent to the Room: Use the `JobContext` to connect the agent to a LiveKit room. This involves setting up the agent to automatically subscribe to video tracks.
3. Handle Track Subscription: Implement a handler for the `TrackSubscribed` event to process the video track. This can involve attaching the track to a video processing pipeline or a vision model (see the frame-reading sketch after this answer).

Here's a basic example in Node.js:
```javascript
const { JobContext, AutoSubscribe, cli } = require('@livekit/agents');

async function entrypoint(ctx) {
  await ctx.connect({ auto_subscribe: AutoSubscribe.VIDEO_ONLY });

  ctx.room.on('trackSubscribed', (track, publication, participant) => {
    if (track.kind === 'video') {
      console.log(`Subscribed to video track: ${track.sid}`);
      // Process the video track here
    }
  });
}

cli.runApp({ entrypoint });
```
This code sets up an agent that connects to a room and subscribes to video tracks. When a video track is subscribed, it logs the track SID and can be extended to process the video data. Would you like more detailed steps or assistance with a specific part of the implementation?

Sources:
- https://docs.livekit.io/guides/room/receive
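A sketch of what "process the video track here" could look like, assuming the track objects come from `@livekit/rtc-node`, whose `VideoStream` class can be iterated asynchronously over frame events; the one-frame-per-second sampling and the `onFrame` handler are hypothetical choices for illustration, so verify the exact stream API against your SDK version:

```javascript
import { VideoStream } from '@livekit/rtc-node';

// Consume frames from a subscribed remote video track and hand a sampled
// subset (roughly one frame per second) to a vision handler.
async function processVideoTrack(track, onFrame) {
  const stream = new VideoStream(track);
  let lastSampleMs = 0;

  for await (const event of stream) {
    const now = Date.now();
    if (now - lastSampleMs >= 1000) {
      lastSampleMs = now;
      // event.frame holds the raw video frame (dimensions + pixel data).
      await onFrame(event.frame);
    }
  }
}
```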
@millions-hair-24956 `if (track.kind === 'video') {` gives: This comparison appears to be unintentional because the types 'TrackKind | undefined' and 'string' have no overlap. ts(2367)
The error you're encountering is a TypeScript type mismatch. In the agent (Node.js) code, the track comes from `@livekit/rtc-node`, where `track.kind` is typed as `TrackKind | undefined` (a numeric enum), so comparing it to the string `'video'` can never be true. To resolve this, compare against the `TrackKind` enum instead:

1. Import the `TrackKind` enum: Ensure you import `TrackKind` from `@livekit/rtc-node`.
2. Use the enum for comparison: Compare `track.kind` with `TrackKind.KIND_VIDEO` instead of the string `'video'`.

Here's an updated example:
```javascript
const { TrackKind } = require('@livekit/rtc-node');

ctx.room.on('trackSubscribed', (track, publication, participant) => {
  if (track.kind === TrackKind.KIND_VIDEO) {
    console.log(`Subscribed to video track: ${track.sid}`);
    // Process the video track here
  }
});
```
This change ensures that the comparison is type-safe and aligns with TypeScript's type checking. Would you like more detailed information or assistance with another part of your implementation?

Sources:
- https://docs.livekit.io/agents/build/tracks/
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1730233417.503479
@millions-hair-24956 how can I use before_llm_cb in javascript? Provide me a dummy example
To use the `before_llm_cb` callback in JavaScript with LiveKit's VoicePipelineAgent, you can define this callback to modify the chat context or perform actions before the LLM (Large Language Model) is invoked. Here's a simple example to illustrate how you might set this up:

1. Define the Callback: The `before_llm_cb` is a function that takes the agent and chat context as parameters. You can use it to modify the chat context or decide whether to proceed with the LLM call.
2. Implement the Agent: Set up the VoicePipelineAgent with the `before_llm_cb` callback.

Here's a basic example:
```javascript
const { VoicePipelineAgent } = require('@livekit/agents');

function beforeLLMCallback(agent, chatCtx) {
  // Example: Log the last message from the user
  const lastMessage = chatCtx.messages[chatCtx.messages.length - 1];
  console.log('Last user message:', lastMessage.content);

  // Example: Modify the chat context or decide to proceed
  if (lastMessage.content.includes('special keyword')) {
    // Modify context or perform actions
    return true; // Continue with LLM processing
  }
  return false; // Skip LLM processing
}

const agent = new VoicePipelineAgent({
  beforeLLMCallback: beforeLLMCallback,
  // Other agent configurations
});

// Start the agent in a room
agent.start(room, participant);
```
In this example, the `beforeLLMCallback` function logs the last user message and checks for a specific keyword. If the keyword is present, it allows the LLM processing to continue; otherwise, it skips it. Would you like more detailed steps or assistance with another part of your implementation?

Sources:
- https://docs.livekit.io/python/livekit/agents/voice_assistant/index.html#args
- https://docs.livekit.io/agents/voice-agent/voice-pipeline/
if I am using a MultimodalAgent, what is the equivalent? @millions-hair-24956
When using a MultimodalAgent in LiveKit, the equivalent of `before_llm_cb` in a VoicePipelineAgent is not directly available. The MultimodalAgent is designed to handle both audio and text inputs using a single primary model, and it does not have a built-in callback like `before_llm_cb` for modifying the chat context before LLM processing. However, you can achieve similar functionality by managing the chat context manually before invoking the model:

1. Modify Chat Context: Before you send a message to the MultimodalAgent, modify the chat context as needed.
2. Invoke the Model: Use the model's methods to process the input with the modified context.

Here's a conceptual example:
```javascript
const { MultimodalAgent } = require('@livekit/agents');

async function processInput(agent, input) {
  // Modify the chat context as needed
  const chatCtx = agent.chatCtx;
  chatCtx.messages.push({ role: 'user', content: input });

  // Process the input with the model
  const response = await agent.model.process(chatCtx);
  console.log('Model response:', response);
}

// Initialize the MultimodalAgent
const agent = new MultimodalAgent({
  model: yourModelInstance,
  chatCtx: initialChatContext,
});

// Example usage
processInput(agent, 'Hello, how are you?');
```
In this example, you manually modify the chat context before processing the input with the model. This approach allows you to implement custom logic similar to `before_llm_cb`. Would you like more detailed steps or assistance with another part of your implementation?

Sources:
- LiveKit Slack Discussion
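For a more concrete variant with the OpenAI Realtime model, here is a sketch of injecting content into the session before asking for a response, loosely following the LiveKit agents-js multimodal example; the realtime session's `conversation.item.create` / `response.create` calls and the `llm.ChatMessage.create` shape are assumptions to verify against your SDK version:

```javascript
import { llm, multimodal } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';

async function startAgent(ctx, participant) {
  const model = new openai.realtime.RealtimeModel({
    instructions: 'You are a helpful assistant.',
  });
  const agent = new multimodal.MultimodalAgent({ model });

  // agent.start() resolves to the underlying realtime session.
  const session = await agent.start(ctx.room, participant);

  // Rough equivalent of "modify the context before the LLM runs":
  // push an item into the conversation, then ask the model to respond.
  session.conversation.item.create(
    llm.ChatMessage.create({
      role: llm.ChatRole.USER,
      // Hypothetical vision-derived context injected as a user message.
      text: 'The camera currently shows a whiteboard with a diagram on it.',
    }),
  );
  session.response.create();
}
```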