# ask-ai
@millions-hair-24956 how to build an AI-powered assistant that engages in real-time conversations with access to vision in JavaScript / Node, NOT Python
To build an AI-powered assistant that engages in real-time conversations with vision capabilities using JavaScript/Node.js, you can utilize LiveKit's MultimodalAgent class. This class allows for the integration of audio and text inputs, and you can extend it to include vision by processing video frames. Here's a high-level approach:

1. Set Up LiveKit: Use LiveKit's Node.js SDK to create a real-time communication environment. Install the necessary packages using `pnpm install @livekit/agents @livekit/agents-plugin-openai`.
2. Create a MultimodalAgent: Implement the MultimodalAgent class to handle audio and text inputs. You can modify the agent to process video frames for vision tasks by subscribing to the user's video track and passing frames to a vision model.
3. Integrate Vision Model: Use a JavaScript-compatible vision model to analyze video frames. The model's output can be used to adjust the assistant's responses based on visual cues.
4. Develop Frontend: Build a frontend using a framework like Next.js to handle user interactions and display video/audio streams.

For more detailed guidance, you can refer to the LiveKit MultimodalAgent documentation and explore the examples in the LiveKit GitHub repository (a minimal setup sketch follows below). Would you like more detailed steps or code examples?

Sources:
- LiveKit MultimodalAgent documentation
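A minimal sketch of steps 1–2, based on the patterns used in the LiveKit agents-js examples. The entrypoint shape (`defineAgent`, `WorkerOptions`, `cli.runApp`) and the class names `multimodal.MultimodalAgent`, `openai.realtime.RealtimeModel`, and `ctx.waitForParticipant` are assumptions to verify against your installed SDK version:

```javascript
import { WorkerOptions, cli, defineAgent, multimodal } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx) => {
    // Connect the agent to the room and wait for a user to join.
    await ctx.connect();
    const participant = await ctx.waitForParticipant();

    // The OpenAI Realtime model drives the speech-to-speech conversation.
    const model = new openai.realtime.RealtimeModel({
      instructions: 'You are a helpful voice assistant with vision.',
    });

    // The MultimodalAgent wires the model to the room's audio;
    // vision handling (frame capture) would be layered on top of this.
    const agent = new multimodal.MultimodalAgent({ model });
    await agent.start(ctx.room, participant);
  },
});

// Run the worker; this file registers itself as the agent entrypoint.
cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```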
how to subscribe to the user's video track @millions-hair-24956
To subscribe to a user's video track in LiveKit using JavaScript, you can follow these steps:

1. Connect to a Room: First, ensure you are connected to a LiveKit room using the `Room` object. This involves using the `connect` method with the appropriate WebSocket URL and access token.
2. Listen for Track Events: Use the `RoomEvent.TrackSubscribed` event to handle when a video track is subscribed. This event provides you with the `RemoteTrack`, `RemoteTrackPublication`, and `RemoteParticipant` objects.
3. Attach the Video Track: Once you have the video track, you can attach it to an HTML video element for rendering.

Here's a basic example:
```javascript
import { Room, RoomEvent, Track } from 'livekit-client';

async function subscribeToVideoTrack(wsUrl, token) {
  const room = new Room();
  await room.connect(wsUrl, token);

  room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
    if (track.kind === Track.Kind.Video) {
      // Attach the remote video track to a new <video> element.
      const videoElement = track.attach();
      document.body.appendChild(videoElement);
    }
  });
}
```
This code connects to a LiveKit room and listens for video tracks being subscribed. When a video track is subscribed, it attaches the track to a new video element and appends it to the document body. Would you like more detailed information or assistance with a specific part of the process?

Sources:
- LiveKit JS Client SDK
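As a usage note, you will usually also want to clean up rendered elements when tracks go away. A small sketch of the complementary handler, assuming the same `room` instance as in the example above:

```javascript
import { RoomEvent } from 'livekit-client';

// Remove any attached <video>/<audio> elements when the remote track goes away.
room.on(RoomEvent.TrackUnsubscribed, (track) => {
  track.detach().forEach((element) => element.remove());
});
```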
@millions-hair-24956 it's actually an AI agent that should subscribe to the user's video track
To have an AI agent subscribe to a user's video track in LiveKit using Node.js, you can utilize the LiveKit Agents framework. Here's a concise guide on how to achieve this:

1. Set Up the Agent: Use the LiveKit Agents framework to create an agent that can join a room and subscribe to tracks. Ensure you have the necessary packages installed, such as `@livekit/agents`.
2. Connect the Agent to the Room: Use the `JobContext` to connect the agent to a LiveKit room. This involves setting up the agent to automatically subscribe to video tracks.
3. Handle Track Subscription: Implement a handler for the `TrackSubscribed` event to process the video track. This can involve attaching the track to a video processing pipeline or a vision model (see the frame-reading sketch after this answer).

Here's a basic example in Node.js:
```javascript
const { JobContext, AutoSubscribe, cli } = require('@livekit/agents');

async function entrypoint(ctx) {
  await ctx.connect({ auto_subscribe: AutoSubscribe.VIDEO_ONLY });

  ctx.room.on('trackSubscribed', (track, publication, participant) => {
    if (track.kind === 'video') {
      console.log(`Subscribed to video track: ${track.sid}`);
      // Process the video track here
    }
  });
}

cli.runApp({ entrypoint });
```
This code sets up an agent that connects to a room and subscribes to video tracks. When a video track is subscribed, it logs the track SID and can be extended to process the video data. Would you like more detailed steps or assistance with a specific part of the implementation?

Sources:
- https://docs.livekit.io/guides/room/receive
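A sketch of what "process the video track here" could look like, assuming the track objects come from `@livekit/rtc-node`, whose `VideoStream` class can be iterated asynchronously over frame events; the one-frame-per-second sampling and the `onFrame` handler are hypothetical choices for illustration, so verify the exact stream API against your SDK version:

```javascript
import { VideoStream } from '@livekit/rtc-node';

// Consume frames from a subscribed remote video track and hand a sampled
// subset (roughly one frame per second) to a vision handler.
async function processVideoTrack(track, onFrame) {
  const stream = new VideoStream(track);
  let lastSampleMs = 0;

  for await (const event of stream) {
    const now = Date.now();
    if (now - lastSampleMs >= 1000) {
      lastSampleMs = now;
      // event.frame holds the raw video frame (dimensions + pixel data).
      await onFrame(event.frame);
    }
  }
}
```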
@millions-hair-24956 `if (track.kind === 'video') {` gives: This comparison appears to be unintentional because the types 'TrackKind | undefined' and 'string' have no overlap. ts(2367)
The error you're encountering is a TypeScript type mismatch. In the agent (Node.js) code, the track comes from `@livekit/rtc-node`, where `track.kind` is typed as `TrackKind | undefined` (a numeric enum), so comparing it to the string `'video'` can never be true. To resolve this, compare against the `TrackKind` enum instead:

1. Import the `TrackKind` enum: Ensure you import `TrackKind` from `@livekit/rtc-node`.
2. Use the enum for comparison: Compare `track.kind` with `TrackKind.KIND_VIDEO` instead of the string `'video'`.

Here's an updated example:
```javascript
const { TrackKind } = require('@livekit/rtc-node');

ctx.room.on('trackSubscribed', (track, publication, participant) => {
  if (track.kind === TrackKind.KIND_VIDEO) {
    console.log(`Subscribed to video track: ${track.sid}`);
    // Process the video track here
  }
});
```
This change ensures that the comparison is type-safe and aligns with TypeScript's type checking. Would you like more detailed information or assistance with another part of your implementation?

Sources:
- https://docs.livekit.io/agents/build/tracks/
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1730233417.503479
@millions-hair-24956 how can I use before_llm_cb in javascript? Provide me a dummy example
To use the `before_llm_cb` callback in JavaScript with LiveKit's VoicePipelineAgent, you can define this callback to modify the chat context or perform actions before the LLM (Large Language Model) is invoked. Here's a simple example to illustrate how you might set this up:

1. Define the Callback: The `before_llm_cb` is a function that takes the agent and chat context as parameters. You can use it to modify the chat context or decide whether to proceed with the LLM call.
2. Implement the Agent: Set up the VoicePipelineAgent with the `before_llm_cb` callback.

Here's a basic example:
```javascript
const { VoicePipelineAgent } = require('@livekit/agents');

function beforeLLMCallback(agent, chatCtx) {
  // Example: Log the last message from the user
  const lastMessage = chatCtx.messages[chatCtx.messages.length - 1];
  console.log('Last user message:', lastMessage.content);

  // Example: Modify the chat context or decide to proceed
  if (lastMessage.content.includes('special keyword')) {
    // Modify context or perform actions
    return true; // Continue with LLM processing
  }
  return false; // Skip LLM processing
}

const agent = new VoicePipelineAgent({
  beforeLLMCallback: beforeLLMCallback,
  // Other agent configurations
});

// Start the agent in a room
agent.start(room, participant);
```
In this example, the `beforeLLMCallback` function logs the last user message and checks for a specific keyword. If the keyword is present, it allows the LLM processing to continue; otherwise, it skips it. Would you like more detailed steps or assistance with another part of your implementation?

Sources:
- https://docs.livekit.io/python/livekit/agents/voice_assistant/index.html#args
- https://docs.livekit.io/agents/voice-agent/voice-pipeline/
if I am using a MultimodalAgent, what is the equivalent? @millions-hair-24956
When using a MultimodalAgent in LiveKit, the equivalent of `before_llm_cb` in a VoicePipelineAgent is not directly available. The MultimodalAgent is designed to handle both audio and text inputs using a single primary model, and it does not have a built-in callback like `before_llm_cb` for modifying the chat context before LLM processing. However, you can achieve similar functionality by managing the chat context manually before invoking the model:

1. Modify Chat Context: Before you send a message to the MultimodalAgent, modify the chat context as needed.
2. Invoke the Model: Use the model's methods to process the input with the modified context.

Here's a conceptual example:
```javascript
const { MultimodalAgent } = require('@livekit/agents');

async function processInput(agent, input) {
  // Modify the chat context as needed
  const chatCtx = agent.chatCtx;
  chatCtx.messages.push({ role: 'user', content: input });

  // Process the input with the model
  const response = await agent.model.process(chatCtx);
  console.log('Model response:', response);
}

// Initialize the MultimodalAgent
const agent = new MultimodalAgent({
  model: yourModelInstance,
  chatCtx: initialChatContext,
});

// Example usage
processInput(agent, 'Hello, how are you?');
```
In this example, you manually modify the chat context before processing the input with the model. This approach allows you to implement custom logic similar to `before_llm_cb`. Would you like more detailed steps or assistance with another part of your implementation?

Sources:
- LiveKit Slack Discussion
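For a more concrete variant with the OpenAI Realtime model, here is a sketch of injecting content into the session before asking for a response, loosely following the LiveKit agents-js multimodal example; the realtime session's `conversation.item.create` / `response.create` calls and the `llm.ChatMessage.create` shape are assumptions to verify against your SDK version:

```javascript
import { llm, multimodal } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';

async function startAgent(ctx, participant) {
  const model = new openai.realtime.RealtimeModel({
    instructions: 'You are a helpful assistant.',
  });
  const agent = new multimodal.MultimodalAgent({ model });

  // agent.start() resolves to the underlying realtime session.
  const session = await agent.start(ctx.room, participant);

  // Rough equivalent of "modify the context before the LLM runs":
  // push an item into the conversation, then ask the model to respond.
  session.conversation.item.create(
    llm.ChatMessage.create({
      role: llm.ChatRole.USER,
      // Hypothetical vision-derived context injected as a user message.
      text: 'The camera currently shows a whiteboard with a diagram on it.',
    }),
  );
  session.response.create();
}
```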