# ask-ai
c
Does the following code tell the agent what to say ("Good Morning") or does it add "Good Morning" to the conversation history then ask the LLM for an initial response?
const agent = new multimodal.MultimodalAgent({ model });
const session = await agent
  .start(ctx.room, participant)
  .then((session) => session);
session.conversation.item.create(llm.ChatMessage.create({
  role: llm.ChatRole.ASSISTANT,
  text: 'Good morning',
}));
session.response.create();
g
The code snippet adds "Good Morning" to the conversation history as an assistant message and then asks the LLM for an initial response. The session.conversation.item.create call appends the message to the conversation, and session.response.create triggers the LLM to generate a response based on the updated conversation context. Would you like more detail on how this process works?
Sources: - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729541882.418149
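To make the distinction concrete, here is the same snippet again with the two steps annotated; session, llm, and the message text are taken directly from the code above.

// Step 1: append the greeting to the conversation history.
// Nothing is generated or spoken by this call alone.
session.conversation.item.create(llm.ChatMessage.create({
  role: llm.ChatRole.ASSISTANT,
  text: 'Good morning',
}));

// Step 2: ask the model to generate the next turn, using the
// updated history (including the greeting) as context.
session.response.create();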
c
@millions-hair-24956 My interpretation is that the result of this code is that the LLM conversation will end up containing two greetings. The one I added and the response it created. Is that correct?
g
It seems likely that your interpretation is correct. By adding "Good Morning" to the conversation history as an assistant message, the LLM will consider this when generating its response. Consequently, the conversation may contain two greetings: the one you added and the one generated by the LLM in response. Would you like more detail on how the conversation context affects the LLM's response?
Sources: - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1727795904.894299
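To see why, here is roughly what the conversation history could contain after the snippet runs. This is an illustration only, not the exact LiveKit data structure, and the second entry's wording is hypothetical since it is whatever the model generates:

// Approximate conversation contents after the snippet runs (illustrative only):
[
  { role: 'assistant', text: 'Good morning' },                  // the item you created
  { role: 'assistant', text: 'Good morning! How can I help?' }, // the LLM's generated reply
]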
c
@millions-hair-24956 Seems to me it's better to add the text to the conversation, but have the TTS directly speak the text. This solves two problems... the conversation history is accurate and I have precise control over how the conversation begins
g
It seems likely that your approach of adding the text to the conversation history while having the TTS directly speak the text can indeed solve the issues you mentioned. By doing this, you ensure that the conversation history remains accurate and you maintain precise control over the initial interaction. This method allows you to manage the conversation flow effectively without relying solely on the LLM to generate the initial greeting. Would you like more detail on implementing this approach with LiveKit?
Sources: - VoicePipelineAgent
c
@millions-hair-24956 Yes, how to do this with nodejs
g
To implement this approach in Node.js, you can use the LiveKit Agents framework to manage the conversation history and control the TTS output. Here's a basic outline of how you can achieve this:
1. Initialize the MultimodalAgent: Set up the agent with the desired model and context.
2. Add the greeting to the conversation history: Use the session.conversation.item.create method to add the greeting message to the conversation history.
3. Speak the greeting directly with TTS: Instead of relying on the LLM to generate the initial response, use a TTS service to speak the greeting directly.
Here's a simplified example:
// Assumes `model` is the realtime model you have configured elsewhere;
// `ttsService` is a placeholder for whichever TTS client you use.
const { multimodal, llm } = require('@livekit/agents');

async function startAgent(ctx, participant) {
  const agent = new multimodal.MultimodalAgent({ model });
  const session = await agent.start(ctx.room, participant);

  // Add the greeting to the conversation history as an assistant message.
  // Note: no session.response.create() here, so the LLM does not generate
  // a second greeting of its own.
  session.conversation.item.create(llm.ChatMessage.create({
    role: llm.ChatRole.ASSISTANT,
    text: 'Good morning',
  }));

  // Speak the greeting directly through your TTS service.
  await ttsService.speak('Good morning');
}

startAgent(ctx, participant);
In this example, ttsService.speak is a placeholder for whatever TTS service you are using to output the audio; you would replace it with the actual method call to your TTS service. Would you like more detail on integrating a specific TTS service?
Sources: - https://docs.livekit.io/agents/openai/multimodal-agent/ - https://docs.livekit.io/agents/openai/multimodalagent/
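If you are using VoicePipelineAgent (the class cited in the sources above) rather than the multimodal agent, a minimal sketch of the same idea could look like the following. This assumes the Node.js SDK mirrors the Python framework's say() helper; the constructor arguments (vad, stt, llmModel, tts) are placeholders for whatever plugin instances you configure, so verify both against the current @livekit/agents docs before relying on them.

const { pipeline } = require('@livekit/agents');

// Inside your async agent entrypoint, where ctx and participant are available.
// vad, stt, llmModel, and tts are assumed plugin instances created elsewhere;
// they are placeholders, not specific LiveKit classes.
const voiceAgent = new pipeline.VoicePipelineAgent(vad, stt, llmModel, tts);
voiceAgent.start(ctx.room, participant);

// Assumed signature: say(text, allowInterruptions, addToChatCtx).
// Speaks the greeting through the agent's own TTS and records it in the chat
// context, keeping the history accurate without a generated second greeting.
await voiceAgent.say('Good morning', true, true);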