# ask-ai
c
Does the following code tell the agent what to say ("Good Morning") or does it add "Good Morning" to the conversation history then ask the LLM for an initial response?
const agent = new multimodal.MultimodalAgent({ model });
const session = await agent
  .start(ctx.room, participant)
  .then((session) => session);
session.conversation.item.create(llm.ChatMessage.create({
  role: llm.ChatRole.ASSISTANT,
  text: 'Good morning',
}));
session.response.create();
g
The code snippet adds "Good Morning" to the conversation history as an assistant message and then asks the LLM for an initial response. The session.conversation.item.create call appends the message to the conversation, and session.response.create triggers the LLM to generate a response based on the updated conversation context. Would you like more detail on how this process works?
Sources: - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729541882.418149
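To make the distinction concrete, here is the same snippet again with the two steps annotated; session, llm, and the message text are taken directly from the code above.

// Step 1: append the greeting to the conversation history.
// Nothing is generated or spoken by this call alone.
session.conversation.item.create(llm.ChatMessage.create({
  role: llm.ChatRole.ASSISTANT,
  text: 'Good morning',
}));

// Step 2: ask the model to generate the next turn, using the
// updated history (including the greeting) as context.
session.response.create();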
c
@millions-hair-24956 My interpretation is that the result of this code is that the LLM conversation will end up containing two greetings. The one I added and the response it created. Is that correct?
g
It seems likely that your interpretation is correct. By adding "Good Morning" to the conversation history as an assistant message, the LLM will consider this when generating its response. Consequently, the conversation may contain two greetings: the one you added and the one generated by the LLM in response. Would you like more detail on how the conversation context affects the LLM's response?
Sources: - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1727795904.894299
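To see why, here is roughly what the conversation history could contain after the snippet runs. This is an illustration only, not the exact LiveKit data structure, and the second entry's wording is hypothetical since it is whatever the model generates:

// Approximate conversation contents after the snippet runs (illustrative only):
[
  { role: 'assistant', text: 'Good morning' },                  // the item you created
  { role: 'assistant', text: 'Good morning! How can I help?' }, // the LLM's generated reply
]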
c
@millions-hair-24956 Seems to me it's better to add the text to the conversation, but have the TTS directly speak the text. This solves two problems... the conversation history is accurate and I have precise control over how the conversation begins
g
It seems likely that your approach of adding the text to the conversation history while having the TTS directly speak the text can indeed solve the issues you mentioned. By doing this, you ensure that the conversation history remains accurate and you maintain precise control over the initial interaction. This method allows you to manage the conversation flow effectively without relying solely on the LLM to generate the initial greeting. Would you like more detail on implementing this approach with LiveKit?
Sources: - VoicePipelineAgent
c
@millions-hair-24956 Yes, how to do this with nodejs
g
To implement this approach in Node.js, you can use the LiveKit Agents framework to manage the conversation history and control the TTS output. Here's a basic outline of how you can achieve this:
1. Initialize the MultimodalAgent: Set up the agent with the desired model and context.
2. Add the greeting to the conversation history: Use the session.conversation.item.create method to add the greeting message to the conversation history.
3. Speak the greeting directly with TTS: Instead of relying on the LLM to generate the initial response, use a TTS service to speak the greeting directly.
Here's a simplified example:
// Assumes `model` is the realtime model you have configured elsewhere;
// `ttsService` is a placeholder for whichever TTS client you use.
const { multimodal, llm } = require('@livekit/agents');

async function startAgent(ctx, participant) {
  const agent = new multimodal.MultimodalAgent({ model });
  const session = await agent.start(ctx.room, participant);

  // Add the greeting to the conversation history as an assistant message.
  // Note: no session.response.create() here, so the LLM does not generate
  // a second greeting of its own.
  session.conversation.item.create(llm.ChatMessage.create({
    role: llm.ChatRole.ASSISTANT,
    text: 'Good morning',
  }));

  // Speak the greeting directly through your TTS service.
  await ttsService.speak('Good morning');
}

startAgent(ctx, participant);
In this example, ttsService.speak is a placeholder for whatever TTS service you are using to output the audio; you would replace it with the actual method call to your TTS service. Would you like more detail on integrating a specific TTS service?
Sources: - https://docs.livekit.io/agents/openai/multimodal-agent/ - https://docs.livekit.io/agents/openai/multimodalagent/
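If you are using VoicePipelineAgent (the class cited in the sources above) rather than the multimodal agent, a minimal sketch of the same idea could look like the following. This assumes the Node.js SDK mirrors the Python framework's say() helper; the constructor arguments (vad, stt, llmModel, tts) are placeholders for whatever plugin instances you configure, so verify both against the current @livekit/agents docs before relying on them.

const { pipeline } = require('@livekit/agents');

// Inside your async agent entrypoint, where ctx and participant are available.
// vad, stt, llmModel, and tts are assumed plugin instances created elsewhere;
// they are placeholders, not specific LiveKit classes.
const voiceAgent = new pipeline.VoicePipelineAgent(vad, stt, llmModel, tts);
voiceAgent.start(ctx.room, participant);

// Assumed signature: say(text, allowInterruptions, addToChatCtx).
// Speaks the greeting through the agent's own TTS and records it in the chat
// context, keeping the history accurate without a generated second greeting.
await voiceAgent.say('Good morning', true, true);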