# ask-ai
@millions-hair-24956 I want to build an agent that uses OpenAI. It should support voice and text input. The user should be able to ask for the weather of a given city, and the agent should respond with the weather at that location. If the city is not given, the agent should ask for the city before returning the weather.
To build an agent that supports both voice and text input using OpenAI, you can use the `MultimodalAgent` class from LiveKit. It processes both audio and text input and generates audio responses, and you can implement function calling to fetch weather data for a given city. If the city is not provided, the agent can prompt the user to specify it. Here's a basic outline of how you can achieve this:

1. Set up the MultimodalAgent: Use the `MultimodalAgent` class to handle both voice and text input. This agent can manage conversation state and handle interruptions.
2. Implement function calling: Define a function that fetches weather data and expose it to the LLM with the `ai_callable` decorator. If the user does not specify a city, the agent should prompt for it.
3. Handle user input: Use the `user_speech_committed` event to process user input and determine whether a city was mentioned. If not, prompt the user to provide one.
4. Respond to the user: Once the city is identified, call the weather function and respond with the weather information.

Here's a simplified code snippet to illustrate the setup:
```python
import asyncio

import aiohttp

from livekit.agents import JobContext, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

class WeatherAgent(llm.FunctionContext):
    @llm.ai_callable(description="Get the weather for a given city.")
    async def get_weather(self, location: str):
        # wttr.in is a free weather service; %C+%t returns condition and temperature
        url = f"https://wttr.in/{location}?format=%C+%t"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                if response.status == 200:
                    weather_data = await response.text()
                    return f"The weather in {location} is {weather_data}."
                else:
                    return "Failed to retrieve weather data."

async def entrypoint(ctx: JobContext):
    model = openai.realtime.RealtimeModel(instructions="You are a helpful assistant.")
    # fnc_ctx exposes get_weather to the model via function calling
    agent = MultimodalAgent(model=model, fnc_ctx=WeatherAgent())
    await ctx.connect()
    participant = await ctx.wait_for_participant()
    agent.start(ctx.room, participant)

    @agent.on("user_speech_committed")
    async def on_user_speech_committed(msg):
        if "weather" in msg.content:
            if "city" not in msg.content:
                await agent.say("Please specify the city.")
            else:
                # Extract city and call get_weather
                city = extract_city_from_message(msg.content)
                weather_info = await agent.fnc_ctx.get_weather(city)
                await agent.say(weather_info)

def extract_city_from_message(message):
    # Implement logic to extract the city from the message;
    # return None when no city is found.
    return "New York"  # Example city
```
Would you like more detailed guidance on any specific part of this setup?

Sources:
- Function calling with Voice Agents
- Working with the MultimodalAgent class
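
Since `get_weather` is registered through `fnc_ctx`, you can often let the realtime model handle the missing-city case itself instead of wiring up the manual `user_speech_committed` handler: give it instructions that tell it to ask for the city before calling the function. A minimal sketch (the instruction wording below is just an example, not a required format):

```python
from livekit.plugins import openai

model = openai.realtime.RealtimeModel(
    instructions=(
        "You are a helpful weather assistant. "
        "When the user asks about the weather, call get_weather with the city. "
        "If no city was mentioned, ask the user for the city before answering."
    ),
)
```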