# ask-ai
@millions-hair-24956 I want to build an agent that uses OpenAI. It should support voice and text input. The user should be able to ask for the weather of a given city, and the agent should respond with the weather at that location. If the city is not given, the agent should ask for the city before returning the weather.
To build an agent that supports both voice and text input using OpenAI, you can use the `MultimodalAgent` class from LiveKit. It processes both audio and text input and generates audio responses, and you can implement function calling to fetch weather data for a given city. If the city is not provided, the agent can prompt the user to specify it. Here's a basic outline of how you can achieve this:

1. Set up the MultimodalAgent: Use the `MultimodalAgent` class to handle both voice and text input. This agent can manage conversation state and handle interruptions.
2. Implement function calling: Define a function that fetches weather data and expose it to the LLM with the `ai_callable` decorator. If the user does not specify a city, the agent should prompt for it.
3. Handle user input: Use the `user_speech_committed` event to process user input and determine whether a city was mentioned. If not, prompt the user to provide one.
4. Respond to the user: Once the city is identified, call the weather function and respond with the weather information.

Here's a simplified code snippet to illustrate the setup:
```python
import asyncio

import aiohttp

from livekit.agents import JobContext, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

class WeatherAgent(llm.FunctionContext):
    @llm.ai_callable(description="Get the weather for a given city.")
    async def get_weather(self, location: str):
        # wttr.in is a free weather service; %C+%t returns condition and temperature
        url = f"https://wttr.in/{location}?format=%C+%t"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                if response.status == 200:
                    weather_data = await response.text()
                    return f"The weather in {location} is {weather_data}."
                else:
                    return "Failed to retrieve weather data."

async def entrypoint(ctx: JobContext):
    model = openai.realtime.RealtimeModel(instructions="You are a helpful assistant.")
    # fnc_ctx exposes get_weather to the model via function calling
    agent = MultimodalAgent(model=model, fnc_ctx=WeatherAgent())
    await ctx.connect()
    participant = await ctx.wait_for_participant()
    agent.start(ctx.room, participant)

    @agent.on("user_speech_committed")
    async def on_user_speech_committed(msg):
        if "weather" in msg.content:
            if "city" not in msg.content:
                await agent.say("Please specify the city.")
            else:
                # Extract city and call get_weather
                city = extract_city_from_message(msg.content)
                weather_info = await agent.fnc_ctx.get_weather(city)
                await agent.say(weather_info)

def extract_city_from_message(message):
    # Implement logic to extract the city from the message;
    # return None when no city is found.
    return "New York"  # Example city
```
Would you like more detailed guidance on any specific part of this setup?

Sources:
- Function calling with Voice Agents
- Working with the MultimodalAgent class
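
Since `get_weather` is registered through `fnc_ctx`, you can often let the realtime model handle the missing-city case itself instead of wiring up the manual `user_speech_committed` handler: give it instructions that tell it to ask for the city before calling the function. A minimal sketch (the instruction wording below is just an example, not a required format):

```python
from livekit.plugins import openai

model = openai.realtime.RealtimeModel(
    instructions=(
        "You are a helpful weather assistant. "
        "When the user asks about the weather, call get_weather with the city. "
        "If no city was mentioned, ask the user for the city before answering."
    ),
)
```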