# ask-ai
Hi there! Which is the best way of evaluating agent responses? I am working on a project in which I am building an agent using LiveKit, and I'm looking for the best way to evaluate its answers. I mean programmatically asking a bunch of questions from my evaluation dataset, obtaining the responses, and evaluating those responses (all of this outside LiveKit, of course). As of today, I am working this out by running the LiveKit server plus an agent worker connected to it, and using the `livekit.rtc` module (along with `livekit.api`) to connect to the server, create a Room, send messages to that room, and listen for answers. The question is: is there any way to do this in a simpler way? It feels like it should exist, since the approach I'm currently following involves unnecessarily dealing with complicated communication protocols, room management, and so on, when the only thing I want to do is to test whether my agent is working well or not, whether it's calling the tools it should call depending on the inputs, etc.
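Roughly, my current setup looks something like this (a heavily simplified sketch, not my exact code; the URL, room name, and identity are just placeholders):

```python
import asyncio

from livekit import api, rtc


async def run_eval_question(question: str) -> None:
    # Mint an access token for a test participant
    # (LIVEKIT_API_KEY / LIVEKIT_API_SECRET are read from the environment).
    token = (
        api.AccessToken()
        .with_identity("eval-client")
        .with_grants(api.VideoGrants(room_join=True, room="eval-room"))
        .to_jwt()
    )

    # Join the room that the agent worker gets dispatched to.
    room = rtc.Room()
    await room.connect("ws://localhost:7880", token)

    # ... publish the question to the room, wait for the agent's answer,
    #     and then score it against my evaluation dataset ...

    await room.disconnect()


asyncio.run(run_eval_question("example question from my dataset"))
```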
Thanks for reaching out, Albert. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Thank you for reaching out! Based on the information available, you can evaluate your agent's responses programmatically by creating a specialized evaluation agent. This evaluation agent can ask questions, grade responses, and interact with your main agent. Here's a simplified approach:

1. Create an evaluation agent with specific instructions for testing and grading. For example:
```python
# Imports needed for this snippet (livekit-agents plus the Deepgram, OpenAI, and Silero plugins).
from livekit.agents import Agent
from livekit.plugins import deepgram, openai, silero


class SimpleEvaluationAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""
                You are evaluating the performance of a user.
                [evaluation instructions here]
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=openai.TTS(),
            vad=silero.VAD.load(),
        )
```
2. Implement a grading function on the evaluation agent so it can score responses programmatically, giving a result of PASS or FAIL for each question:

```python
    # Method of SimpleEvaluationAgent; requires
    # `from livekit.agents import function_tool, RunContext`.
    @function_tool
    async def grade_answer(self, context: RunContext, result: str, question: str):
        """Give a result of PASS or FAIL for each question."""
        self.session.say(f"The grade for the question {question} is {result}")
        return None, "I've graded the answer."
```

3. Configure the evaluation agent to interact with your main agent by updating the `RoomInputOptions`:
```python
await session.start(
    agent=SimpleEvaluationAgent(),
    room=ctx.room,
    room_input_options=RoomInputOptions(
        participant_kinds=[
            rtc.ParticipantKind.PARTICIPANT_KIND_AGENT,
        ]
    ),
)
```
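Putting the pieces together, a complete worker entrypoint for the evaluation agent might look roughly like this (a sketch assuming the livekit-agents 1.x `AgentSession` API; the `entrypoint` name and worker options are illustrative):

```python
from livekit import rtc
from livekit.agents import AgentSession, JobContext, RoomInputOptions, WorkerOptions, cli


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession()
    await session.start(
        agent=SimpleEvaluationAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            # Only treat other agents (i.e. the agent under test) as input participants.
            participant_kinds=[
                rtc.ParticipantKind.PARTICIPANT_KIND_AGENT,
            ]
        ),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```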
This method simplifies the process by automating the testing and evaluation of your agent's responses without the need for complex communication protocols or room management. Let me know if you need further assistance!

Sources: How to test your agent using another agent