# ask-ai
Hi there! Which is the best way of evaluating agent responses? I am working on a project in which I am building an agent using LiveKit, and I'm looking for the best way to evaluate its answers. I mean programmatically asking a bunch of questions from my evaluation dataset, obtaining the responses, and evaluating those responses (all of this outside LiveKit, of course). As of today, I am working this out by running the LiveKit server plus an agent worker connected to it, and using the `livekit.rtc` module (along with `livekit.api`) to connect to the server, create a Room, send messages to that room, and listen for answers. The question is: is there any way to do this in a simpler way? It feels like it should exist, since the approach I'm currently following involves unnecessarily dealing with complicated communication protocols, room management, and so on, when the only thing I want to do is to test whether my agent is working well or not, whether it's calling the tools it should call depending on the inputs, etc.
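Roughly, my current setup looks something like this (a heavily simplified sketch, not my exact code; the URL, room name, and identity are just placeholders):

```python
import asyncio

from livekit import api, rtc


async def run_eval_question(question: str) -> None:
    # Mint an access token for a test participant
    # (LIVEKIT_API_KEY / LIVEKIT_API_SECRET are read from the environment).
    token = (
        api.AccessToken()
        .with_identity("eval-client")
        .with_grants(api.VideoGrants(room_join=True, room="eval-room"))
        .to_jwt()
    )

    # Join the room that the agent worker gets dispatched to.
    room = rtc.Room()
    await room.connect("ws://localhost:7880", token)

    # ... publish the question to the room, wait for the agent's answer,
    #     and then score it against my evaluation dataset ...

    await room.disconnect()


asyncio.run(run_eval_question("example question from my dataset"))
```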
Thanks for reaching out, Albert. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Thank you for reaching out! Based on the information available, you can evaluate your agent's responses programmatically by creating a specialized evaluation agent. This evaluation agent can ask questions, grade responses, and interact with your main agent. Here's a simplified approach:

1. Create an evaluation agent with specific instructions for testing and grading. For example:
```python
# Imports needed for this snippet (livekit-agents plus the Deepgram, OpenAI, and Silero plugins).
from livekit.agents import Agent
from livekit.plugins import deepgram, openai, silero


class SimpleEvaluationAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""
                You are evaluating the performance of a user.
                [evaluation instructions here]
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=openai.TTS(),
            vad=silero.VAD.load(),
        )
```
2. Implement a grading function on the evaluation agent so it can score responses programmatically, giving a result of PASS or FAIL for each question:

```python
    # Method of SimpleEvaluationAgent; requires
    # `from livekit.agents import function_tool, RunContext`.
    @function_tool
    async def grade_answer(self, context: RunContext, result: str, question: str):
        """Give a result of PASS or FAIL for each question."""
        self.session.say(f"The grade for the question {question} is {result}")
        return None, "I've graded the answer."
```

3. Configure the evaluation agent to interact with your main agent by updating the `RoomInputOptions`:
```python
await session.start(
    agent=SimpleEvaluationAgent(),
    room=ctx.room,
    room_input_options=RoomInputOptions(
        participant_kinds=[
            rtc.ParticipantKind.PARTICIPANT_KIND_AGENT,
        ]
    ),
)
```
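Putting the pieces together, a complete worker entrypoint for the evaluation agent might look roughly like this (a sketch assuming the livekit-agents 1.x `AgentSession` API; the `entrypoint` name and worker options are illustrative):

```python
from livekit import rtc
from livekit.agents import AgentSession, JobContext, RoomInputOptions, WorkerOptions, cli


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession()
    await session.start(
        agent=SimpleEvaluationAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            # Only treat other agents (i.e. the agent under test) as input participants.
            participant_kinds=[
                rtc.ParticipantKind.PARTICIPANT_KIND_AGENT,
            ]
        ),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```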
This method simplifies the process by automating the testing and evaluation of your agent's responses without the need for complex communication protocols or room management. Let me know if you need further assistance!

Sources: How to test your agent using another agent