For the testing and evaluation, is there a way to ...
# ask-ai
a
For the testing and evaluation, is there a way to have an OR assertion ? https://docs.livekit.io/agents/build/testing/
t
Thanks for reaching out, Cesar. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
LiveKit does not have a direct OR assertion operator in its testing and evaluation framework. However, you can achieve similar functionality using the following approaches: • Use
contains
Methods:
Combine methods like
contains_message()
and
contains_function_call()
to check for multiple possible events. • Indexed Access: Use event indexing to assert different conditions at specific positions. • Leverage `judge()`: Use the
judge()
method with an LLM to evaluate flexible conditions, including OR logic. • Chain Assertions: Write multiple assertions to cover different scenarios. For example, you can test for either a message greeting or a function call by chaining assertions or using the
judge()
method to describe the intent. If you need help implementing any of these approaches, let me know!
a
i wanna chain assertions, could you give me an exmaple to test for either a message or a function call?
t
Here’s a handy example showing how you can chain assertions to test for either a message or a function call:
Copy code
# Test for either a message or a function call
@pytest.mark.asyncio
async def test_agent_response():
    async with (
        openai.LLM(model="gpt-4o-mini") as llm,
        AgentSession(llm=llm) as session,
    ):
        await session.start(Assistant())
        result = await session.run(user_input="Hello")

        try:
            # First scenario: Check for a message
            result.expect.next_event().is_message(role="assistant")
        except AssertionError:
            try:
                # Second scenario: Check for a function call
                result.expect.next_event().is_function_call(name="greet")
            except AssertionError:
                # If neither scenario matches, fail the test
                raise AssertionError("Neither a message nor a function call was matched.")
This example first tries to see if the next event is a message from the assistant. If that doesn’t pan out, it checks for a function call named "greet." If neither fits, the test will let you know by failing. Feel free to ask if you want me to walk you through it more or if you'd like some extra examples!
a
doesnt that advances the cursor on the second scenario?
t
Hey there! So, when you use
next_event()
in chained assertions, it actually moves the cursor forward to the next event. That means if your first scenario doesn’t pass and you move on to the second one, the cursor might have already skipped the event you wanted to check. To get around this, you can use indexed access instead of
next_event()
. This way, you can directly point to specific events without moving the cursor. Here’s a little example:
Copy code
# Test for either a message or a function call without advancing the cursor
@pytest.mark.asyncio
async def test_agent_response():
    async with (
        openai.LLM(model="gpt-4o-mini") as llm,
        AgentSession(llm=llm) as session,
    ):
        await session.start(Assistant())
        result = await session.run(user_input="Hello")

        # Check the first event for either a message or a function call
        first_event = result.expect[0]
        try:
            first_event.is_message(role="assistant")
        except AssertionError:
            first_event.is_function_call(name="greet")
This way, you’re making sure you’re checking the same event for both conditions, without accidentally moving past it. If you want me to walk you through it more or need some extra examples, just let me know—I’m happy to help!
a
is there a way i can have tests run 10 times each and get statisics of how many times out of 10 each test failed?
t
Absolutely! You can run LiveKit tests as many times as you like and track the failure rates using the
tool_bench.py
script. It has a handy
-n
option where you can decide how many times each test runs (it defaults to 10). The script keeps track of both successes and failures, so you can easily see the failure rates across all those runs. If you want, I’m here to help you get it set up or explain anything else about using the script! Sources: tools/llama-bench/README.md
a
can we pass seeds to LLM ?
t
Absolutely! You can pass seeds to LLMs in LiveKit to get deterministic behavior. This is actually shown in the
test_completion.py
tests, where using the same seed helps ensure you get consistent results every time with the same inputs and settings. Plus, the Google Gemini LLM integration also supports a seed parameter to keep things predictable. If you'd like, I’m here to help you set this up—just let me know! Sources: tools/server/tests/unit/test_completion.py | livekit-plugins/livekit-plugins-anthropic/README.md