# ❓|questions
  • Asserting on complex nested extracted entities
    Donato Azevedo

    06/05/2025, 1:52 PM
    Hi everyone! I'm humbly looking for suggestions and opinions on how to assert on complex, sometimes deeply nested, extracted entities. My task extracts several entities from pdf documents and I am already using promptfoo to assert and build up metrics for performance. But it's getting ever so hard because of the complexity of the extracted entities. For example, this is one of my assertions:
    ```yaml
    - type: python
      value: ('Outorgar Poderes' in output['pessoas_analisadas'][1]['restricoes_valor'] and '12' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'] and 'meses' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'])
      metric: alcada
    ```
    And this is not even robust, because it depends on the order of the `output['pessoas_analisadas']` list being consistent across different evals. I'd appreciate any suggestion. Meanwhile, I was even considering contributing a `transform` property to assert-sets, which would enable this kind of syntax:
    ```yaml
    tests:
      - description: test for persona 1
        vars:
          - file://path/to/pdf
        assert:
          - type: assert-set
            transform: next(o for o in output['pessoas_analisadas'] if o['nome'] == 'NAME OF PERSON')
            assert:
              - type: python
                value: ('Outorgar Poderes' in output['restricoes_valor'] and '12' in output['restricoes_valor']['Outorgar Poderes']['valor_alcada'] ...
    ```
    Opinions?
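    For reference, a name-based lookup can also be expressed today inside a single Python file assertion. A minimal sketch, using the field names from the example above (the person name and file name are placeholders):
    ```python
    # assert_alcada.py — sketch of a python file assertion (type: python, value: file://assert_alcada.py)
    def get_assert(output, context):
        # Look the person up by name instead of relying on list order.
        pessoa = next(
            (p for p in output.get('pessoas_analisadas', []) if p.get('nome') == 'NAME OF PERSON'),
            None,
        )
        if pessoa is None:
            return {'pass': False, 'score': 0.0, 'reason': 'person not found'}
        alcada = pessoa.get('restricoes_valor', {}).get('Outorgar Poderes', {}).get('valor_alcada', '')
        ok = '12' in alcada and 'meses' in alcada
        return {'pass': ok, 'score': 1.0 if ok else 0.0, 'reason': f'valor_alcada={alcada!r}'}
    ```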
  • Metrics defined via named_scores in python file assertion not showing in UI
    Donato Azevedo

    06/05/2025, 5:05 PM
    I am hesitant to open a bug issue in the GitHub repo simply because this issue (https://github.com/promptfoo/promptfoo/issues/1626) mentions there has been a fix last year. However, I am returning the same structure as the OP of that issue and still not getting any metrics shown in the UI:
    ```python
    from typing import Any

    # Returns a GradingResult-style dict with pass/score/reason/named_scores.
    def get_assert(output: dict[str, Any], context) -> bool | float | dict[str, Any]:
        return {
            'pass': True,
            'score': 0.11,
            'reason': 'Looks good to me',
            'named_scores': {
                'answer_similarity': 0.12,
                'answer_correctness': 0.13,
                'answer_relevancy': 0.14,
            },
        }
    ```
    I was expecting to see the three `answer_*` named metrics appearing up top. Screenshot: https://cdn.discordapp.com/attachments/1380231112252723332/1380231112629944341/Screenshot_2025-06-05_at_14.04.57.png?ex=68431fe4&is=6841ce64&hm=1e59574abdb29e00278b719905d74efcea8620d330947203cbe31d0c4fc9301b&
  • JSON errors during prompt generation
    Bryson

    06/06/2025, 7:04 PM
    When I'm attempting to generate my redteam prompts (promptfoo redteam generate), I keep consistently running into "SyntaxError" issues in JSON, seemingly at inconsistent points during generation. I've tried multiple times and keep running into the same error:
    Copy code
    [chat.js:161]     completions API response: {"id":"chatcmpl-BfWK1RSvr3LAckI7hoUHM9dYo0Zce","object":"chat.completion","created":1749235393,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"message":{}
    <anonymous_script>:430
    
    
    SyntaxError: Expected ',' or '}' after property value in JSON at position 1966 (line 430 column 1)
        at JSON.parse (<anonymous>)
        at encodeMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:95:32)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
        at async addMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:122:33)
        at async action (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/index.js:195:34)
        at async applyStrategies (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:241:35)
        at async synthesize (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:678:85)
        at async doGenerateRedteam (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/commands/generate.js:243:88)
    Is this a Promptfoo bug by chance? Or is it possible I'm doing something wrong? Happy to DM over my promptfooconfig.yaml if helpful
  • Does promptfoo execute MCP tool calls?
    sasha

    06/09/2025, 7:42 PM
    I'm wondering whether promptfoo executes an MCP model's tool calls when the model requests them, or is that only possible when function callbacks are implemented?
  • Which provider/model is used for simulated_user?
    vibecoder

    06/10/2025, 6:09 AM
    For the functionality listed here: https://www.promptfoo.dev/docs/providers/simulated-user/, which model/provider generates the response on behalf of the user? How can we provide configuration for the model to use for response generation?
  • is latency assertion via Google Sheets broken? (Image attached)
    anurag

    06/10/2025, 3:30 PM
    No matter what I enter for the latency assertion, I always get the value of `0.75ms`. Screenshots: https://cdn.discordapp.com/attachments/1382019030801584148/1382019031208427571/image.png?ex=6849a105&is=68484f85&hm=180be1f47d77da7022deef500bd41f20d9b2cbcdb7f1c54ce8a3803f5bb334eb& https://cdn.discordapp.com/attachments/1382019030801584148/1382019031539650581/image.png?ex=6849a105&is=68484f85&hm=d91040592a9966b5ff36fab22c0cfa37f1b6662c568578d534d0de07ab0040fb&
  • How to print messages to stdout without needing --verbose?
    Donato Azevedo

    06/10/2025, 6:40 PM
    I have a use case where I perform a long action in the `beforeAll` extension hook. I want to print some progress info to stdout without needing to pass `--verbose` when running the eval. How can I do it?
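    For context, the kind of hook in question looks roughly like the sketch below (assuming promptfoo's Python extension entry point `extension_hook(hook_name, context)`; whether plain stdout shows up without `--verbose` is exactly the open question, so this writes to stderr as one possible workaround, not a confirmed fix):
    ```python
    # extension.py — sketch only; the long-running action is a placeholder.
    import sys

    def extension_hook(hook_name, context):
        if hook_name == 'beforeAll':
            print('beforeAll: starting long setup...', file=sys.stderr, flush=True)
            # ...long-running action here...
            print('beforeAll: setup complete', file=sys.stderr, flush=True)
        return context
    ```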
  • Speech / quotation marks
    grj373

    06/11/2025, 4:14 PM
    Hello, is there any way to double-escape speech marks? ... or to prevent prompts from containing speech marks?
  • Can websocket use headers config for authorization?
    Bronwyn

    06/12/2025, 3:49 AM
    Hi, I want to ask if promptfoo already supports headers for websocket providers, because I'm getting `Websocket error: {}` when trying to connect to the websocket. It connects fine when connecting with wscat from the terminal. My promptfooconfig.yaml:
    ```yaml
    providers:
      - id: 'wss://websocket.example.url'
        config:
          headers:
            Authorization: Bearer
          messageTemplate: '{"action": "sendMessage", "content": "{{prompt}}"}'
          transformResponse: 'data.content'
          timeoutMs: 20000
    ```
    Thank you
  • redteam provider ignored?
    phillprice

    06/13/2025, 11:00 AM
    Hello, I'm trying to set up red teaming but generate the prompts with either an Azure OpenAI model or a Vertex Gemini model. Both of them say they're picking up the provider, but the prompt generation still appears to be proxied. In promptfooconfig.yaml:
    ```yaml
    redteam:
      provider: vertex:gemini-2.0-flash
    ```
    I would expect to see Vertex calls in the logs, not https://api.promptfoo.app/api/. Or am I missing something? Logs attached: https://cdn.discordapp.com/attachments/1383038229430800417/1383038229657423903/message.txt?ex=684d5639&is=684c04b9&hm=e886247025063d28d786293cbbb0f5896f90d0fef84507c57f7a63f9794547fa&
  • Python var loader not working in GitHub Actions CI
    sreven12

    06/16/2025, 3:13 AM
    Hi! I'm running into an issue while using promptfoo in a GitHub Actions CI pipeline triggered by a pull request. Here's the setup. In my PR, I have a test config file like this:
    ```yaml
    prompts:
      - '{{query}}'
    providers:
      - python:rag_provider.py:call_api
    defaultTest:
      assert:
        - type: answer-relevance
          threshold: 0.7
        - type: context-relevance
          threshold: 0.7
      vars:
        context: python:rag_context_loader.py:get_var
    tests:
      - question.csv
    ```
    The intention is for the context variable to be dynamically loaded by executing the Python function get_var() inside rag_context_loader.py. However, during the GitHub Actions run triggered by the PR, promptfoo does not seem to execute the Python function. Instead, it sets context to the raw string "python:rag_context_loader.py:get_var".
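    A dynamic variable loader of the kind described above is usually a small function like this sketch (assuming promptfoo's documented `get_var(var_name, prompt, other_vars)` signature; the retrieval logic is a hypothetical placeholder):
    ```python
    # rag_context_loader.py — minimal sketch of a dynamic var loader; the body is hypothetical.
    def get_var(var_name: str, prompt: str, other_vars: dict) -> dict:
        question = other_vars.get('query', '')
        # ...look up context for the question in your RAG store here...
        context = f'retrieved context for: {question}'
        return {'output': context}
    ```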
  • Filtering tests/scenarios
    straygar

    06/18/2025, 12:07 PM
    Heya. TL;DR: how do you write, debug, and run parts of your tests & scenarios? I have a pretty big `promptfooconfig.yaml` file with different scenarios etc. to catch regressions in CI and evaluate potential prompts & models. I use a custom Python provider and some custom Python assertions, along with built-in ones. My current process of authoring new tests is:
    - add a new test
    - comment out most of the file
    - if something goes wrong, add print statements, rinse and repeat
    Obviously this is not the best. I was never able to get `--filter-pattern` or `--filter-metadata` to work, and the entire `promptfooconfig` file is always run. I just found a way to potentially attach a Python debugger, but it's a bit rough: https://github.com/promptfoo/promptfoo/commit/41cc82b2489efce4b167ebb25cc8cc6bcaf667b9
  • How to integrate promptfoo with our current prompt package structure
    Nate

    06/18/2025, 3:09 PM
    We currently organize our prompts into directories centered around an "invocation" function, which is responsible for (1) filling in prompt template variables, (2) calling the LLM, and (3) doing some minor post-processing on the response. I'm currently thinking the way to integrate would be to create a custom provider that takes the name of one of these prompt directories as the "prompt" and invokes it with the given vars. Would this be an advisable way to structure things? It feels like it may be a bit of a hack. I'm also open to modifying our current structure, since we already have prompts in a template format that is compatible with promptfoo, but it would be nice to include our post-processing logic in our evals without duplicating code. Also, if relevant, we're working in TypeScript.
  • Help regarding getting Base64 Image final prompt and which strategy triggered it
    _wutato_

    06/19/2025, 7:24 AM
    Hi! I am performing testing on an LLM in dev env with the following:
    Copy code
    plugins:
     - id: debug-access  # Tests for exposed debugging interfaces and commands
      strategies:
        - id: basic  # Original plugin tests without any additional strategies or optimizations
        - id: jailbreak:composite  # Combines multiple jailbreak techniques for enhanced effectiveness
        - id: jailbreak:likert  # Uses Likert scale-based prompts to bypass content filters
        - id: jailbreak  # Single-shot optimization of safety bypass techniques
    ~~Saw a "Fail" case that made use of a Base64-encoded image, as shown in the uploaded picture. I would like to check if it is possible to get the final prompt sent to the LLM, or how this prompt is created. I'm also unsure whether the prompt was generated under DebugAccess or by a strategy. Any help or advice on where to check is greatly appreciated. Thank you!~~ *Edit:* Realised the UI interprets the prompt as a base64 image because of the following line in ResultsTable.tsx:
    Copy code
    if (
           typeof value === 'string' &&
            (value.match(/^data:(image\/[a-z]+|application\/octet-stream);base64,/) ||
             value.match(/^\/[0-9A-Za-z+/]{4}.*/))
    )
    Still unsure about the actual prompt being sent out, as manually using the prompt listed in the image gets a PASS response. https://cdn.discordapp.com/attachments/1385158186046324829/1385158186494984294/base64_fail.PNG?ex=68565e16&is=68550c96&hm=970229f01b0872c18132009b9e4e8014b90515de8f2aaa8fb687b76892458fb4&
  • How to test the output returned by an MCP Tool?
    Dan

    06/20/2025, 8:47 AM
    Hello! I have a local MCP server developed using FastMCP. What I want is to check whether a user prompt causes the execution of a tool, and then run some assertions on the output of the MCP tool call. How can I do that locally?
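    The exact shape of the provider output determines what can be asserted, but the kind of check described could look roughly like this Python file assertion (the `tool_calls` field and `my_tool` name are hypothetical placeholders, not a confirmed output format):
    ```python
    # assert_tool_called.py — rough sketch only; adjust to the real output shape.
    import json

    def get_assert(output, context):
        try:
            data = json.loads(output) if isinstance(output, str) else output
        except ValueError:
            data = {}
        tool_calls = data.get('tool_calls', []) if isinstance(data, dict) else []  # hypothetical field
        called = any(call.get('name') == 'my_tool' for call in tool_calls)
        return {
            'pass': called,
            'score': 1.0 if called else 0.0,
            'reason': f'tool calls seen: {tool_calls!r}',
        }
    ```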
  • Anyone know what TypeError: Cannot read properties of undefined (reading 'startsWith') means?
    Alba

    06/20/2025, 10:43 AM
    Hi! I was just using promptfoo for the first time and this message popped up in all my evaluations. I've checked my Python code and there is no 'startsWith' in it, so I don't know what it means. The only thing I get in my evals is ERROR, with no further explanation. Any idea? Thanks!
  • Export red teaming report PDF
    mjt2007

    06/20/2025, 6:21 PM
    Is it possible to export the red teaming results PDF from the CLI? Currently `promptfoo redteam report` only starts the web server and browser UI. I don't see any other options in the command-line documentation. Thanks
  • Eval variable columns not displayed in self-hosted
    GuillermoB

    06/20/2025, 10:46 PM
    We moved from local evals to remote evals. We run the evals locally and upload them with --share. This works fine; however, the variable columns are not displayed in the self-hosted remote promptfoo web UI. What may be going on? Screenshots: https://cdn.discordapp.com/attachments/1385752733587603476/1385752734279536832/image.png?ex=6857364d&is=6855e4cd&hm=24a7e9c2b657b9c8cf40a9572f70d593ab8493134b8ab3179a4d1f90f407646a& https://cdn.discordapp.com/attachments/1385752733587603476/1385752735302815844/image.png?ex=6857364e&is=6855e4ce&hm=ffcaaace2cfa2f4ce41282f61b46c78516c507100f3339727ee9947044dac0ae&
  • How to avoid Template render error without affecting prompt (Python code generates tests & prompt)?
    marco.sbodio

    06/22/2025, 12:35 PM
    Hello, I have documented my problem in this github issue: https://github.com/promptfoo/promptfoo/issues/4538. Any help is highly appreciated! 🙂 Thank you!
  • TokenUsage in openAI returns 0
    ahmedelbaqary.

    06/24/2025, 1:55 PM
    When I send an evaluation request, the test runs correctly, but when I look at the tokenUsage for prompt and completion it shows 0. When I use Anthropic models, these values have numbers. Does anyone know why this happens? Screenshot: https://cdn.discordapp.com/attachments/1387068500501069965/1387068500610125944/image.png?ex=685bffb5&is=685aae35&hm=f748fdf7f49d957a04af47dc566ee25f884c591f3f681c984afcc724e4dcc426&
  • OpenAI Responses previous_response_id
    Greg

    06/26/2025, 11:02 PM
    Hi all, I'm trying to leverage two parameters of the OpenAI Responses API: prompt and previous_response_id. I have the config below, which works, but then the output of all my tests is just the response ID instead of the OpenAI text response. Is there an option to save the response ID while still having the test return the text? Here are the relevant parts of our setup:
    ```yaml
    prompts:
      - "dummy"

    providers:
      - id: https
        config:
          url: https://api.openai.com/v1/responses
          method: POST
          headers:
            Authorization: "Bearer {{ env.OPENAI_API_KEY }}"
          body: |
            {
              "model": "gpt-4.1-nano-2025-04-14",
              "input": "{{ tenant_msg }}",
              "prompt": { "id": "{{ promptId }}" },
              {% if previous_response_id %}
              "previous_response_id": "{{ previous_response_id }}",
              {% endif %}
              "store": true
            }
          transformResponse: |
            ({ output: { text: json.output[0].content[0].text, response_id: json.id } })

    defaultTest:
      vars:
        previous_response_id: ""

    tests:
      - description: "1 • tenant contacts"
        vars: { tenant_msg: "Good morning.", promptId: "pmpt_685a89e16a208196bb088d0b09cbee3f09404e8cfa6d680c" }
        options:
          transform: output.response_id
          storeOutputAs: previous_response_id
      - description: "2 • tenant follow-up"
        vars: { tenant_msg: "Did you get my previous message?", promptId: "pmpt_685a89e16a208196bb088d0b09cbee3f09404e8cfa6d680c" }
        options:
          transform: output.response_id
          storeOutputAs: previous_response_id
    ```
  • (Solved by clearing cache) Browser on Mac shows an empty page in recent release for `promptfoo view -y`
    no1care

    06/27/2025, 6:56 AM
    MacBook Air, promptfoo 0.115.4; Chrome and Firefox both have the issue. The error in the developer tools is `Uncaught SyntaxError: expected expression, got '<'`. Did I mess up some configuration?
  • Resume Red Team Eval?
    Bryson

    07/01/2025, 10:36 PM
    Is there a way to resume a red team eval from the progress made once it has begun? If it's interrupted or runs into errors on the way and the eval is canceled, is there a way to pick up where it left off so it doesn't repeat all the same calls? Running into this issue with large evals that take multiple hours, and it's a problem when you have to kill and restart the eval from scratch. I can't find anything in the docs about this.
  • Image support for gemini prompt?
    Mahta

    07/08/2025, 9:04 AM
    Hi everyone, I'm trying to test whether Gemini can reliably detect certain objects in images I provide. However, I noticed that it doesn't seem to process the images at all; it just returns what looks like a random list of objects, even when they aren't present. I also couldn't find much specific documentation about Gemini's vision/image input format or capabilities (unlike OpenAI, which has more detailed guides). Has anyone here successfully used Gemini to analyze images and detect specific objects, or even describe the image? Any tips or examples would be appreciated!
  • How do I disable the "thinking" mode in the Qwen model using `promptfoo`?
    raxrb

    07/10/2025, 6:51 PM
    I have tried the following config:
    ```yaml
    - id: groq:qwen/qwen3-32b
      label: "qwen3-32b"
      config:
        thinking:
          type: 'none'
          budget: 0 # For complex proofs
        temperature: 0 # It's good practice to set temperature for deterministic evals
        reasoning:
          effort: none
          type: 'disabled'
          budget: 0 # For complex proofs
        reasoning_format: hidden # This line removes the 'thinking' output
        showThinking: false
        showReasoning: false
    ```
    As you can see in the screenshot, the thinking output is still coming through. https://cdn.discordapp.com/attachments/1392941393386934282/1392941393751707718/image.png?ex=68715d43&is=68700bc3&hm=a34b6bb70b7415f6f696bf1ef89745c33cf633363867e462ba8b6750a5f0fcc7&
  • XML output file type on CI/CD
    CYH

    07/10/2025, 11:13 PM
    On https://www.promptfoo.dev/docs/integrations/azure-pipelines/, the example code says it can publish test results from `promptfoo-results.xml`. However, I got `No test result files matching '[ 'promptfoo-results.xml' ]' were found`. Is this expected? How can I publish the promptfoo test results? On the [eval option page](https://www.promptfoo.dev/docs/usage/command-line/), the output flag only supports csv, txt, json, jsonl, yaml, yml, and html. It doesn't have xml.
  • Multiple LLM conversation
    CYH

    07/14/2025, 6:33 PM
    I have a pipeline where a main LLM has a conversation with the user, and a few other auditor/monitor LLMs guide the main LLM on where the conversation should go. Is there a way to simulate this type of multi-LLM conversation through promptfoo?
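    One possible way to evaluate a pipeline like this end to end is to wrap it in a custom Python provider and let promptfoo assert on the final conversation output. A minimal sketch (the call_api shape follows promptfoo's Python provider interface; run_main_llm and run_auditor are hypothetical stand-ins for the actual pipeline):
    ```python
    # multi_llm_provider.py — sketch only; the orchestration functions are placeholders.
    def run_auditor(messages):
        # hypothetical: ask the monitor LLM where the conversation should go next
        return "steer the conversation toward resolving the user's request"

    def run_main_llm(messages):
        # hypothetical: call the main conversational LLM with the message history
        return "main LLM reply"

    def call_api(prompt: str, options: dict, context: dict) -> dict:
        messages = [{"role": "user", "content": prompt}]
        for _ in range(3):  # a few guided turns
            guidance = run_auditor(messages)
            messages.append({"role": "system", "content": guidance})
            reply = run_main_llm(messages)
            messages.append({"role": "assistant", "content": reply})
        return {"output": messages[-1]["content"]}
    ```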
  • How to View Grader Response for Model Graded Closed QA tests
    Sudharshan

    07/14/2025, 7:58 PM
    I have some tests that run model-graded QA assertions; however, when the tests pass I can only see that the submission has passed the assertion in the result. Is it possible to view the full response of the grader to see how it evaluated the output?
  • Cannot read properties of undefined (reading 'includes')
    yahmasta

    07/16/2025, 10:05 PM
    Getting this error when running the Minimal Test or RAG preset. Haven't tested it on other presets.
  • Trying MCP tools use case... Getting an error while fetching tools.
    Saraswathi Rekhala

    07/18/2025, 4:09 AM
    Hey, I'm trying a POC for fetching tools from an MCP server by following the documentation below:
    https://www.promptfoo.dev/docs/providers/openai/
    https://github.com/promptfoo/promptfoo/blob/main/examples/openai-mcp/promptfooconfig.approval.yaml
    I'm getting the error below:
    ```
    API error: 424 Failed Dependency {"error":{"message":"Error retrieving tool list from MCP server: 'wm-app-mcp-server'. Http status code: 424 (Failed Dependency)","type":"external_connector_error","param":"tools","code":"http_error"}}
    ```
    Below is my MCP server logic, which exposes an add_numbers tool:
    ```python
    import os

    from mcp.server import FastMCP

    mcp = FastMCP(name="wm-app-mcp-server")

    @mcp.tool()
    def add_numbers(a: int, b: int) -> int:
        """Returns the sum of two numbers"""
        return a + b

    def main():
        mcp.settings.host = "0.0.0.0"
        mcp.settings.port = 8080
        mcp.settings.debug = True
        mcp.run(transport="sse")
        mcp.expose_tools_endpoint = True

    if __name__ == "__main__":
        main()
    ```
    And my promptfoo YAML file has the following provider info:
    ```yaml
    providers:
      # Provider with no approval required
      - id: openai:responses:gpt-4.1-2025-04-14
        label: 'WM MCP Server'
        config:
          tools:
            - type: mcp
              server_label: wm-app-mcp-server
              server_url: http://localhost:8080/sse
              require_approval: never
          max_output_tokens: 1000
          temperature: 0.2
          instructions: 'You are an assistant. Use the available MCP tools to search for information.'
    ```
    Can someone help me resolve this error? The server is up and running before I run the test in promptfoo, but I still get the 424 status code while fetching tools.