# ❓|questions
  • view factuality judge/grader response
    p

    peter

    08/26/2025, 10:59 PM
    My rubricPrompt is set up like this:
    defaultTest:
      options:
        rubricPrompt: |
          You are an expert factuality evaluator. Compare these two answers:
    
          Question: {{input}}
          Reference answer: {{ideal}}
          Submitted answer: {{completion}}
    
          Determine if the submitted answer is factually consistent with the reference answer.
          Choose one option:
          A: Submitted answer is a subset of reference (fully consistent)
          B: Submitted answer is a superset of reference (fully consistent)
          C: Submitted answer contains same details as reference
          D: Submitted answer disagrees with reference
          E: Answers differ but differences don't affect factuality
    
          Respond with JSON: {"category": "LETTER", "reason": "explanation"}
    and the eval works as expected. But the only way I can find to view the grader's JSON response is by turning `--verbose` on. The 'category' selection, for instance, isn't available in the dashboard or the JSON outputs. I can pipe the command output to a file and jq/grep through it, but I feel like I'm probably missing a better way to grab that info?
    e
    • 2
    • 1
  • Hi Folks, How does the prompt work in the simple-mcp example?
    w

    wmluke

    08/27/2025, 2:06 AM
    In the simple-mcp example, how are the `tool` and `args` test vars translated to JSON given the static 'MCP Tool Call Test' prompt? https://github.com/promptfoo/promptfoo/blob/main/examples/simple-mcp/promptfooconfig.yaml
    u
    • 2
    • 10
  • Unable to disable thinking for open-router and other models
    f

    Fiboape

    08/27/2025, 5:44 PM
    Hey guys, we have a GitHub issue open about the thinking results showing up when showThinking: false; because the content comes back empty, the output falls back to the thinking text when in reality it should not. Sorry to be pedantic about the issue, but it is blocking us from moving forward with more test cases and development. Any chance to get some more 👀 on it? Thank you!
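    For context, a minimal sketch of the kind of configuration being described (the model id is a hypothetical stand-in; showThinking is the option the thread refers to):

    ```yaml
    providers:
      - id: openrouter:deepseek/deepseek-r1   # hypothetical reasoning model
        config:
          showThinking: false   # thinking/reasoning content should not appear in the graded output
    ```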
    b
    u
    • 3
    • 3
  • Email inquiry
    a

    ArunS1997

    09/01/2025, 8:47 AM
    Hi team. I tried to reach you at your email address but was unable to. Can you please confirm whether this is the correct email address for queries about your enterprise solutions? Email: enterprise@promptfoo.dev
    i
    • 2
    • 2
  • language setting for llm-grading
    e

    Elias_M2M

    09/01/2025, 2:21 PM
    Hello, is there a way to change the language of LLM grading (llm-rubric)? In my case every output should be in German, not English. My whole conversation with a simulated user is in German, and every llm-rubric is too. I even changed the rubric prompt to a German instruction saying it should output everything in German. Nevertheless, some "reasons" from llm-rubrics are always in English. How can I force everything to German?
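    One approach worth sketching (untested, and assuming the grader honors the rubric prompt's language instruction) is to override rubricPrompt under defaultTest, as in the earlier thread, but written entirely in German and demanding a German reason:

    ```yaml
    defaultTest:
      options:
        rubricPrompt: |
          Du bist ein strenger Prüfer. Bewerte die folgende Ausgabe anhand der Rubrik.
          Rubrik: {{rubric}}
          Ausgabe: {{output}}
          Antworte ausschließlich auf Deutsch als JSON:
          {"pass": true oder false, "score": 0 bis 1, "reason": "Begründung auf Deutsch"}
    ```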
    m
    • 2
    • 2
  • Using red team generated prompts and my own custom prompts
    d

    dulax

    09/04/2025, 3:05 PM
    Hi, I'm trying to understand how to use the generated `{{prompt}}` and then add my own prompts. I did try this:
    prompts:
      - '{{prompt}}'
      - 'My custom prompt'
    But then when I view the output, it's not showing my prompt as another entry in the list - it's appending it to each prompt that was generated. What am I not getting about this?
    w
    u
    • 3
    • 6
  • Anthropic/claude tool response and javascript assert
    t

    Tak

    09/05/2025, 2:44 PM
    I'm trying to write a javascript assert for an Anthropic tool response, but I don't seem to get the output var. I've tried many things; does someone have a working example?
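    A minimal sketch of the kind of assertion being asked about (the output shape depends on the provider config and any transformResponse, so treat the parsing here as an assumption to verify):

    ```yaml
    tests:
      - vars:
          query: "What's the weather in Berlin?"   # hypothetical test var
        assert:
          - type: javascript
            value: |
              // `output` is whatever the provider/transform returned; with Anthropic tool use
              // it may be a JSON string or structured content blocks, so normalize first.
              let data = output;
              if (typeof data === 'string') {
                try { data = JSON.parse(data); } catch (e) { return false; }
              }
              const blocks = Array.isArray(data) ? data : (data.content || []);
              return blocks.some((block) => block.type === 'tool_use');
    ```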
    w
    a
    • 3
    • 5
  • Config error with 0.118.3
    q

    Quinten

    09/05/2025, 6:12 PM
    I just upgraded to the latest promptfoo and I'm getting a new error: ConfigPermissionError: Permission denied: config unknown: You need to upgrade to an enterprise license to access this feature. Simplified view of my config:
    prompts:
      - "{{problem_description}}"
    
    providers: # A single enabled custom Python provider
      - id: file://promptfoo/promptfoo_classifier_provider.py
        label: gpt5-only
        config:
          enabled_providers: ["gpt-5"]
          decision_mode: "first"
          taxonomy_name: "default"
          include_debug_details: true
          cache_enabled: true
    
    tests: promptfoo/OSB_sample_data_mini.csv
    # tests: promptfoo/OSB_sample_data.csv
    
    defaultTest:
      assert:
        - type: javascript
          value: |
            // actual JS test here
    This worked in 0.118.0 but seems to fail in 0.118.3. Downgrading to 0.118.0 gets things working again, so maybe it's just a bug? I didn't see a related issue on GitHub yet, and it's also possible I just have odd syntax that I should fix.
    u
    • 2
    • 8
  • Doubt about how to connect
    j

    Josema Blanco

    09/09/2025, 8:43 AM
    Hi, I want to test Promptfoo at my company. Authenticating against the chatbot requires a few steps: you log in, which generates an SSO token; with that token you generate an authentication token, and with that you generate the session ID for the chatbot. The people who manage the chatbot connect to it using a Python script. Is there any way I can include all those headers in the .yaml config file, or how should I go about it? Thanks in advance. https://cdn.discordapp.com/attachments/1414894018130612324/1414894018688450631/image001_1.png?ex=68c13a3d&is=68bfe8bd&hm=aa0b1720fa4dc4b914b18e155486a236bf1e8d596d205a609fc62594a904a0dd&
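    A minimal sketch of how the final request could look in YAML (the endpoint and variable names are hypothetical); the multi-step login itself would still have to run elsewhere, e.g. in an extension hook or a script that exports the tokens, since only the resulting headers appear here:

    ```yaml
    providers:
      - id: http
        config:
          url: https://chatbot.example.com/api/chat   # hypothetical endpoint
          method: POST
          headers:
            Content-Type: application/json
            Authorization: 'Bearer {{auth_token}}'    # hypothetical var produced by your login flow
          body:
            session_id: '{{session_id}}'
            message: '{{prompt}}'
    ```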
    r
    w
    • 3
    • 2
  • llm-rubric test always connects to api.openai.com
    r

    Rysiek

    09/09/2025, 11:11 AM
    Hi all, I'm trying to set up llm-rubric tests against a custom server using a config like:
    providers:
      - id: openai:gpt-4.1-mini
        label: openai
        config:
          apiHost: "yourAIhost.com"
          apiKey: sk-abc123
    So the prompts themselves are executed properly against "yourAIhost.com" and work fine. But the llm-rubric tests are then executed against api.openai.com. I tried many different setups, not just the one I provided. I tried setting
    defaultTest:
      options:
        provider: openai:gpt-4.1-mini
    I tried using a custom label. I tried making the whole provider a custom HTTP provider, which also worked, and then referencing it in defaultTest or in the llm-rubric test. Nothing works: the test always hits api.openai.com. When I tried to use a custom https provider with a label and then reference it in the defaultTest config, I get this response in the UI:
    Error: Invariant failed: Expected HTTP provider https:custom to have a config containing {body}, but instead got {}
    That looks like a bug, because it identifies the provider but not its config, which works correctly for the prompts themselves, just not in the llm-rubric grader. Has anyone had a similar problem and managed to overcome it?
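    One variation worth sketching (unverified against this setup): give the grading override the full provider object, config included, rather than only the id string, so the grader does not fall back to the default OpenAI host:

    ```yaml
    defaultTest:
      options:
        provider:
          id: openai:gpt-4.1-mini
          config:
            apiHost: "yourAIhost.com"
            apiKey: sk-abc123
    ```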
    w
    • 2
    • 3
  • Add name to test case
    a

    azai91

    09/16/2025, 4:12 AM
    Is there a way to have a name or id column added to the eval output? I would like to dump my results into a database and then query to see which tests fail the most over time.
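    A minimal sketch of one convention, assuming the per-test description field is enough to serve as a stable name when the results are dumped (names here are hypothetical):

    ```yaml
    tests:
      - description: refund-policy-happy-path   # acts as the test's name/id in outputs
        vars:
          question: "Can I return an opened item?"
        assert:
          - type: llm-rubric
            value: Explains the refund policy accurately
    ```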
    w
    • 2
    • 4
  • Compatibility with local Burpsuite proxy
    p

    path_traverser

    09/16/2025, 1:26 PM
    Has anyone tried proxying requests from Promptfoo through a local Burpsuite proxy? I'd like to see the requests and responses resulting from red teaming an application that uses an AI Agent to talk to an LLM. I'm following the instructions here: https://www.promptfoo.dev/docs/faq/#how-do-i-use-a-proxy-with-promptfoo. I've added the following to a .env file in the directory I'm running `promptfoo redteam` from:
    # Promptfoo Proxy with authentication
    export HTTPS_PROXY=127.0.0.1:8080
    # SSL certificates - Absolute path
    export PROMPTFOO_CA_CERT_PATH=/Users/user/burpsuitecacert.der
    When running `promptfoo -v redteam run`, I get the following output in verbose mode:
    [apiHealth.js:17] [CheckRemoteHealth] Checking API health: {"url":"https://www.promptfoo.app/health","env":{"httpsProxy":"127.0.0.1:8080"}}
    [apiHealth.js:36] [CheckRemoteHealth] Making fetch request: {"url":"https://www.promptfoo.app/health","options":{"headers":{"Content-Type":"application/json"}},"timeout":5000,"nodeVersion":"v24.7.0"}
    [fetch.js:114] Using custom CA certificate from /Users/user/burpsuitecacert.der
    [fetch.js:122] Using proxy: https://127.0.0.1:8080/
    [apiHealth.js:95] [CheckRemoteHealth] API health check failed: {"error":"Request timed out after 5000 ms","url":"https://www.promptfoo.app/health"}
    When I do not use the proxy, it works as it should. When I proxy local browser traffic to the API endpoint I want to test, it works as it should. I definitely have access to https://www.promptfoo.app/health, so it seems the problem is with the communication between Promptfoo and the Burpsuite proxy. Any ideas on where to look when troubleshooting this would be very welcome!
    • 1
    • 1
  • Testing stateful agent with simulated user
    a

    azai91

    09/16/2025, 4:06 PM
    I have a stateful agent (data analysis agent that has access to a coding sandbox with a dataframe) that I want to evaluate with the simulated user provider. The issue is that every call_api is stateless and we are expected to feed the entire conversation/state as a string each time. Is there a recommended way to keep the same agent/client around for a few calls?
    w
    • 2
    • 6
  • I'm suddenly getting an error when I run promptfoo redteam with an extension.
    d

    dulax

    09/17/2025, 2:55 PM
    This was working yesterday and all I did was move to a new folder. The error does not occur when I remove the extension. Now when I run promptfoo eval I get the following error:
    [pythonUtils.js:168] Error running Python script: EACCES: permission denied, open '/var/folders/zz/zyxvpxvq6csfxvn_n0000000000000/T/promptfoo-python-input-json-1758117675565-14497df623c8b.json'
    command:
    npx promptfoo@0.117.11 redteam eval -c it-works.yaml
    promptfoo config:
    description: >-
      A simple configuration file that demonstrates the basic functionality of the
      system.
    targets:
      - id: http
        config:
          url: '{{url}}'
          method: POST
          transformResponse: 'json.message.parts[0].text'
          validateStatus: 'status >= 200 && status <= 299'
          headers:
            Content-Type: application/json
          body:
            session_id: '{{session_id}}'
            message: '{{prompt}}'
            user_id: '{{user_id}}'
    redteam:
      purpose: You are awesome.
      plugins:
        - id: intent
          config:
            intent: Hello, world
      entities: &ref_0 []
      strategies: []
    defaultTest:
      metadata:
        purpose: You are awesome.
        entities: *ref_0
    tests:
      - vars:
          prompt: Hello, world
        assert:
          - type: promptfoo:redteam:intent
            metric: Intent
        metadata:
          pluginId: promptfoo:redteam:intent
          severity: high
          modifiers: {}
          goal: >-
            Manipulate the system using benign-looking input as a carrier for a
            hidden attack technique specified in the pluginContext.
    extensions:
      - file://extensions/my_extension.py:extension_hook
    • 1
    • 2
  • Simulated-user doesn't seem to work with extensions
    a

    azai91

    09/18/2025, 4:30 PM
    When I enable extensions while using the simulated-user provider, the prompt does not get set correctly. Has anyone encountered this?
    t
    • 2
    • 2
  • Any way to add additional columns to Web Server
    a

    azai91

    09/19/2025, 3:00 PM
    When I am looking at the web server, is it possible to show columns from the metadata field? Currently we only see description, prompt, and context. https://cdn.discordapp.com/attachments/1418612633094979736/1418612638262235157/image.png?ex=68cec179&is=68cd6ff9&hm=e123b0cbcb567732ca6adbc31c9bed4c534fc2d3a516db42f1d3becb38b56fef&
    t
    • 2
    • 1
  • Qwen on Bedrock
    e

    ellebarto

    09/23/2025, 3:01 PM
    Any timeline on when Qwen will be available to run evals against in Bedrock?
    m
    • 2
    • 1
  • Looking to understand provider config and security details in Promptfoo
    g

    Gia Duc

    09/24/2025, 3:38 AM
    Hi Promptfoo team, I'm exploring using Promptfoo in the company's project, and I'd love to understand a bit more about how provider configuration works under the hood. Would you be able to share a simple sketch or diagram of the flow behind the scenes (e.g., from developer → Promptfoo config → provider request → response handling) for my security review before applying it? I'm especially curious about a few points:
    1. How and where provider configuration (API keys, credentials, endpoints) is stored and loaded.
    2. Whether any of that configuration data is ever sent outside the local environment.
    3. How Promptfoo keeps providers isolated when multiple are configured.
    4. Whether sensitive values are cached, logged, or persisted in any way.
    And if you happen to have any security-related documentation or notes already prepared, I'd be really grateful if you could share those as well; it would help me explain things clearly to our security team when I pass this along for review. Thanks so much for your help!
    t
    i
    • 3
    • 5
  • Sharding Tests?
    a

    azai91

    09/25/2025, 6:26 PM
    Is there a way to shard tests (maybe by passing a shard key)? My agent right now is stateful, so I set up a sandbox with a setup and teardown, but I have to limit concurrency quite a bit because of memory constraints. I can split up the tests and run them on completely different machines, but I was wondering if there is an easier way than explicitly defining which group each test belongs to.
    a
    • 2
    • 1
  • Ability to add additional context/metadata in UI
    a

    azai91

    09/27/2025, 4:01 PM
    In the web view, do we have a way of showing additional data? For example, my agent also produces artifacts or summaries, and I want to see if these were produced correctly. https://cdn.discordapp.com/attachments/1421527100967358596/1421527101345108122/image.png?ex=68d95bc7&is=68d80a47&hm=4b89e81515433e6b1ce1da1cfad20bc484f79798eebbb34aaa66b562c795ea43&
    y
    • 2
    • 1
  • Re-run failed evals
    r

    Rares Vernica

    09/29/2025, 7:09 PM
    Is there a way to re-run only the failed prompts in the eval? I tried `--filter-failing` but it seems broken, see [#5755](https://github.com/promptfoo/promptfoo/issues/5755). Is there a workaround?
    u
    • 2
    • 2
  • Eval name from CLI
    j

    Jérémie

    10/01/2025, 8:04 PM
    Hi all, is there a way to set the eval name when triggering an eval using the `promptfoo eval` command? I see there is a way to update the eval name from the webpage, but I'm wondering how I can let my testers easily access eval results by adopting a naming convention. Thanks for your insights. Kind regards, Jérémie
    i
    • 2
    • 1
  • Grader Endpoint
    f

    Firestorm

    10/02/2025, 8:57 PM
    I'm currently running a red team configuration and I want to use remote generation from Promptfoo, but force grading to use my own Azure OpenAI endpoint. I have added the defaultTest override as can be seen in the image. When I set PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true, I can see my Azure endpoint being hit in the logs for grading. However, when it's set to false (remote generation enabled), grading requests still go to https://api.promptfoo.app/api/v1/task instead of my Azure endpoint, even though the logs show: [RedteamProviderManager] Using grading provider from defaultTest: azureopenai:chat:gpt-4.1-nano. This suggests that the grader override is ignored when remote generation is active. My questions are:
    1) Is it currently possible to use Promptfoo's remote generation while forcing grading to happen only on my Azure OpenAI deployment?
    2) If so, what's the correct configuration to achieve this?
    3) If not, is hybrid support (remote generation + custom grader) on the roadmap?
    https://cdn.discordapp.com/attachments/1423413621488095416/1423413621672902736/image.png?ex=68e038bd&is=68dee73d&hm=39161e9a3001840e6ba6a65f4d3a209333aaa41675d5752700844db3c01983f9&
  • Multi-stage chat setup
    d

    dulax

    10/06/2025, 7:45 PM
    Hi, I have a setup where I need to establish a session and then use the session ID to send messages to a chat session. Right now, I'm establishing the session in an extension and then using the HTTP provider API to send the messages with the prompts. The extension implements a `beforeEach` hook that just hits the create-session endpoint, extracts the session_id from the response, and passes it through the context. I noticed providers is a list, so does that mean there's a way for me to do it all in YAML using the list? I couldn't find an example.
    w
    • 2
    • 1
  • Hi All,
    u

    Umut

    10/07/2025, 11:55 AM
    I would like to test the agents I created in Azure AI Foundry. The agents don't have a deployment name; instead they have an Agent ID and an Agent Name. The endpoint is generated like this: https://promptfoo-testing-resource.services.ai.azure.com/api/projects/promptfoo-testing What would you recommend for configuring my agent "providers" in the YAML file? I would like to stay away from implementing my own HTTP provider if there is an easier way. Thanks a lot for your recommendations. Kind Regards, Umut
    u
    • 2
    • 9
  • Similar metric - vertex:text-embedding-005 support
    g

    Gia Duc

    10/09/2025, 8:31 AM
    Hi, I have a Vertex text-embedding-005 set up for the similar metric and got this message:
    [matchers.js:121] Provider vertex:text-embedding-005 is not a valid embedding provider for 'similarity check', falling back to default
    This is the config in the defaultTest:
    provider:
      embedding:
        id: vertex:text-embedding-005
    AFAIK, text-embedding-005 is available for the similarity task type: https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/task-types#assess_text_similarity Also, the syntax is valid according to the Promptfoo documentation for Vertex: https://www.promptfoo.dev/docs/providers/vertex/#embedding-models The test currently works fine, but is that because it is using the default provider, or something else? How can I use that text embedding model for my similarity assertion? Please help me take a look. Thank you.
  • Post-processing llm-rubric response
    a

    Attila Horvath

    10/13/2025, 10:19 AM
    Hey all, First of all, thanks for Promptfoo — I’m loving it! I ran into a bit of a problem and hope someone can point me in the right direction (if there is one). I’m using an llm-rubric type eval, and my custom prompt returns a JSON object. It includes the required keys, like "pass" and "reason", but since the value of "reason" isn’t a string, the web view shows an “Error loading cell” message whenever I try to view the evaluation results. Is there any way to post-process the llm-rubric output? Getting the LLM judge to return a JSON object where "reason" is a JSON string instead of an object has proven tricky — it fails every now and then, and Promptfoo ends up ignoring it. Thanks in advance!
    u
    • 2
    • 2
  • Executing assertions without "prompts" (for online evaluation)
    o

    oyebahadur

    10/13/2025, 1:42 PM
    Hi folks, I have deployed decent (system) prompts for multiple agents in an agentic chat app to my test environment. My team members have used the application in this test env, and I logged all LLM inputs and outputs (including tool call outputs). I wish to evaluate the performance of these deployed system prompts against the assertions I have written in my promptfoo config. Essentially, instead of using promptfoo as a deployment gate, I want to use it for 'online' evaluation. promptfoo evaluates the "output" of the prompt against the assertions; can I "override" this output without making any LLM calls?
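    One pattern that may fit, sketched under the assumption that the logged outputs can be loaded as test vars: the built-in echo provider returns its prompt verbatim, so assertions run against the captured output instead of a fresh LLM call.

    ```yaml
    providers:
      - echo                          # returns the rendered prompt as the "output"
    prompts:
      - '{{logged_output}}'           # hypothetical var holding the captured LLM response
    tests: logged_conversations.csv   # hypothetical CSV with a logged_output column per row
    ```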
    u
    • 2
    • 2
  • Trace-timeline not shown
    s

    singhe.

    10/15/2025, 2:22 AM
    Hey! I am facing some issues when trying to view the trace timeline in the promptfoo GUI. I get the following error in the GUI: "Traces were created but no spans were received. Make sure your provider is: - Configured to send traces to the OTLP endpoint (http://localhost:4318) - Creating spans within the trace context - Properly exporting spans before the evaluation completes." I am trying to calculate the trace-error-spans of my LLM. But since that didn't work, I tried writing a FastAPI app and viewing the trace timeline in the GUI. Can someone help me with this, please?
  • Running redteam testing inside container
    y

    Yang

    10/15/2025, 7:36 PM
    The promptfoo container is read-only; I can map my promptfooconfig.yaml file from my local machine into the container, but it always generates/updates the redteam.yaml file, so I'm not able to get it working. Any tips? Appreciate it!
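    A rough docker-compose sketch of the usual workaround (image name, paths, and entrypoint behavior are assumptions to check against the docs): mount a writable working directory rather than a single read-only file, so the generated redteam.yaml has somewhere to be written.

    ```yaml
    services:
      promptfoo:
        image: ghcr.io/promptfoo/promptfoo:latest   # assumed image name
        volumes:
          - ./redteam-workdir:/workdir              # writable dir containing promptfooconfig.yaml
        working_dir: /workdir
        command: ["redteam", "run", "-c", "promptfooconfig.yaml"]
    ```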
    o
    • 2
    • 2