# questions
  • What data is sent to PromptFoo's API during remote generation for Red Teaming?

    JohnRoy

    05/23/2025, 12:36 PM
    I understand that PromptFoo uses its own fine-tuned models for attack generation, which are accessible only through PromptFoo's API. Some plugins (like harmful) require these models even when PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true is set. In those cases, I assume the whole config file is sent to PromptFoo's server, exposing, for example, the "purpose" section. My question is specifically about the jailbreak strategy. The documentation describes jailbreak as: 1) starting with a base prompt that attempts to elicit undesired behavior; 2) using an LLM-as-a-Judge to analyze the AI's response, track the conversation history, and generate increasingly refined prompts based on previous attempts; 3) repeating this process for a configurable number of iterations; 4) selecting the most effective prompt variation discovered. Is the generation of those refined prompts in step 2 also taking place on PromptFoo's servers? I.e., are all the answers of my LLM being "exposed"? Also, is it possible to completely replace PromptFoo's unaligned model with a local unaligned one? Thank you
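    A hedged sketch of what pointing generation at a locally hosted model could look like, assuming the redteam.provider setting (shown in a later question in this list) is honored for the strategy in question and that an Ollama-served model is reachable; some plugins may still call the remote service regardless:
    ```
    # promptfooconfig.yaml (sketch, not a confirmed recipe)
    redteam:
      purpose: "Internal support assistant"   # whatever you already have here
      provider: ollama:chat:llama3.1          # hypothetical local model id; any configured provider id should work
      plugins:
        - debug-access
      strategies:
        - jailbreak
    ```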
  • Is it possible to red team with the node package

    roy

    05/25/2025, 12:33 AM
    Is it possible to red team with the node package?
  • Share feature doesn't seem to be working

    pyq

    05/29/2025, 7:34 AM
    I used the command promptfoo share to upload the evaluation results to www.promptfoo.app. However, when I create a shareable link and try to share it with others, the link doesn't work—regardless of whether it's set to public or not. this is a generated sample link: https://www.promptfoo.app/register?invite_code=8d8650f5-8717-428c-a0d3-6a690ae39ed5&eval_id=eval-J77-2025-05-29T07:23:34 https://cdn.discordapp.com/attachments/1377550615995093122/1377550616175579146/Screenshot_2025-05-29_at_3.33.20_PM.png?ex=68395f7c&is=68380dfc&hm=7873b1a815939c905a611c0f20163dd51458407d4c8251eec974677c8f8e61b3&
  • How do I pass videos as multi-modal input to Google's Gemini Flash models?

    raxrb

    05/29/2025, 5:56 PM
    Can anybody share an example of passing multi-modal data like video, images, or audio to the Gemini Flash model?
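    One possible shape, as a rough sketch only: promptfoo prompts can be JSON files, and the Gemini REST API takes media as base64 inline_data parts (or a file URI for larger videos). Assuming the google/vertex provider forwards a contents-style JSON prompt and that video_base64 is supplied as a test variable, it might look like:
    ```
    [
      {
        "role": "user",
        "parts": [
          { "text": "{{question}}" },
          {
            "inline_data": {
              "mime_type": "video/mp4",
              "data": "{{video_base64}}"
            }
          }
        ]
      }
    ]
    ```
    used with a provider such as vertex:gemini-2.0-flash. Treat the exact field names and the pass-through behavior as assumptions to verify against the provider docs.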
  • enterprise

    vibecoder

    05/31/2025, 5:54 AM
    How is enterprise self-hosting different from a team taking the GitHub version and deploying it on a Kubernetes cluster? Also, what is the pricing? It isn't listed on the site.
  • Disabling telemetry

    vibecoder

    05/31/2025, 5:56 AM
    Currently, even after setting the environment variable, a single message per eval still gets sent to the promptfoo server stating "telemetry is disabled". Can we update the code to not do this when the variable disables telemetry?
  • client generated sessionId

    grj373

    06/02/2025, 2:21 PM
    Hi, I have set up a red team scan using promptfoo and want to use a client-generated session ID. However, when selecting this option and placing {{sessionId}} in the body of the request, promptfoo sends an empty string for the sessionId. Does anyone know of a known issue with this functionality? Thanks
  • Allowing file upload for Provider on web eval

    vibecoder

    06/05/2025, 2:05 AM
    Promptfoo allows uploading test and YAML config files from the web eval page, and these get stored in persistent storage as well. Why not allow the same capability for the provider file? If more folks are looking for a similar capability, I can work on submitting a pull request. But before I do so, I'm interested in knowing why it isn't allowed in the first place.
  • Asserting on complex nested extracted entities

    Donato Azevedo

    06/05/2025, 1:52 PM
    Hi everyone! I'm humbly looking for suggestions and opinions on how to assert on complex, sometimes deeply nested, extracted entities. My task extracts several entities from pdf documents and I am already using promptfoo to assert and build up metrics for performance. But it's getting ever so hard because of the complexity of the extracted entities. For example, this is one of my assertions:
    Copy code
    - type: python
      value: ('Outorgar Poderes' in output['pessoas_analisadas'][1]['restricoes_valor'] and '12' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'] and 'meses' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'])
      metric: alcada
    And this is not even robust, because it depends on the order of the output['pessoas_analisadas'] list being consistent across different evals. I'd appreciate any suggestion. Meanwhile, I was even considering contributing a transform property to assert-sets, which would enable this kind of syntax:
    Copy code
    tests:
      - description: test for persona 1
        vars:
          - file://path/to/pdf
        assert:
          - type: assert-set
            transform: next(o for o in output['pessoas_analisadas'] if o['nome'] == 'NAME OF PERSON')
            assert:
              - type: python
                value: ('Outorgar Poderes' in output['restricoes_valor'] and '12' in output['restricoes_valor']['Outorgar Poderes']['valor_alcada'] ...
    Opinions?
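    One route, sketched under the assumption that an external Python assertion (value: file://...) receives the parsed output dict, is to do the lookup-by-name inside a get_assert function rather than in a YAML one-liner:
    ```
    # assert_alcada.py -- hypothetical helper; select the person by name so list order doesn't matter
    from typing import Any


    def get_assert(output: dict, context: Any):
        pessoa = next(
            (p for p in output.get("pessoas_analisadas", []) if p.get("nome") == "NAME OF PERSON"),
            None,
        )
        if pessoa is None:
            return {"pass": False, "score": 0.0, "reason": "pessoa not found in output"}

        alcada = pessoa.get("restricoes_valor", {}).get("Outorgar Poderes", {}).get("valor_alcada", "")
        ok = "12" in alcada and "meses" in alcada
        return {
            "pass": ok,
            "score": 1.0 if ok else 0.0,
            "reason": f"valor_alcada={alcada!r}",
            "named_scores": {"alcada": 1.0 if ok else 0.0},
        }
    ```
    referenced from the config as `- type: python` with `value: file://assert_alcada.py` and `metric: alcada` (the NAME OF PERSON placeholder is from the original post).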
  • Metrics defined via named_scores in python file assertion not showing in UI

    Donato Azevedo

    06/05/2025, 5:05 PM
    I am hesitant to open a bug issue in the GitHub repo simply because this issue (https://github.com/promptfoo/promptfoo/issues/1626) mentions there has been a fix last year. However, I am using the same return value as the OP of the issue and not getting any metrics shown in the UI:
    Copy code
    python
    from typing import Any  # GradingResult here is just the dict shape promptfoo expects

    def get_assert(output: dict[str, Any], context) -> bool | float | dict:
        return {
          'pass': True,
          'score': 0.11,
          'reason': 'Looks good to me',
          'named_scores': {
             'answer_similarity': 0.12,
             'answer_correctness': 0.13,
             'answer_relevancy': 0.14,
          }
        }
    I was expecting to see the three answer_* named metrics appearing up top. https://cdn.discordapp.com/attachments/1380231112252723332/1380231112629944341/Screenshot_2025-06-05_at_14.04.57.png?ex=68431fe4&is=6841ce64&hm=1e59574abdb29e00278b719905d74efcea8620d330947203cbe31d0c4fc9301b&
  • JSON errors during prompt generation

    Bryson

    06/06/2025, 7:04 PM
    When I attempt to generate my redteam prompts (promptfoo redteam generate), I consistently run into JSON "SyntaxError" issues, seemingly at inconsistent points during generation. I've tried multiple times and keep hitting the same error:
    Copy code
    [chat.js:161]     completions API response: {"id":"chatcmpl-BfWK1RSvr3LAckI7hoUHM9dYo0Zce","object":"chat.completion","created":1749235393,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"message":{}
    <anonymous_script>:430
    
    
    SyntaxError: Expected ',' or '}' after property value in JSON at position 1966 (line 430 column 1)
        at JSON.parse (<anonymous>)
        at encodeMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:95:32)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
        at async addMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:122:33)
        at async action (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/index.js:195:34)
        at async applyStrategies (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:241:35)
        at async synthesize (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:678:85)
        at async doGenerateRedteam (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/commands/generate.js:243:88)
    Is this a Promptfoo bug by chance? Or is it possible I'm doing something wrong? Happy to DM over my promptfooconfig.yaml if helpful
  • Does promptfoo execute MCP tool calls?

    sasha

    06/09/2025, 7:42 PM
    I'm wondering whether promptfoo executes an MCP model's tool calls when the model requests them, or whether that is only possible when function callbacks are implemented?
  • Which provider/model is used for simulated_user?

    vibecoder

    06/10/2025, 6:09 AM
    For the functionality listed here: https://www.promptfoo.dev/docs/providers/simulated-user/, which model/provider generates the response on behalf of the user? How can we configure which model is used for that response generation?
  • is latency assertion via Google Sheets broken? (Image attached)

    anurag

    06/10/2025, 3:30 PM
    No matter what I enter for the latency threshold, I always get a value of 0.75ms.
    https://cdn.discordapp.com/attachments/1382019030801584148/1382019031208427571/image.png?ex=6849a105&is=68484f85&hm=180be1f47d77da7022deef500bd41f20d9b2cbcdb7f1c54ce8a3803f5bb334eb& https://cdn.discordapp.com/attachments/1382019030801584148/1382019031539650581/image.png?ex=6849a105&is=68484f85&hm=d91040592a9966b5ff36fab22c0cfa37f1b6662c568578d534d0de07ab0040fb&
  • How to print messages to stdout without needing --verbose?

    Donato Azevedo

    06/10/2025, 6:40 PM
    I have a use case where I perform a long action in the beforeAll extension hook. I want to print some info about the progress to stdout, without needing to pass --verbose when running to view it. How can I do it?
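    A minimal sketch, assuming the Python extension interface where a single function receives the hook name and a context dict (registered via `extensions: - file://hooks.py:extension_hook` in the config); writing to stderr with an explicit flush is the usual way to get progress text past output buffering, though whether promptfoo's progress UI interleaves it cleanly is not guaranteed:
    ```
    # hooks.py -- sketch only; the extension_hook(hook_name, context) signature is an assumption
    import sys


    def extension_hook(hook_name, context):
        if hook_name == "beforeAll":
            print("[beforeAll] starting long setup...", file=sys.stderr, flush=True)
            # ... long action here ...
            print("[beforeAll] setup finished", file=sys.stderr, flush=True)
    ```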
  • Speech / quotation marks

    grj373

    06/11/2025, 4:14 PM
    Hello, is there any way to double-escape quotation marks? ... or to prevent prompts from containing quotation marks?
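    Since prompts are Nunjucks templates, one option (a sketch, assuming the variable is a plain string) is to pass it through Nunjucks' built-in dump filter, which JSON-encodes the value and therefore escapes any embedded quotes:
    ```
    prompts:
      - |
        Summarize the following user message: {{ user_message | dump }}
    ```
    Note that dump also wraps the value in double quotes, so drop any surrounding quotes added by hand.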
  • Can websocket use headers config for authorization?

    Bronwyn

    06/12/2025, 3:49 AM
    Hi, I want to ask if promptfoo already supports headers for websocket providers, because I'm getting Websocket error: {} when trying to connect to the websocket. It connects fine when connecting with wscat via the terminal. My promptfooconfig.yaml:
    providers:
      - id: 'wss://websocket.example.url'
        config:
          headers:
            Authorization: Bearer
          messageTemplate: '{"action": "sendMessage", "content": "{{prompt}}"}'
          transformResponse: 'data.content'
          timeoutMs: 20000
    Thank you
  • redteam provider ignored?

    phillprice

    06/13/2025, 11:00 AM
    Hello, I'm trying to set up red teaming but generate the prompts with either an Azure OpenAI model or a Vertex Gemini model. Both of them say they're picking up the provider, but prompt generation still appears to be proxied. In promptfooconfig.yaml:
    Copy code
    redteam:
      provider: vertex:gemini-2.0-flash
    I would expect to see Vertex prompts, not https://api.promptfoo.app/api/, in the logs. Or am I missing something? https://cdn.discordapp.com/attachments/1383038229430800417/1383038229657423903/message.txt?ex=684d5639&is=684c04b9&hm=e886247025063d28d786293cbbb0f5896f90d0fef84507c57f7a63f9794547fa&
  • Python var loader not working in GitHub Actions CI

    sreven12

    06/16/2025, 3:13 AM
    Hi! I'm running into an issue while using promptfoo in a GitHub Actions CI pipeline triggered by a pull request. Here's the setup: in my PR, I have a test config file like this:
    prompts:
      - '{{query}}'
    providers:
      - python:rag_provider.py:call_api
    defaultTest:
      assert:
        - type: answer-relevance
          threshold: 0.7
        - type: context-relevance
          threshold: 0.7
      vars:
        context: python:rag_context_loader.py:get_var
    tests:
      - question.csv
    The intention is for the context variable to be dynamically loaded by executing the Python function get_var() inside rag_context_loader.py. However, during the GitHub Actions run (CI triggered by the PR), promptfoo does not seem to execute the Python function. Instead, it sets context to the raw string "python:rag_context_loader.py:get_var".
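    For comparison, a sketch of the dynamic-variable form worth trying, assuming vars are loaded via a file:// reference rather than the python: prefix and that the documented loader signature is get_var(var_name, prompt, other_vars) returning a dict with an "output" key (both assumptions worth checking against the docs):
    ```
    # rag_context_loader.py -- hypothetical loader; referenced as
    #   vars:
    #     context: file://rag_context_loader.py:get_var
    def get_var(var_name, prompt, other_vars):
        question = other_vars.get("question", "")
        # Replace with real retrieval; promptfoo uses the value under "output" as the variable.
        return {"output": f"Retrieved context for: {question}"}
    ```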
  • Filtering tests/scenarios

    straygar

    06/18/2025, 12:07 PM
    Heya. TL;DR: how do you write, debug, and run parts of your tests & scenarios? I have a pretty big promptfooconfig.yaml file with different scenarios etc. to catch regressions in CI and evaluate potential prompts & models. I use a custom Python provider & some custom Python assertions, along with built-in ones. My current process for authoring new tests is:
    - add a new test
    - comment out most of the file
    - if something goes wrong, add print statements, rinse and repeat
    Obviously this is not the best. I was never able to get --filter-pattern or --filter-metadata to work, and the entire promptfooconfig file is always run. I just found a way to potentially attach a Python debugger, but it's a bit rough: https://github.com/promptfoo/promptfoo/commit/41cc82b2489efce4b167ebb25cc8cc6bcaf667b9
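    One thing worth ruling out, as a sketch under the assumption that --filter-pattern matches each test's description and --filter-metadata matches its metadata entries: both flags only have something to match if those fields are actually set on the tests.
    ```
    tests:
      - description: "billing: refund flow"
        metadata:
          suite: billing
        vars:
          question: "How do I get a refund?"
    ```
    Then something like `promptfoo eval --filter-pattern 'billing'` or `promptfoo eval --filter-metadata suite=billing` would select only those tests, assuming those flags behave as described.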
  • How to integrate promptfoo with our current prompt package structure

    Nate

    06/18/2025, 3:09 PM
    We currently organize our prompts into directories mainly centered around an "invocation" function which is responsible for 1. filling in prompt template variables, 2. calling the LLM, 3. doing some minor post-processing on the response. I'm currently thinking the way to integrate would be to create a custom provider which takes the name of one of these prompt directories as the "prompt" and invokes it with the given vars. Would this be an advisable way to structure things? It feels like it may be a bit of a hack. I'm also open to modifying our current structure, since we already have prompts in a template format that is compatible with promptfoo, but it would be nice to include our post-processing logic in our evals without duplicating code. Also, if relevant, we're working in TypeScript.
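    A rough TypeScript sketch of that idea, assuming promptfoo's custom JS/TS provider contract of a default-exported class with id() and callApi(prompt, context) resolving to { output }; the runInvocation stub stands in for the existing invocation function:
    ```
    // invocationProvider.ts -- hypothetical wrapper around an existing "invocation" directory
    type Vars = Record<string, unknown>;

    // Stand-in for the real pipeline: fill template, call LLM, post-process.
    async function runInvocation(promptDir: string, vars: Vars): Promise<string> {
      return `result of ${promptDir} with ${JSON.stringify(vars)}`;
    }

    export default class InvocationProvider {
      private promptDir: string;

      constructor(options: { config?: { promptDir?: string } } = {}) {
        this.promptDir = options.config?.promptDir ?? 'default-prompt';
      }

      id(): string {
        return `invocation:${this.promptDir}`;
      }

      async callApi(prompt: string, context: { vars: Vars }) {
        // Ignore the raw prompt string and drive the existing invocation instead,
        // so the post-processing logic stays in one place.
        const output = await runInvocation(this.promptDir, context.vars);
        return { output };
      }
    }
    ```
    It would then be referenced from the config as something like a file:// provider entry with config.promptDir pointing at the directory (exact TS loading support is an assumption to verify).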
  • Help regarding getting Base64 Image final prompt and which strategy triggered it

    _wutato_

    06/19/2025, 7:24 AM
    Hi! I am performing testing on an LLM in dev env with the following:
    Copy code
    plugins:
      - id: debug-access  # Tests for exposed debugging interfaces and commands
    strategies:
      - id: basic  # Original plugin tests without any additional strategies or optimizations
      - id: jailbreak:composite  # Combines multiple jailbreak techniques for enhanced effectiveness
      - id: jailbreak:likert  # Uses Likert scale-based prompts to bypass content filters
      - id: jailbreak  # Single-shot optimization of safety bypass techniques
    ~~Saw a "Fail" case that made use of Base64 encoded image as shown in the uploaded picture. Would like to check if it is possible to get the final prompt sent to the LLM or how this prompt is created. Also unsure if the prompt was generated under DebugAccess or by a strategy. Any help or advice on where to check is greatly appreciated. Thank you!~~ *Edit: * Realised the UI interprets the prompt as a base64 image because of the following line in the ResultsTable.tsx
    Copy code
    if (
      typeof value === 'string' &&
      (value.match(/^data:(image\/[a-z]+|application\/octet-stream);base64,/) ||
        value.match(/^\/[0-9A-Za-z+/]{4}.*/))
    )
    Still unsure about the actual prompt being sent out, as manually using the prompt listed in the image gets a PASS response. https://cdn.discordapp.com/attachments/1385158186046324829/1385158186494984294/base64_fail.PNG?ex=68565e16&is=68550c96&hm=970229f01b0872c18132009b9e4e8014b90515de8f2aaa8fb687b76892458fb4&
  • How to test the output returned by an MCP Tool?

    Dan

    06/20/2025, 8:47 AM
    Hello! I have a local MCP server developed using FastMCP. What I want is to check whether a user prompt causes a tool to be executed, and then run some asserts on the output of the MCP tool call. How can I do that locally?
  • Anyone knows what TypeError: Cannot read properties of undefined (reading 'startsWith') means?

    Alba

    06/20/2025, 10:43 AM
    Hi! I was just using promptfoo for the first time and this message popped up in all my evaluations. I've checked my Python code and there is no 'startsWith', so I don't know what it means. The only thing I get in my evals is ERROR, with no explanation. Any idea? Thanks!
  • Export red teaming report PDF

    mjt2007

    06/20/2025, 6:21 PM
    Is it possible to export the red teaming results PDF from the CLI? Currently, promptfoo redteam report only starts the web server and browser UI. I don't see any other options in the command-line documentation. Thanks
  • Eval variable columns not displayed in self-hosted

    GuillermoB

    06/20/2025, 10:46 PM
    We moved from local evals to remote evals. We run the evals locally and upload them with --share. This works fine; however, the variable columns are not displayed in the self-hosted remote promptfoo web UI. What may be going on? https://cdn.discordapp.com/attachments/1385752733587603476/1385752734279536832/image.png?ex=6857364d&is=6855e4cd&hm=24a7e9c2b657b9c8cf40a9572f70d593ab8493134b8ab3179a4d1f90f407646a& https://cdn.discordapp.com/attachments/1385752733587603476/1385752735302815844/image.png?ex=6857364e&is=6855e4ce&hm=ffcaaace2cfa2f4ce41282f61b46c78516c507100f3339727ee9947044dac0ae&
  • How to avoid Template render error without affecting prompt (Python code generates tests & prompt)?

    marco.sbodio

    06/22/2025, 12:35 PM
    Hello, I have documented my problem in this github issue: https://github.com/promptfoo/promptfoo/issues/4538. Any help is highly appreciated! 🙂 Thank you!
  • TokenUsage in OpenAI returns 0

    ahmedelbaqary.

    06/24/2025, 1:55 PM
    When I send an evaluation request, the test runs correctly, but the tokenUsage for prompt and completion is 0. When I use Anthropic models, these values have numbers. Does anyone know why this happens?! https://cdn.discordapp.com/attachments/1387068500501069965/1387068500610125944/image.png?ex=685bffb5&is=685aae35&hm=f748fdf7f49d957a04af47dc566ee25f884c591f3f681c984afcc724e4dcc426&
  • OpenAI Responses previous_response_id

    Greg

    06/26/2025, 11:02 PM
    Hi all, I'm trying to leverage two parameters of the OpenAI Responses API: prompt and previous_response_id. I have the config below that works, but unfortunately the output of all my tests is then just the response ID instead of the OpenAI text response. Is there an option to save the response ID while still having the test return the text? Here are the relevant parts of our setup:
    ```
    prompts:
      - "dummy"

    providers:
      - id: https
        config:
          url: https://api.openai.com/v1/responses
          method: POST
          headers:
            Authorization: "Bearer {{ env.OPENAI_API_KEY }}"
          body: |
            {
              "model": "gpt-4.1-nano-2025-04-14",
              "input": "{{ tenant_msg }}",
              "prompt": { "id": "{{ promptId }}" },
              {% if previous_response_id %}
              "previous_response_id": "{{ previous_response_id }}",
              {% endif %}
              "store": true
            }
          transformResponse: |
            ({ output: { text: json.output[0].content[0].text, response_id: json.id } })

    defaultTest:
      vars:
        previous_response_id: ""

    tests:
      - description: "1 • tenant contacts"
        vars: { tenant_msg: "Good morning.", promptId: "pmpt_685a89e16a208196bb088d0b09cbee3f09404e8cfa6d680c" }
        options:
          transform: output.response_id
          storeOutputAs: previous_response_id
      - description: "2 • tenant follow-up"
        vars: { tenant_msg: "Did you get my previous message?", promptId: "pmpt_685a89e16a208196bb088d0b09cbee3f09404e8cfa6d680c" }
        options:
          transform: output.response_id
          storeOutputAs: previous_response_id
    ```
  • Browser on Mac shows an empty page in recent release for `promptfoo view -y`

    no1care

    06/27/2025, 6:56 AM
    MacBook Air, promptfoo 0.115.4; Chrome and Firefox both have the issue. The error in the developer tools is > Uncaught SyntaxError: expected expression, got '<'. Did I mess up some configuration?