# questions
  • What data is sent to PromptFoo's API during remote generation for Red Teaming?

    JohnRoy

    05/23/2025, 12:36 PM
    I understand that PromptFoo uses its own fine-tuned models for attack generation, which are accessible only through PromptFoo's API. Some plugins (like harmful) require these models even when PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION=true is set. In those cases, I assume the whole config file is sent to PromptFoo's server, exposing, for example, the "purpose" section. My question is specifically about the jailbreak strategy. The documentation describes jailbreak as: 1) starting with a base prompt that attempts to elicit undesired behavior; 2) using an LLM-as-a-Judge to analyze the AI's response, track the conversation history, and generate increasingly refined prompts based on previous attempts; 3) repeating this process for a configurable number of iterations; 4) selecting the most effective prompt variation discovered. Is the generation of those refined prompts in step 2 also taking place on PromptFoo's servers? I.e., are all the answers of my LLM being "exposed"? Also, is it possible to completely replace PromptFoo's unaligned model with a local unaligned one? Thank you
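    A hedged sketch of what pointing generation at a locally hosted model could look like, assuming the redteam.provider setting (shown in a later question in this list) is honored for the strategy in question and that an Ollama-served model is reachable; some plugins may still call the remote service regardless:
    ```
    # promptfooconfig.yaml (sketch, not a confirmed recipe)
    redteam:
      purpose: "Internal support assistant"   # whatever you already have here
      provider: ollama:chat:llama3.1          # hypothetical local model id; any configured provider id should work
      plugins:
        - debug-access
      strategies:
        - jailbreak
    ```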
  • Is it possible to red team with the node package

    roy

    05/25/2025, 12:33 AM
    Is it possible to red team with the node package?
  • Share feature doesn't seem to be working

    pyq

    05/29/2025, 7:34 AM
    I used the command promptfoo share to upload the evaluation results to www.promptfoo.app. However, when I create a shareable link and try to share it with others, the link doesn't work—regardless of whether it's set to public or not. this is a generated sample link: https://www.promptfoo.app/register?invite_code=8d8650f5-8717-428c-a0d3-6a690ae39ed5&eval_id=eval-J77-2025-05-29T07:23:34 https://cdn.discordapp.com/attachments/1377550615995093122/1377550616175579146/Screenshot_2025-05-29_at_3.33.20_PM.png?ex=68395f7c&is=68380dfc&hm=7873b1a815939c905a611c0f20163dd51458407d4c8251eec974677c8f8e61b3&
  • How do I pass videos as multi-modal input to Google's Gemini Flash models?

    raxrb

    05/29/2025, 5:56 PM
    Can anybody share an example of passing multi-modal data like video, images, or audio to the Gemini Flash model?
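    One possible shape, as a rough sketch only: promptfoo prompts can be JSON files, and the Gemini REST API takes media as base64 inline_data parts (or a file URI for larger videos). Assuming the google/vertex provider forwards a contents-style JSON prompt and that video_base64 is supplied as a test variable, it might look like:
    ```
    [
      {
        "role": "user",
        "parts": [
          { "text": "{{question}}" },
          {
            "inline_data": {
              "mime_type": "video/mp4",
              "data": "{{video_base64}}"
            }
          }
        ]
      }
    ]
    ```
    used with a provider such as vertex:gemini-2.0-flash. Treat the exact field names and the pass-through behavior as assumptions to verify against the provider docs.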
  • enterprise

    vibecoder

    05/31/2025, 5:54 AM
    How is enterprise self-hosting different from a team taking the GitHub version and deploying it on a Kubernetes cluster? Also, what is the pricing? It isn't listed on the site.
  • Disabling telemetry

    vibecoder

    05/31/2025, 5:56 AM
    Currently, even after setting the environment variable, a single message per eval still gets sent to the promptfoo server stating "telemetry is disabled". Can we update the code to not do this when the variable disables telemetry?
  • client generated sessionId

    grj373

    06/02/2025, 2:21 PM
    Hi, I have set up a red team scan using promptfoo and want to use a client-generated session ID. However, when selecting this option and placing {{sessionId}} in the body of the request, promptfoo sends an empty string for the sessionId. Does anyone know of a known issue with this functionality? Thanks
  • Allowing file upload for Provider on web eval

    vibecoder

    06/05/2025, 2:05 AM
    Promptfoo allows uploading test and YAML config files from the web eval page, and these get stored in persistent storage as well. Why not allow the same capability for the provider file? If more folks are looking for a similar capability, I can work on submitting a pull request. But before I do so, I'm interested in knowing why it isn't allowed in the first place.
  • Asserting on complex nested extracted entities

    Donato Azevedo

    06/05/2025, 1:52 PM
    Hi everyone! I'm humbly looking for suggestions and opinions on how to assert on complex, sometimes deeply nested, extracted entities. My task extracts several entities from pdf documents and I am already using promptfoo to assert and build up metrics for performance. But it's getting ever so hard because of the complexity of the extracted entities. For example, this is one of my assertions:
    Copy code
    - type: python
      value: ('Outorgar Poderes' in output['pessoas_analisadas'][1]['restricoes_valor'] and '12' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'] and 'meses' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'])
      metric: alcada
    And this is not even robust, because it depends on the order of the output['pessoas_analisadas'] list being consistent across different evals. I'd appreciate any suggestion. Meanwhile, I was even considering contributing a transform property to assert-sets, which would enable this kind of syntax:
    Copy code
    tests:
      - description: test for persona 1
        vars:
          - file://path/to/pdf
        assert:
          - type: assert-set
            transform: next(o for o in output['pessoas_analisadas'] if o['nome'] == 'NAME OF PERSON')
            assert:
              - type: python
                value: ('Outorgar Poderes' in output['restricoes_valor'] and '12' in output['restricoes_valor']['Outorgar Poderes']['valor_alcada'] ...
    Opinions?
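    One route, sketched under the assumption that an external Python assertion (value: file://...) receives the parsed output dict, is to do the lookup-by-name inside a get_assert function rather than in a YAML one-liner:
    ```
    # assert_alcada.py -- hypothetical helper; select the person by name so list order doesn't matter
    from typing import Any


    def get_assert(output: dict, context: Any):
        pessoa = next(
            (p for p in output.get("pessoas_analisadas", []) if p.get("nome") == "NAME OF PERSON"),
            None,
        )
        if pessoa is None:
            return {"pass": False, "score": 0.0, "reason": "pessoa not found in output"}

        alcada = pessoa.get("restricoes_valor", {}).get("Outorgar Poderes", {}).get("valor_alcada", "")
        ok = "12" in alcada and "meses" in alcada
        return {
            "pass": ok,
            "score": 1.0 if ok else 0.0,
            "reason": f"valor_alcada={alcada!r}",
            "named_scores": {"alcada": 1.0 if ok else 0.0},
        }
    ```
    referenced from the config as `- type: python` with `value: file://assert_alcada.py` and `metric: alcada` (the NAME OF PERSON placeholder is from the original post).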
  • Metrics defined via named_scores in python file assertion not showing in UI

    Donato Azevedo

    06/05/2025, 5:05 PM
    I am hesitant to open a bug issue in the GitHub repo simply because this issue (https://github.com/promptfoo/promptfoo/issues/1626) mentions there has been a fix last year. However, I am using the same return value as the OP of the issue and not getting any metrics shown in the UI:
    Copy code
    python
    from typing import Any  # GradingResult here is just the dict shape promptfoo expects

    def get_assert(output: dict[str, Any], context) -> bool | float | dict:
        return {
          'pass': True,
          'score': 0.11,
          'reason': 'Looks good to me',
          'named_scores': {
             'answer_similarity': 0.12,
             'answer_correctness': 0.13,
             'answer_relevancy': 0.14,
          }
        }
    I was expecting to see the three answer_* named metrics appearing up top. https://cdn.discordapp.com/attachments/1380231112252723332/1380231112629944341/Screenshot_2025-06-05_at_14.04.57.png?ex=68431fe4&is=6841ce64&hm=1e59574abdb29e00278b719905d74efcea8620d330947203cbe31d0c4fc9301b&
  • JSON errors during prompt generation

    Bryson

    06/06/2025, 7:04 PM
    When I attempt to generate my redteam prompts (promptfoo redteam generate), I consistently run into JSON "SyntaxError" issues, seemingly at inconsistent points during generation. I've tried multiple times and keep hitting the same error:
    Copy code
    [chat.js:161]     completions API response: {"id":"chatcmpl-BfWK1RSvr3LAckI7hoUHM9dYo0Zce","object":"chat.completion","created":1749235393,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"message":{}
    <anonymous_script>:430
    
    
    SyntaxError: Expected ',' or '}' after property value in JSON at position 1966 (line 430 column 1)
        at JSON.parse (<anonymous>)
        at encodeMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:95:32)
        at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
        at async addMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:122:33)
        at async action (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/index.js:195:34)
        at async applyStrategies (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:241:35)
        at async synthesize (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:678:85)
        at async doGenerateRedteam (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/commands/generate.js:243:88)
    Is this a Promptfoo bug by chance? Or is it possible I'm doing something wrong? Happy to DM over my promptfooconfig.yaml if helpful
  • Does promptfoo execute MCP tool calls?

    sasha

    06/09/2025, 7:42 PM
    I'm wondering whether promptfoo executes an MCP model's tool calls when the model requests them, or whether that is only possible when function callbacks are implemented?
  • Which provider/model is used for simulated_user?

    vibecoder

    06/10/2025, 6:09 AM
    For the functionality listed here: https://www.promptfoo.dev/docs/providers/simulated-user/, which model/provider generates the response on behalf of the user? How can we configure which model is used for that response generation?
  • is latency assertion via Google Sheets broken? (Image attached)

    anurag

    06/10/2025, 3:30 PM
    No matter what I enter for the latency threshold, I always get a value of 0.75ms.
    https://cdn.discordapp.com/attachments/1382019030801584148/1382019031208427571/image.png?ex=6849a105&is=68484f85&hm=180be1f47d77da7022deef500bd41f20d9b2cbcdb7f1c54ce8a3803f5bb334eb& https://cdn.discordapp.com/attachments/1382019030801584148/1382019031539650581/image.png?ex=6849a105&is=68484f85&hm=d91040592a9966b5ff36fab22c0cfa37f1b6662c568578d534d0de07ab0040fb&
  • How to print messages to stdout without needing --verbose?

    Donato Azevedo

    06/10/2025, 6:40 PM
    I have a use case where I perform a long action in the beforeAll extension hook. I want to print some info about the progress to stdout, without needing to pass --verbose when running to view it. How can I do it?
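    A minimal sketch, assuming the Python extension interface where a single function receives the hook name and a context dict (registered via `extensions: - file://hooks.py:extension_hook` in the config); writing to stderr with an explicit flush is the usual way to get progress text past output buffering, though whether promptfoo's progress UI interleaves it cleanly is not guaranteed:
    ```
    # hooks.py -- sketch only; the extension_hook(hook_name, context) signature is an assumption
    import sys


    def extension_hook(hook_name, context):
        if hook_name == "beforeAll":
            print("[beforeAll] starting long setup...", file=sys.stderr, flush=True)
            # ... long action here ...
            print("[beforeAll] setup finished", file=sys.stderr, flush=True)
    ```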
  • Speech / quotation marks

    grj373

    06/11/2025, 4:14 PM
    Hello, is there any way to double-escape quotation marks? ... or to prevent prompts from containing quotation marks?
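    Since prompts are Nunjucks templates, one option (a sketch, assuming the variable is a plain string) is to pass it through Nunjucks' built-in dump filter, which JSON-encodes the value and therefore escapes any embedded quotes:
    ```
    prompts:
      - |
        Summarize the following user message: {{ user_message | dump }}
    ```
    Note that dump also wraps the value in double quotes, so drop any surrounding quotes added by hand.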
  • Can websocket use headers config for authorization?

    Bronwyn

    06/12/2025, 3:49 AM
    Hi, I want to ask if promptfoo already supports headers for websocket providers, because I'm getting Websocket error: {} when trying to connect to the websocket. It connects fine when connecting with wscat via the terminal. My promptfooconfig.yaml:
    providers:
      - id: 'wss://websocket.example.url'
        config:
          headers:
            Authorization: Bearer
          messageTemplate: '{"action": "sendMessage", "content": "{{prompt}}"}'
          transformResponse: 'data.content'
          timeoutMs: 20000
    Thank you
  • redteam provider ignored?

    phillprice

    06/13/2025, 11:00 AM
    Hello, I'm trying to set up red teaming but generate the prompts with either an Azure OpenAI model or a Vertex Gemini model. Both of them say they're picking up the provider, but prompt generation still appears to be proxied. In promptfooconfig.yaml:
    Copy code
    redteam:
      provider: vertex:gemini-2.0-flash
    I would expect to see Vertex prompts, not https://api.promptfoo.app/api/, in the logs. Or am I missing something? https://cdn.discordapp.com/attachments/1383038229430800417/1383038229657423903/message.txt?ex=684d5639&is=684c04b9&hm=e886247025063d28d786293cbbb0f5896f90d0fef84507c57f7a63f9794547fa&
  • Python var loader not working in GitHub Actions CI

    sreven12

    06/16/2025, 3:13 AM
    Hi! I'm running into an issue while using promptfoo in a GitHub Actions CI pipeline triggered by a pull request. Here's the setup: in my PR, I have a test config file like this:
    prompts:
      - '{{query}}'
    providers:
      - python:rag_provider.py:call_api
    defaultTest:
      assert:
        - type: answer-relevance
          threshold: 0.7
        - type: context-relevance
          threshold: 0.7
      vars:
        context: python:rag_context_loader.py:get_var
    tests:
      - question.csv
    The intention is for the context variable to be dynamically loaded by executing the Python function get_var() inside rag_context_loader.py. However, during the GitHub Actions run (CI triggered by the PR), promptfoo does not seem to execute the Python function. Instead, it sets context to the raw string "python:rag_context_loader.py:get_var".
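    For comparison, a sketch of the dynamic-variable form worth trying, assuming vars are loaded via a file:// reference rather than the python: prefix and that the documented loader signature is get_var(var_name, prompt, other_vars) returning a dict with an "output" key (both assumptions worth checking against the docs):
    ```
    # rag_context_loader.py -- hypothetical loader; referenced as
    #   vars:
    #     context: file://rag_context_loader.py:get_var
    def get_var(var_name, prompt, other_vars):
        question = other_vars.get("question", "")
        # Replace with real retrieval; promptfoo uses the value under "output" as the variable.
        return {"output": f"Retrieved context for: {question}"}
    ```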
  • Filtering tests/scenarios

    straygar

    06/18/2025, 12:07 PM
    Heya. TL;DR: how do you write, debug, and run parts of your tests & scenarios? I have a pretty big promptfooconfig.yaml file with different scenarios etc. to catch regressions in CI and evaluate potential prompts & models. I use a custom Python provider & some custom Python assertions, along with built-in ones. My current process for authoring new tests is:
    - add a new test
    - comment out most of the file
    - if something goes wrong, add print statements, rinse and repeat
    Obviously this is not the best. I was never able to get --filter-pattern or --filter-metadata to work, and the entire promptfooconfig file is always run. I just found a way to potentially attach a Python debugger, but it's a bit rough: https://github.com/promptfoo/promptfoo/commit/41cc82b2489efce4b167ebb25cc8cc6bcaf667b9
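    One thing worth ruling out, as a sketch under the assumption that --filter-pattern matches each test's description and --filter-metadata matches its metadata entries: both flags only have something to match if those fields are actually set on the tests.
    ```
    tests:
      - description: "billing: refund flow"
        metadata:
          suite: billing
        vars:
          question: "How do I get a refund?"
    ```
    Then something like `promptfoo eval --filter-pattern 'billing'` or `promptfoo eval --filter-metadata suite=billing` would select only those tests, assuming those flags behave as described.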
  • How to integrate promptfoo with our current prompt package structure

    Nate

    06/18/2025, 3:09 PM
    We currently organize our prompts into directories mainly centered around an "invocation" function which is responsible for 1. filling in prompt template variables, 2. calling the LLM, 3. doing some minor post-processing on the response. I'm currently thinking the way to integrate would be to create a custom provider which takes the name of one of these prompt directories as the "prompt" and invokes it with the given vars. Would this be an advisable way to structure things? It feels like it may be a bit of a hack. I'm also open to modifying our current structure, since we already have prompts in a template format that is compatible with promptfoo, but it would be nice to include our post-processing logic in our evals without duplicating code. Also, if relevant, we're working in TypeScript.
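    A rough TypeScript sketch of that idea, assuming promptfoo's custom JS/TS provider contract of a default-exported class with id() and callApi(prompt, context) resolving to { output }; the runInvocation stub stands in for the existing invocation function:
    ```
    // invocationProvider.ts -- hypothetical wrapper around an existing "invocation" directory
    type Vars = Record<string, unknown>;

    // Stand-in for the real pipeline: fill template, call LLM, post-process.
    async function runInvocation(promptDir: string, vars: Vars): Promise<string> {
      return `result of ${promptDir} with ${JSON.stringify(vars)}`;
    }

    export default class InvocationProvider {
      private promptDir: string;

      constructor(options: { config?: { promptDir?: string } } = {}) {
        this.promptDir = options.config?.promptDir ?? 'default-prompt';
      }

      id(): string {
        return `invocation:${this.promptDir}`;
      }

      async callApi(prompt: string, context: { vars: Vars }) {
        // Ignore the raw prompt string and drive the existing invocation instead,
        // so the post-processing logic stays in one place.
        const output = await runInvocation(this.promptDir, context.vars);
        return { output };
      }
    }
    ```
    It would then be referenced from the config as something like a file:// provider entry with config.promptDir pointing at the directory (exact TS loading support is an assumption to verify).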
  • Help regarding getting Base64 Image final prompt and which strategy triggered it

    _wutato_

    06/19/2025, 7:24 AM
    Hi! I am performing testing on an LLM in dev env with the following:
    Copy code
    plugins:
      - id: debug-access  # Tests for exposed debugging interfaces and commands
    strategies:
      - id: basic  # Original plugin tests without any additional strategies or optimizations
      - id: jailbreak:composite  # Combines multiple jailbreak techniques for enhanced effectiveness
      - id: jailbreak:likert  # Uses Likert scale-based prompts to bypass content filters
      - id: jailbreak  # Single-shot optimization of safety bypass techniques
    ~~Saw a "Fail" case that made use of Base64 encoded image as shown in the uploaded picture. Would like to check if it is possible to get the final prompt sent to the LLM or how this prompt is created. Also unsure if the prompt was generated under DebugAccess or by a strategy. Any help or advice on where to check is greatly appreciated. Thank you!~~ *Edit: * Realised the UI interprets the prompt as a base64 image because of the following line in the ResultsTable.tsx
    Copy code
    if (
      typeof value === 'string' &&
      (value.match(/^data:(image\/[a-z]+|application\/octet-stream);base64,/) ||
        value.match(/^\/[0-9A-Za-z+/]{4}.*/))
    )
    Still unsure about the actual prompt being sent out, as manually using the prompt listed in the image gets a PASS response. https://cdn.discordapp.com/attachments/1385158186046324829/1385158186494984294/base64_fail.PNG?ex=68565e16&is=68550c96&hm=970229f01b0872c18132009b9e4e8014b90515de8f2aaa8fb687b76892458fb4&
  • How to test the output returned by an MCP Tool?

    Dan

    06/20/2025, 8:47 AM
    Hello! I have a local MCP server developed using FastMCP. What I want is to check whether a user prompt causes a tool to be executed, and then run some asserts on the output of the MCP tool call. How can I do that locally?
  • Anyone knows what TypeError: Cannot read properties of undefined (reading 'startsWith') means?

    Alba

    06/20/2025, 10:43 AM
    Hi! I was just using promptfoo for the first time and this message popped up in all my evaluations. I've checked my Python code and there is no 'startsWith', so I don't know what it means. The only thing I get in my evals is ERROR, with no explanation. Any idea? Thanks!
  • Export red teaming report PDF

    mjt2007

    06/20/2025, 6:21 PM
    Is it possible to export the red teaming results PDF from the CLI? Currently, promptfoo redteam report only starts the web server and browser UI. I don't see any other options in the command-line documentation. Thanks
  • Eval variable columns not displayed in self-hosted

    GuillermoB

    06/20/2025, 10:46 PM
    We moved from local evals to remote evals. We run the evals locally and upload them with --share. This works fine; however, the variable columns are not displayed in the self-hosted remote promptfoo web UI. What may be going on? https://cdn.discordapp.com/attachments/1385752733587603476/1385752734279536832/image.png?ex=6857364d&is=6855e4cd&hm=24a7e9c2b657b9c8cf40a9572f70d593ab8493134b8ab3179a4d1f90f407646a& https://cdn.discordapp.com/attachments/1385752733587603476/1385752735302815844/image.png?ex=6857364e&is=6855e4ce&hm=ffcaaace2cfa2f4ce41282f61b46c78516c507100f3339727ee9947044dac0ae&
  • How to avoid Template render error without affecting prompt (Python code generates tests & prompt)?

    marco.sbodio

    06/22/2025, 12:35 PM
    Hello, I have documented my problem in this github issue: https://github.com/promptfoo/promptfoo/issues/4538. Any help is highly appreciated! 🙂 Thank you!
  • TokenUsage in OpenAI returns 0

    ahmedelbaqary.

    06/24/2025, 1:55 PM
    When I send an evaluation request, the test runs correctly, but the tokenUsage for prompt and completion is 0. When I use Anthropic models, these values have numbers. Does anyone know why this happens?! https://cdn.discordapp.com/attachments/1387068500501069965/1387068500610125944/image.png?ex=685bffb5&is=685aae35&hm=f748fdf7f49d957a04af47dc566ee25f884c591f3f681c984afcc724e4dcc426&
  • OpenAI Responses previous_response_id

    Greg

    06/26/2025, 11:02 PM
    Hi all, I'm trying to leverage two parameters of the OpenAI Responses API: prompt and previous_response_id. I have the config below that works, but unfortunately the output of all my tests is then just the response ID instead of the OpenAI text response. Is there an option to save the response ID while still having the test return the text? Here are the relevant parts of our setup:
    ```
    prompts:
      - "dummy"

    providers:
      - id: https
        config:
          url: https://api.openai.com/v1/responses
          method: POST
          headers:
            Authorization: "Bearer {{ env.OPENAI_API_KEY }}"
          body: |
            {
              "model": "gpt-4.1-nano-2025-04-14",
              "input": "{{ tenant_msg }}",
              "prompt": { "id": "{{ promptId }}" },
              {% if previous_response_id %}
              "previous_response_id": "{{ previous_response_id }}",
              {% endif %}
              "store": true
            }
          transformResponse: |
            ({ output: { text: json.output[0].content[0].text, response_id: json.id } })

    defaultTest:
      vars:
        previous_response_id: ""

    tests:
      - description: "1 • tenant contacts"
        vars: { tenant_msg: "Good morning.", promptId: "pmpt_685a89e16a208196bb088d0b09cbee3f09404e8cfa6d680c" }
        options:
          transform: output.response_id
          storeOutputAs: previous_response_id
      - description: "2 • tenant follow-up"
        vars: { tenant_msg: "Did you get my previous message?", promptId: "pmpt_685a89e16a208196bb088d0b09cbee3f09404e8cfa6d680c" }
        options:
          transform: output.response_id
          storeOutputAs: previous_response_id
    ```
  • Browser on Mac shows an empty page in recent release for `promptfoo view -y`

    no1care

    06/27/2025, 6:56 AM
    MacBook Air, promptfoo 0.115.4; Chrome and Firefox both have the issue. The error in the developer tools is > Uncaught SyntaxError: expected expression, got '<'. Did I mess up some configuration?