# questions
  • Allow Azure provider to specify a function for setting the bearer token

    dreamspider42

    03/19/2025, 11:41 PM
    I've tried to work around this a number of different ways, but ultimately I don't think the loadApiProvider issue I was encountering below would even be necessary if I could just specify a function of my choosing for obtaining the bearer token for the Azure OpenAI request. The client ID + secret support is nice, but implementations are going to vary in a deployed environment. Ours, for instance, leverages Workload Identity Federation between our Azure account and AWS, so it's not a simple client ID and secret, and keys aren't allowed for us either. We have a function that grabs the token and works successfully with our Azure endpoint, but I don't have a great way to plug it into our deployed environment. The way I'd imagine this could work: if an authFn param is provided, it circumvents the other logic and uses that function to provide the bearer token to the resource.
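    While no authFn hook exists, one possible shape for this is a custom Python provider (promptfoo can load `file://provider.py` targets that expose `call_api`). This is a hedged sketch, not promptfoo's documented Azure integration: `fetch_token_via_wif` and the endpoint are placeholders for whatever Workload Identity Federation exchange your environment performs.

    ```python
    # Hypothetical sketch: a custom Python provider that supplies its own
    # bearer token. fetch_token_via_wif is a stand-in for your AWS<->Azure
    # Workload Identity Federation exchange, not a real library call.
    import json
    import urllib.request

    def fetch_token_via_wif() -> str:
        # Placeholder: replace with the real token exchange.
        return "example-bearer-token"

    def build_request(endpoint: str, prompt: str, token: str) -> urllib.request.Request:
        body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
        return urllib.request.Request(
            endpoint,
            data=body,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        )

    def call_api(prompt, options, context):
        # Entry point promptfoo invokes for Python file:// providers.
        endpoint = options.get("config", {}).get("endpoint", "https://example.invalid/chat")
        req = build_request(endpoint, prompt, fetch_token_via_wif())
        with urllib.request.urlopen(req) as resp:  # network call, not exercised here
            data = json.loads(resp.read())
        return {"output": data["choices"][0]["message"]["content"]}
    ```

    The token fetch runs on every call here; in practice you'd likely cache the token until close to expiry.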
  • Claude 3.7 "streaming is strongly encouraged" error

    CasetextJake

    03/25/2025, 8:43 PM
    I'm trying to run tests using Claude 3.7, and I get this error for each run: "API call error: Streaming is strongly recommended for operations that may take longer than 10 minutes. See https://github.com/anthropics/anthropic-sdk-python#streaming-responses for more details". This doesn't happen with 3.5. Here's my setup -- maybe I'm doing it wrong?
    - id: anthropic:messages:claude-3-7-sonnet-20250219
      label: anthropic-3.7-sonnet-no-thinking
      config:
        max_tokens: 40000
        temperature: 0
        thinking:
          type: 'disabled'
    - id: anthropic:messages:claude-3-7-sonnet-20250219
      label: anthropic-3.7-sonnet-thinking
      config:
        temperature: 0
        max_tokens: 40000
        thinking:
          type: 'enabled'
          budget_tokens: 32000
  • Does promptfoo have ollama function-calling support?

    harpomaxx

    03/25/2025, 10:17 PM
    The current version of Ollama has implemented some function-calling support. I just want to test whether a particular Ollama model correctly supports function calling. First, I want to know if the LLM correctly identifies when to call a function (and with what arguments). I'm not sure if this is supported in the current version of Promptfoo.
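    One way to frame the check, sketched below: given a response in the shape of Ollama's /api/chat tool-calling output (`message.tool_calls`, each with `function.name` and `function.arguments`), verify the model chose the expected function with the expected arguments. The `sample` reply is fabricated for illustration; a check like this could run as a Python/JS assert over the provider's raw response, this is not a built-in promptfoo assertion.

    ```python
    # Sketch: validate that a tool-calling response picked the right function
    # with the right arguments. The response shape assumes Ollama's /api/chat
    # format; `sample` is a fabricated reply, not real model output.
    def expected_tool_called(response: dict, name: str, args: dict) -> bool:
        calls = response.get("message", {}).get("tool_calls") or []
        return any(
            call.get("function", {}).get("name") == name
            and call.get("function", {}).get("arguments") == args
            for call in calls
        )

    sample = {
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}
            ],
        }
    }
    ```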
  • Map test case variables to corresponding values

    Gia Duc

    03/30/2025, 2:09 PM
    I have a test file in JSON. For the variable "weekdays", I want to map each day to its corresponding value (1 to 7) for the {{day_index}} in the assert value. For example, if the variable is "tuesday", the asserted value will be "interval=1w download=true | tuesday=count(_time[day]=2) | top(file_type)". Could anyone please take a look? Thank you. Here is the test:
    [
      {
        "description": "Query downloaded files on specific day?",
        "vars": {
          "prompt": [
            "How many files were downloaded on {{weekdays}}?",
            "Show me the file types downloaded every {{weekdays}}."
          ],
          "weekdays": [
            "sunday",
            "monday",
            "tuesday",
            "wednesday",
            "thursday",
            "friday",
            "saturday"
          ]
        },
        "assert": [
          {
            "type": "contains-any",
            "value": [
              "interval=1w download=true | {{weekdays}}=count(_time[day]={{day_index}}) | top(file_type)"
            ]
          }
        ]
      }
    ]
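    One workaround, sketched below: as I understand promptfoo's var expansion, listing both `weekdays` and `day_index` as arrays would produce the cross product rather than pairing them, so a small script can instead emit one fully-resolved test case per weekday and the result can be loaded via `tests: file://tests.json`. The day numbering (Monday=1 ... Sunday=7) is an assumption inferred from the tuesday=2 example.

    ```python
    # Sketch: generate one test case per weekday with the paired day_index,
    # instead of relying on promptfoo to zip two parallel var arrays.
    # DAY_INDEX numbering is assumed from the example (tuesday -> 2).
    import json

    DAY_INDEX = {"monday": 1, "tuesday": 2, "wednesday": 3, "thursday": 4,
                 "friday": 5, "saturday": 6, "sunday": 7}

    def build_tests():
        tests = []
        for day, index in DAY_INDEX.items():
            tests.append({
                "description": f"Query downloaded files on {day}",
                "vars": {"weekdays": day, "day_index": index},
                "assert": [{
                    "type": "contains-any",
                    "value": [
                        f"interval=1w download=true | {day}=count(_time[day]={index}) "
                        "| top(file_type)"
                    ],
                }],
            })
        return tests

    # with open("tests.json", "w") as f:
    #     json.dump(build_tests(), f, indent=2)
    ```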
  • About Prompt Generation?

    subzer0

    04/01/2025, 11:45 AM
    Which API does promptfoo use to generate the prompts? Does it use another AI model to generate prompts and attack the targets?
  • Promptfoo on GCR

    patsu

    04/02/2025, 3:53 PM
    Hello everyone, has anyone tried self-hosting promptfoo on GCR (Google Cloud Run)? Can you share a guide?
  • custom guardrails testing

    b00l_

    04/02/2025, 10:55 PM
    I have a custom guardrail to test. It is an API endpoint that receives a prompt and returns JSON, so I wrote a custom provider to send the prompt and get back JSON like
    { "flagged": true/false, "category": "something" }
    , and the config looks like
    targets:
      - id: 'file://custom_guard.py'
        config:
          endpoint: '{{env.ENDPOINT}}'
          key: '{{env.TOKEN}}'
    
    redteam:
      plugins: ...
    Now my question is: how can I check for flagged results and group by the categories returned? Thanks!
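    One rough approach (a sketch, not a built-in promptfoo feature): post-process the eval results and tally the guardrail's JSON verdicts yourself. The `results` list below is illustrative; in practice you'd pull each provider response out of promptfoo's JSON output file.

    ```python
    # Sketch: group guardrail verdicts ({"flagged": ..., "category": ...})
    # by category and compute an overall flag rate.
    from collections import Counter

    def summarize(results):
        flagged = [r for r in results if r.get("flagged")]
        by_category = Counter(r.get("category", "unknown") for r in flagged)
        return {
            "flagged": len(flagged),
            "total": len(results),
            "by_category": dict(by_category),
        }

    sample = [
        {"flagged": True, "category": "hate"},
        {"flagged": False, "category": None},
        {"flagged": True, "category": "hate"},
        {"flagged": True, "category": "self-harm"},
    ]
    ```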
  • Red teaming stuck

    patsu

    04/03/2025, 3:30 PM
    Hello! I have a problem when I do red teaming. For context, I am running the project locally for now. It runs fine, but when I do the red team test I always get stuck on "Running scan". I inspected the logs and it's asking for an email; when I provide the email, nothing happens after that. I'm guessing this is also the issue on my GCP Compute Engine instance.
  • Testing Mistral-Ollama via OpenAI API (Open WebUI)

    Sunny25

    04/05/2025, 11:01 AM
    I am trying to perform a red team evaluation against Mistral running locally on Ollama. Since I have Open WebUI installed, I am utilizing the OpenAI API provided by Open WebUI. I am able to successfully make a curl request through this API. However, while setting up a custom target with the attached YAML, it gives me a "model not found" error. Curl request, YAML config, and promptfoo error attached. https://cdn.discordapp.com/attachments/1358033855214518384/1358033855457661178/image.png?ex=67f25f16&is=67f10d96&hm=05041d85b7826735091f294b4239565996cba634ad666d3f68c53b2f1c92e83a& https://cdn.discordapp.com/attachments/1358033855214518384/1358033855986008265/image.png?ex=67f25f16&is=67f10d96&hm=6dacc8d053d81540bb4c8a47c3df29f5b1c56df2871e62239b7721d0c43e1ab5& https://cdn.discordapp.com/attachments/1358033855214518384/1358033856321683486/image.png?ex=67f25f16&is=67f10d96&hm=6a25c2777fbf936c588faf5c44f504e0ab120dd752d3f5e6832b2ee89f38acab&
  • Request Transform for Prompt Injection Strategy

    blue.yeen

    04/05/2025, 7:48 PM
    I am attempting to run a red team test using the Direct Prompt Injection strategy; however, it seems the prompts that get generated are causing issues when passed in the JSON body for our chatbot's API setup. I am trying to create a request transform so the prompt won't break the JSON body. I thought this simple JavaScript would do the trick: (prompt) => JSON.stringify(prompt), but when I test the target I get an "invalid input" error.
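    Context on why the bare stringify can backfire: JSON.stringify("hi") returns '"hi"', quotes included, so if the request template also wraps the placeholder in quotes the body ends up double-encoded. A Python illustration of the safer pattern, encoding the whole body object rather than the bare prompt string (`build_body` and the `message` field are illustrative, not your API's schema):

    ```python
    # Sketch: encode the entire request body as one object so quotes,
    # newlines, and backslashes in adversarial prompts are escaped for you.
    import json

    def build_body(prompt: str) -> str:
        return json.dumps({"message": prompt})

    tricky = 'line1\nline2 with "quotes" and a \\ backslash'

    # Naive interpolation would produce invalid JSON:
    # '{"message": "' + tricky + '"}'  -> raw newline/quotes break the parser
    ```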
  • vulnerability in `request` package used by promptfoo

    Rohit Jalisatgi - Palosade

    04/07/2025, 10:56 PM
    Please see reported vulnerability in "Request" package that is used by promptfoo - https://github.com/advisories/GHSA-p8p7-x288-28g6. This package is no longer maintained by the owner. Can we not use this package in promptfoo?
  • Redteam Testing Stuck on Running Scan...

    blue.yeen

    04/10/2025, 6:25 PM
    I've deployed promptfoo using Render and it seems evaluations run fine however when attempting our first Redteam test it seems to be stuck indefinitely on 'Running scan...' I'm only generating 10 tests so I feel like it shouldn't be a memory issue. Has anyone else run into this or is aware of what may be the cause? https://cdn.discordapp.com/attachments/1359957546080932113/1359957546483318784/image.png?ex=67f95ea9&is=67f80d29&hm=df43694fd6bc72e89b2af8987cbb31321011f5a3e99b827385cf49e8397ef81e&
  • Running any Eval in web results in Syntax Error <something> is not valid JSON

    ericchaves

    04/11/2025, 12:46 AM
    Hi folks, I'm new to promptfoo and trying it out for the first time, but instead of running the CLI quick start I went straight to the web server. I configured a docker-compose running the latest image (version 0.109.1) and then tried some basic examples from the quick start and from the examples repository folder. All eval attempts raise a similar error message:

    > SyntaxError: Unexpected token 'I', "I don't ha"... is not valid JSON
    > at JSON.parse ()
    > at eval (eval at getInlineTransformFunction (/app/src/util/transform.ts:97:10), :3:13)
    > at transform (/app/src/util/transform.ts:159:37)
    > at runAssertion (/app/src/assertions/index.ts:131:14)
    > at /app/src/assertions/index.ts:363:22

    I have tried the google and openrouter providers and the results are always the same. Given the error output, the evals seem to be calling the provider APIs, but the web server then tries to JSON-parse the actual answer (the LLM response) produced by the provider. Some examples:

    > SyntaxError: Unexpected token 'I', "I don't ha"... is not valid JSON
    > SyntaxError: Unexpected token 'I', "I'm sorry,"... is not valid JSON
    > SyntaxError: Unexpected token 'H', "Here are a"... is not valid JSON
    > SyntaxError: Unexpected token 'A', "Ahoy, mate"... is not valid JSON

    Am I missing some configuration? Any ideas on what I'm doing wrong? https://cdn.discordapp.com/attachments/1360053293484605622/1360053293673484338/image.png?ex=67f9b7d5&is=67f86655&hm=278e125e99fd9bd76ed6c891f1d64097b7f572af7029ab0ee20ab06c24efb6a7&
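    The trace suggests an inline transform is JSON-parsing the raw model answer ("I don't ha..." is prose, not JSON). When a transform or assert must handle outputs that are only sometimes JSON, a tolerant parse that falls back to the raw text avoids the crash; a minimal sketch of the idea in Python:

    ```python
    # Sketch: try to parse the model output as JSON; if it's plain prose,
    # return the raw string instead of raising.
    import json

    def parse_maybe_json(output: str):
        try:
            return json.loads(output)
        except json.JSONDecodeError:
            return output
    ```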
  • output csv file doesn't contain vars

    SeanYang15

    04/15/2025, 9:23 PM
    According to documentation https://www.promptfoo.dev/docs/configuration/parameters/#output-file output file should contain test variables, but when I do
    promptfoo eval --output eval-result.csv
    I only get the result column. Did I miss something or is this a bug?
  • Metrics without assertion

    varunmehra

    04/16/2025, 7:08 AM
    Hi... I am very new to promptfoo, so if the question sounds stupid, please be nice. I want to use promptfoo to evaluate an internal fine-tuned model. We have a custom evaluation strategy that assigns 2 scores to every response from the model under test. The scores can be individually aggregated across all test cases. I want to integrate this custom eval with promptfoo. I don't want to assert whether something passes or fails. Is promptfoo the right tool for this?
  • LLM Red Teaming Guides

    Dr.Scorpion

    04/16/2025, 11:42 AM
    How can I list all the red-teaming malicious prompts, all techniques, all strategies, and every description or guide? I've started as a red teamer for LLMs and I need to learn it.
  • Equality assertion for JSON fails for identical JSON objects

    SeanYang15

    04/16/2025, 11:33 PM
    I've built ~50 test cases using equality assertions on JSON objects. Tests are written in CSV files. For some reason, about 10% of the test cases fail even though the result and expected JSON objects look identical to me; even after I literally copy-pasted the result as the expected value and re-ran the eval, it still fails. This is really annoying because I can't trust the overall success rate. Any idea why this is happening? Below is an example: [FAIL] Expected output "{ "symbol": "TSLA", "action": "SELL", "fraction": 1, "price": 269 }" to equal "{ "symbol": "TSLA", "action": "SELL", "fraction": 1, "price": 269 }" --- { "symbol": "TSLA", "action": "SELL", "fraction": 1, "price": 269 }
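    A likely culprit is a difference the eye skips but exact string equality catches: trailing whitespace, a non-breaking space, key order, or number formatting. Comparing parsed objects instead of raw strings sidesteps all of those; a minimal sketch (whether you wire this in as a javascript/python assert instead of `equals` is up to you):

    ```python
    # Sketch: parse both sides and compare objects, which ignores whitespace
    # and key order. The two strings below encode the same object but are
    # unequal as raw strings.
    import json

    def json_equal(a: str, b: str) -> bool:
        try:
            return json.loads(a) == json.loads(b)
        except json.JSONDecodeError:
            return False

    left = '{ "symbol": "TSLA", "action": "SELL", "fraction": 1, "price": 269 }'
    right = '{"price":269,"symbol":"TSLA","action":"SELL","fraction":1}\n'
    ```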
  • Stuck on "Oops! You're lost!" in the promptfoo.app

    adanceofshells

    04/22/2025, 7:04 AM
    Hi all, I ran into a problem while trying to generate credentials for sharing evals. I followed the instructions to go to promptfoo.app and set up an account. The verification email took some time to arrive, so I requested another one. When the email finally arrived, I entered the verification code, but it was invalid. Shortly afterward, I received another email with a different code. I had no way to repeat the procedure, and my account now only shows the "Oops! You're lost! You've wandered into an unknown part of our website." screen. I can only log out; every other action takes me to the same place. Thinking it was my fault for not waiting patiently for the queued verification codes, I tried again with another email address. This time, although I received only one code and entered it correctly, I still ended up on the same "You're lost" screen. Now I'm stuck with two accounts, set up with two different email addresses, both 100% unresponsive except for login/logout, and I still have no option to share my evaluations. Can you please guide me on what to do next? https://cdn.discordapp.com/attachments/1364134741841809408/1364134742206844958/image.png?ex=680890fa&is=68073f7a&hm=093ce96f3f5b1980feafb3eb80f369e3eb7bd4d6e5d9f362e65c62545d4a8e98&
  • Assertion that calls a scoring API

    sidd

    04/22/2025, 9:40 PM
    Hi, new to promptfoo here. The platform looks great, btw :). Had a question about the best way to add an assertion integration. I'm looking to integrate an LLM-as-a-judge alternative my team has been working on, called the "pi scorer". We'd like to include it as an assertion in promptfoo. This scorer uses an encoder head to provide faster and deterministic scores. Like an LLM, it can handle any text subject and criteria. You can learn more here: https://docs.withpi.ai https://build.withpi.ai We were wondering if we could add a dedicated assertion, rather than integrating via the existing python assertion, so that people can discover this alternative scoring method. A few questions: 1. Can we add this as an official assertion, so that people don't need to write the Python code to call this scorer? 2. This assertion requires an API key. How would testing work in this case? Should we mock calls to our API so the key isn't needed during tests? Thanks for your time. Looking forward to hearing from y'all.
  • Scaling self-hosted Promptfoo in Kubernetes (K8s)

    Yurii

    04/23/2025, 9:27 AM
    Hi everyone, We're currently experimenting with self-hosting Promptfoo in a Kubernetes (K8s) environment. However, we're facing issues when trying to scale the deployment to more than one replica. The problem arises because the /api/job/:id endpoint and a few others do not function correctly in a multi-replica setup. It seems that running jobs are stored in memory using: export const evalJobs = new Map(); This makes it instance-specific, so other replicas don't have access to the job state. Is it possible to scale Promptfoo horizontally in Kubernetes? If so, what strategies or workarounds would you recommend to support multiple replicas? Any advice or examples would be greatly appreciated! Thanks 🙂
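    To make the failure mode concrete, here is a toy model of per-replica state, mirroring the module-level `evalJobs = new Map()` the message quotes. Common workarounds for this pattern (not promptfoo-specific guidance) are session affinity at the ingress so /api/job requests hit the replica that started the job, or replacing the in-memory map with a shared store such as Redis or the database:

    ```python
    # Toy model: each replica keeps its own job map, so a request routed to a
    # different replica cannot see the job state.
    class InMemoryJobStore:
        def __init__(self):
            self.jobs = {}

        def put(self, job_id: str, state: dict) -> None:
            self.jobs[job_id] = state

        def get(self, job_id: str):
            return self.jobs.get(job_id)

    replica_a, replica_b = InMemoryJobStore(), InMemoryJobStore()
    replica_a.put("job-1", {"status": "running"})
    # replica_b.get("job-1") returns None: the job "vanishes" across replicas.
    ```

    Until the store is shared, pinning job-status traffic to one replica (sticky sessions) is the simpler workaround.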
  • OpenAI Compatible API name problems

    cybr_knght

    04/23/2025, 7:45 PM
    I am trying to run a redteam scan against an OpenAI-compatible model running in a custom Open WebUI application. At this point I am running the plugins with the "minimal test" preset plugin configuration. Here is the configuration file:
    # yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
    description: AI Framework - llama32
    providers:
      - id: openai:"NAME: llama3.2"
        label: ai-llama32
    prompts:
      - '{{prompt}}'
    redteam:
      purpose: ''
      plugins:
        - id: harmful:hate # Tests handling of hate speech and discriminatory content
        - id: harmful:self-harm # Tests handling of self-harm and suicide content
      strategies:
        - id: jailbreak # Single-shot optimization of safety bypass techniques
        - id: jailbreak:composite # Combines multiple jailbreak techniques for enhanced effectiveness
    defaultTest:
      options:
        transformVars: '{ ...vars, sessionId: context.uuid }'
    I am passing the OPENAI_API_KEY and OPENAI_BASE_URL as environment variables. The problem I am running into is that whoever set up this endpoint decided the model name should be 'NAME: ', like in the config above. The colon in the model name seems to be the issue. I have tried escaping the colon, surrounding it with quotes, and encoding the colon, but no matter what I get the following:
    [util.js:63] Error in extraction: API error: 400 Bad Request
    {"detail":"Model not found"}
    I even tried specifying the model ID 'protected.llama3.2', but it gives the same error. Any ideas or direction would be appreciated.
  • Disable provider, programmatic testing only

    IzAaX

    04/29/2025, 9:09 AM
    I'd like to be able to use promptfoo as a library for running manual tests on LLM outputs that have already been produced. I have a dataset of tens of thousands of LLM outputs and need to sample-test their accuracy and output format; it's basically a multi-labelling pipeline (the output from the LLM is an array of N values from a discrete list). Is this the right library for my use case? 🤔 We may use the AI-as-a-judge elements in the future, but we're just starting with programmatic testing for the moment.
  • provide assertion feedback back to chat

    dmitry.tunikov

    04/30/2025, 7:40 AM
    Hi everyone! Do you know if there is a way to provide feedback to an LLM after an assertion? I'm using promptfoo for checking my text -> GraphQL pipeline. I would like to validate the generated query with Python/JS and provide the error back to the LLM + regenerate the query. I read the docs but couldn't find anything suitable for that. Essentially, I'm trying to do something similar to this: https://www.promptfoo.dev/docs/guides/text-to-sql-evaluation/ but this example also doesn't have any SQL validation + regeneration with feedback.
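    Nothing in the question confirms promptfoo supports this loop natively; one option is to run the validate-and-retry loop outside promptfoo and only eval the final, validated query. A sketch of the control flow with stand-in `generate`/`validate` functions (the GraphQL strings are toy examples, and `fake_generate` just simulates an LLM that fixes its query after seeing feedback):

    ```python
    # Sketch: generate a query, validate it, and on failure re-prompt with
    # the validation error appended as feedback.
    def generate_with_feedback(generate, validate, question, max_attempts=3):
        feedback = None
        for _ in range(max_attempts):
            query = generate(question, feedback)
            error = validate(query)
            if error is None:
                return query
            feedback = f"Previous query failed validation: {error}. Please fix it."
        return query  # best effort after max_attempts

    # Toy stand-ins to show the control flow:
    def fake_generate(question, feedback):
        return "query { user { id } }" if feedback else "query { user { } }"

    def fake_validate(query):
        return None if "{ id }" in query else "empty selection set"
    ```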
  • OpenAI Responses API with Structured Output

    Tony

    04/30/2025, 1:22 PM
    Is the text_format parameter supported for OpenAI's responses.parse() method in promptfoo? I think this is the newest and preferred way to do structured outputs with OpenAI.
  • ❗[Help Needed] RedTeam Plugin Error: config.intent Not Set

    kira

    05/03/2025, 7:29 AM
    Hey folks, I’m running into an issue with the RedTeam plugin and could use some help. I'm getting the following error:
    Error running redteam: Error: Validation failed for plugin intent: Error: Invariant failed: Intent plugin requires `config.intent` to be set
    Has anyone faced this before? Any idea what config.intent needs to be set to, or where exactly this should be configured? 🤔 Appreciate any guidance 🙏
  • "Source Text" in g-eval showing as "[object Object]"

    davidfineunderstory

    05/05/2025, 4:30 PM
    (A) I'm using JavaScript files as prompts. (B) When using g-eval with these prompts, this is what appears in the "Source Text" field of the g-eval prompt sent to the LLM:
    Source Text:
    [object Object],[object Object]
    How can I make sure my original prompt is properly displayed to the g-eval prompt?
  • Response in promptfoo is truncated (Nova Pro)

    ert

    05/05/2025, 11:51 PM
    I am using Nova Pro, and the JSON response I'm expecting is always truncated. I already have max_tokens set to 300,000, and it's still cutting off the response. https://cdn.discordapp.com/attachments/1369099281512005813/1369099281675587737/image.png?ex=681aa091&is=68194f11&hm=5ebbfe1ac51c54237b43e291e4b8b4bdfa0ac2000fcc5609e9f6fe8b28eb5764&
  • Format the output.json to only include the LLM output

    ert

    05/06/2025, 1:27 PM
    Based on this link https://www.promptfoo.dev/docs/configuration/parameters/#output-file, each record includes the original prompt, LLM output, and test variables. Is there a configuration that will only show the LLM output and nothing else?
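    If no such flag exists, one option is post-processing the JSON output yourself. The record shape below is an assumption for illustration; check it against your actual output.json before relying on it:

    ```python
    # Sketch: keep only the output string from each eval record.
    # ASSUMPTION: records live under results.results with the output at
    # response.output -- verify against your own output.json.
    def extract_outputs(eval_json: dict) -> list:
        results = eval_json.get("results", {}).get("results", [])
        return [r.get("response", {}).get("output") for r in results]

    sample = {"results": {"results": [
        {"vars": {"x": 1}, "response": {"output": "hello"}},
        {"vars": {"x": 2}, "response": {"output": "world"}},
    ]}}
    ```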
  • Red Team Config Examples

    Rob

    05/07/2025, 9:07 PM
    Hi, Love the idea of promptfoo! I'm trying to set up some simple examples of evaluating promptfoo for my organization. Are there example redteam configs? Like something for ChatGPT, OpenRouter, PortKey.ai, and a local chatbot server? Any tips are much appreciated!
  • Prompts Sending

    aldrich

    05/09/2025, 2:19 AM
    Hi folks, I'm running some sample tests against our on-premises LLM using a combination of plugins and strategies, but I don't want promptfoo to send the prompts generated by the original plugins; I only want to send the prompts converted by the strategies. How can I do that? Any suggestions would be appreciated!