BrianGenisio
08/11/2025, 7:30 PM
promptfoo view
I'd like to be able to control the port so it's something other than 15500. Is there a good way?
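(A hedged note on this one: the local viewer's port should be settable from the CLI, e.g. promptfoo view --port 3005. The --port flag name is from memory of the CLI help rather than this thread, so it's worth confirming with promptfoo view --help on your version.)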
Waz
08/11/2025, 10:57 PM
the
08/12/2025, 11:45 AM
grj373
08/13/2025, 1:43 PM
Bryson
08/13/2025, 8:57 PM
AWilborn
08/14/2025, 5:06 PM
grj373
08/18/2025, 9:12 AM
Josema Blanco
08/18/2025, 12:21 PM
CYH
08/18/2025, 8:44 PM
DAK
08/18/2025, 9:52 PM
--max-concurrency. The command line reports it as running concurrently when it hasn't:
> Duration: 1m 46s (concurrency: 4)
> Successes: 9
> Failures: 0
> Errors: 0
> Pass Rate: 100.00%
I noticed a comment in this issue https://github.com/promptfoo/promptfoo/issues/1280#issuecomment-2251765379 - "We recently refactored evaluations to do providers 1 at a time" - and I'm hoping this isn't a permanent loss of functionality. EDIT - (Just noticed the date on that is from last year. Probably not related, but I couldn't find any other relevant mention.) Is there a way I can re-enable concurrent evals? I'm running against my own local server for testing my multi-agent service, and this testing configuration allowed me to validate more complex agentic tasks. Maybe the HTTP Provider is no longer the best way to handle that?
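(In case it's useful: concurrency can also be pinned in the config file rather than only on the command line. The sketch below assumes the evaluateOptions block mirrors the --max-concurrency flag, as I remember the promptfoo config schema; it isn't confirmed anywhere in this thread.)

```yaml
# Minimal sketch, assuming evaluateOptions.maxConcurrency is equivalent to --max-concurrency.
evaluateOptions:
  maxConcurrency: 4   # allow up to 4 test cases to run in parallel
```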
Waz
08/19/2025, 6:01 PM
errors: [
  {
    code: 'invalid_type',
    expected: 'string',
    received: 'object',
    path: [ 'url' ],
    message: 'Expected string, received object'
  }
]
Here's my provider:
```yaml
providers:
  - id: https
    label: Base model
    config:
      url: {{ env.PROVIDER_URL }}
      maxRetries: 3
      method: POST
      headers:
        'Content-Type': 'application/json'
        'Authorization': 'Bearer {{ env.GOOGLE_ID_TOKEN }}'
      body:
        agent:
          query: '{{query}}'
      transformResponse: |
        {
          output: json.finalMessageContent,
          tokenUsage: {
            total: json.tokenUsage?.totalTokens || 0,
            prompt: json.tokenUsage?.inputTokens || 0,
            completion: json.tokenUsage?.outputTokens || 0,
            cached: json.tokenUsage?.cacheReadTokens || 0,
            numRequests: json.tokenUsage?.llmCalls || 0
          },
          cost: json.cost
        }
```
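(The "Expected string, received object" error on url is what YAML itself produces for an unquoted template: {{ env.PROVIDER_URL }} is valid YAML flow-mapping syntax, so the value reaches promptfoo as a nested object instead of a string. Quoting the template, as this config already does for the other templated values, should resolve it. A minimal sketch of just that change:)

```yaml
providers:
  - id: https
    label: Base model
    config:
      url: '{{ env.PROVIDER_URL }}'   # quoted so YAML parses it as a string, not a mapping
```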
Waz
08/20/2025, 5:34 PM
file://path does not work in my http provider:
```yaml
body:
  query: '{{prompt}}'
  date: '2025-06-03T22:01:13.797Z'
  transactions: file://./test_data/transactions.csv
```
This works in the normal evals, but not when red teaming, it seems?
glutensnake
08/20/2025, 7:57 PM
Elias_M2M
08/21/2025, 1:46 PM
Josema Blanco
08/21/2025, 3:18 PM
Suraj
08/22/2025, 10:38 AM
Puneet Arora
08/24/2025, 7:57 AM
Donato Azevedo
08/25/2025, 8:03 PM
job_id,clause_number,text,legislation_id,expected_compliant,expected_analysis_rubric,__expected1,__expected2,__expected3
job123,2,"Customers have 3 days to return defective products",consumer_law,FALSE,The analysis must state that the minimum is 7 days.,javascript: JSON.parse(output).Evaluation.Status === 'OK',javascript: JSON.parse(output).Evaluation.Compliant === (context.vars.expected_compliant === 'TRUE'),llm-rubric: The analysis should satisfy the rubric: {{expected_analysis_rubric}}
But it's obviously failing when the __expected3 fails... How can I still run the rubric, but disregard its actual score?
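(One possible route, sketched below: keep __expected1/__expected2 in the CSV and attach the rubric via defaultTest with weight: 0. This assumes promptfoo treats zero-weight assertions as informational, graded and stored but not counted toward pass/fail, which is how I read the assertion docs; it isn't confirmed in this thread.)

```yaml
# Sketch: run the rubric for every row but keep it out of the pass/fail decision.
defaultTest:
  assert:
    - type: llm-rubric
      value: 'The analysis should satisfy the rubric: {{expected_analysis_rubric}}'
      weight: 0   # assumed to make the assertion informational only
```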
Elias_M2M
08/26/2025, 1:25 PM
Envy
08/26/2025, 3:49 PM
peter
08/26/2025, 10:59 PM
defaultTest:
  options:
    rubricPrompt: |
      You are an expert factuality evaluator. Compare these two answers:
      Question: {% raw %}{{input}}{% endraw %}
      Reference answer: {% raw %}{{ideal}}{% endraw %}
      Submitted answer: {% raw %}{{completion}}{% endraw %}
      Determine if the submitted answer is factually consistent with the reference answer.
      Choose one option:
      A: Submitted answer is a subset of reference (fully consistent)
      B: Submitted answer is a superset of reference (fully consistent)
      C: Submitted answer contains same details as reference
      D: Submitted answer disagrees with reference
      E: Answers differ but differences don't affect factuality
      Respond with JSON: {"category": "LETTER", "reason": "explanation"}
and the eval works as expected. But the only way I can find to view the JSON response of the grader is by turning --verbose on. The 'category' selection, for instance, isn't available in the dashboard or JSON outputs.
I can pipe the command output to a file and jq/grep through that, but I feel like I'm probably missing a better way to grab that info?
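(A possibly easier place to look, sketched below: write the full results to a JSON file and read the grader's stored reason from the grading result, instead of scraping --verbose logs. The field names here, gradingResult and componentResults and reason, are from my memory of promptfoo's output schema, and the file name is made up, so treat this as an assumption to verify.)

```yaml
# Sketch: persist results, then inspect the grader's explanation in the JSON output.
# Something like results[].gradingResult.componentResults[].reason should hold the grader's
# reasoning, though whether the raw "category" JSON survives there is unverified.
outputPath: factuality-results.json   # hypothetical file name
```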
wmluke
08/27/2025, 2:06 AM
How are the tool and args test vars translated to JSON, given the static 'MCP Tool Call Test' prompt?
https://github.com/promptfoo/promptfoo/blob/main/examples/simple-mcp/promptfooconfig.yaml
Fiboape
08/27/2025, 5:44 PM
ArunS1997
09/01/2025, 8:47 AM
Elias_M2M
09/01/2025, 2:21 PM
dulax
09/04/2025, 3:05 PM
I want to use {{prompt}} and then add my own prompts.
I did try this:
prompts:
  - '{{prompt}}'
  - 'My custom prompt'
But then, when I view the output, it's not showing my prompt as another entry in the list - it's appending it to each prompt that was generated.
What am I not getting about this?
Tak
09/05/2025, 2:44 PM
Quinten
09/05/2025, 6:12 PM
prompts:
  - "{{problem_description}}"
providers: # A single enabled custom Python provider
  - id: file://promptfoo/promptfoo_classifier_provider.py
    label: gpt5-only
    config:
      enabled_providers: ["gpt-5"]
      decision_mode: "first"
      taxonomy_name: "default"
      include_debug_details: true
      cache_enabled: true
tests: promptfoo/OSB_sample_data_mini.csv
# tests: promptfoo/OSB_sample_data.csv
defaultTest:
  assert:
    - type: javascript
      value: |
        // actual JS test here
This worked in 0.118.0 but seems to fail in 0.118.3. Downgrading to 0.118.0 gets things working again, so maybe it's just a bug? I didn't see a related issue in GitHub yet, and it's also possible I just have weird syntax that I should fix.
Josema Blanco
09/09/2025, 8:43 AM
Rysiek
09/09/2025, 11:11 AM
providers:
  - id: openai:gpt-4.1-mini
    label: openai
    config:
      apiHost: "yourAIhost.com"
      apiKey: sk-abc123
So the prompts themselves are executed against "yourAIhost.com" and work properly.
But the llm-rubric tests are then executed against api.openai.com.
I tried many different setups, not just the one I provided. I tried setting
defaultTest:
  options:
    provider: openai:gpt-4.1-mini
I tried using a custom label.
I tried making the whole provider a custom http provider, which also worked, and then referencing it in defaultTest or in the llm-rubric test. Nothing works; the test always hits api.openai.com.
When I tried to use a custom https provider with a label, and then reference it in the defaultTest config, I got a response in the UI:
Error: Invariant failed: Expected HTTP provider https:custom to have a config containing {body}, but instead got {}
That looks like a bug, because it identifies the provider but not its config, which is resolved correctly for the prompts themselves, just not in the llm-rubric verifier.
Has anyone had a similar problem and managed to overcome it?
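(One thing worth trying, sketched below: pass the grader override as a full provider object, id plus config, under defaultTest.options.provider, so the llm-rubric grader inherits the same apiHost instead of falling back to api.openai.com. This mirrors the provider-override shape in the promptfoo docs as I remember it; it isn't confirmed in this thread, and it doesn't explain the {body} invariant error seen with the https provider.)

```yaml
# Sketch: give the grader the same host/key override as the eval provider.
defaultTest:
  options:
    provider:
      id: openai:gpt-4.1-mini
      config:
        apiHost: "yourAIhost.com"
        apiKey: sk-abc123
```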