JohnRoy
05/23/2025, 12:36 PMroy
05/25/2025, 12:33 AMpyq
05/29/2025, 7:34 AMraxrb
05/29/2025, 5:56 PMvibecoder
05/31/2025, 5:54 AMvibecoder
05/31/2025, 5:56 AMgrj373
06/02/2025, 2:21 PMvibecoder
06/05/2025, 2:05 AMDonato Azevedo
06/05/2025, 1:52 PM
- type: python
  value: ('Outorgar Poderes' in output['pessoas_analisadas'][1]['restricoes_valor'] and '12' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'] and 'meses' in output['pessoas_analisadas'][1]['restricoes_valor']['Outorgar Poderes']['valor_alcada'])
  metric: alcada
And this is not even robust, because it depends on the order of the output['pessoas_analisadas'] list being consistent across different evals.
I'd appreciate any suggestions. Meanwhile, I was even considering contributing a transform property to assert-sets, which would enable this kind of syntax:
tests:
  - description: test for persona 1
    vars:
      - file://path/to/pdf
    assert:
      - type: assert-set
        transform: next(o for o in output['pessoas_analisadas'] if o['nome'] == 'NAME OF PERSON')
        assert:
          - type: python
            value: ('Outorgar Poderes' in output['restricoes_valor'] and '12' in output['restricoes_valor']['Outorgar Poderes']['valor_alcada'] ...
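Until something like that transform exists, one way to avoid the index dependence is to do the lookup by name inside a Python assertion. The following is only a sketch: it assumes the output has already been parsed into a dict, it reuses the get_assert name that appears elsewhere in this thread, and find_person is a hypothetical helper, not part of promptfoo.

```python
from typing import Any, Optional


def find_person(output: dict[str, Any], name: str) -> Optional[dict[str, Any]]:
    """Hypothetical helper: return the entry whose 'nome' equals name, or None."""
    people = output.get('pessoas_analisadas', [])
    return next((p for p in people if p.get('nome') == name), None)


def get_assert(output: dict[str, Any], context) -> dict[str, Any]:
    # Look the person up by name so list order no longer matters.
    person = find_person(output, 'NAME OF PERSON')
    if person is None:
        return {'pass': False, 'score': 0.0, 'reason': 'person not found in output'}

    restriction = person.get('restricoes_valor', {}).get('Outorgar Poderes', {})
    alcada = str(restriction.get('valor_alcada', ''))

    # Same substring checks as the inline expression above.
    ok = '12' in alcada and 'meses' in alcada
    return {
        'pass': ok,
        'score': 1.0 if ok else 0.0,
        'reason': f"valor_alcada = {alcada!r}",
    }
```

The check itself is unchanged from the inline expression; only the [1] index is replaced by a search on 'nome'.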
Opinions?
Donato Azevedo
06/05/2025, 5:05 PM
from typing import Any

def get_assert(output: dict[str, Any], context) -> bool | float | dict[str, Any]:
    # The returned dict plays the role of a GradingResult.
    return {
        'pass': True,
        'score': 0.11,
        'reason': 'Looks good to me',
        'named_scores': {
            'answer_similarity': 0.12,
            'answer_correctness': 0.13,
            'answer_relevancy': 0.14,
        }
    }
I was expecting to see the three answer_* named metrics appearing up top
https://cdn.discordapp.com/attachments/1380231112252723332/1380231112629944341/Screenshot_2025-06-05_at_14.04.57.png?ex=68431fe4&is=6841ce64&hm=1e59574abdb29e00278b719905d74efcea8620d330947203cbe31d0c4fc9301b&
Bryson
06/06/2025, 7:04 PM
[chat.js:161] completions API response: {"id":"chatcmpl-BfWK1RSvr3LAckI7hoUHM9dYo0Zce","object":"chat.completion","created":1749235393,"model":"gpt-4o-mini-2024-07-18","choices":[{"index":0,"message":{}
<anonymous_script>:430
SyntaxError: Expected ',' or '}' after property value in JSON at position 1966 (line 430 column 1)
at JSON.parse (<anonymous>)
at encodeMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:95:32)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async addMathPrompt (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/mathPrompt.js:122:33)
at async action (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/strategies/index.js:195:34)
at async applyStrategies (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:241:35)
at async synthesize (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/index.js:678:85)
at async doGenerateRedteam (/opt/homebrew/Cellar/promptfoo/0.114.5/libexec/lib/node_modules/promptfoo/dist/src/redteam/commands/generate.js:243:88)
Is this a Promptfoo bug by chance? Or is it possible I'm doing something wrong? Happy to DM my promptfooconfig.yaml if helpful.
sasha
06/09/2025, 7:42 PMvibecoder
06/10/2025, 6:09 AManurag
06/10/2025, 3:30 PM0.75ms
https://cdn.discordapp.com/attachments/1382019030801584148/1382019031208427571/image.png?ex=6849a105&is=68484f85&hm=180be1f47d77da7022deef500bd41f20d9b2cbcdb7f1c54ce8a3803f5bb334eb&
https://cdn.discordapp.com/attachments/1382019030801584148/1382019031539650581/image.png?ex=6849a105&is=68484f85&hm=d91040592a9966b5ff36fab22c0cfa37f1b6662c568578d534d0de07ab0040fb&
Donato Azevedo
06/10/2025, 6:40 PM
beforeAll extension hook. I wanted to print some info to stdout about the progress, without needing to pass --verbose when running to view it. How can I do it?
grj373
06/11/2025, 4:14 PMBronwyn
06/12/2025, 3:49 AMphillprice
06/13/2025, 11:00 AM
redteam:
  provider: vertex:gemini-2.0-flash
I would expect to see Vertex prompts in the logs, not https://api.promptfoo.app/api/. Or am I missing something?
https://cdn.discordapp.com/attachments/1383038229430800417/1383038229657423903/message.txt?ex=684d5639&is=684c04b9&hm=e886247025063d28d786293cbbb0f5896f90d0fef84507c57f7a63f9794547fa&
sreven12
06/16/2025, 3:13 AMstraygar
06/18/2025, 12:07 PM
promptfooconfig.yaml file with different scenarios etc. to catch regressions in CI and evaluate potential prompts & models. I use a custom Python provider & some custom Python assertions, along with built-in ones.
My current process of authoring new tests is:
- add new test
- comment out most of the file
- if something goes wrong, add print statements, rinse and repeat
Obviously this is not the best. I was never able to get --filter-pattern or --filter-metadata to work, and the entire promptfooconfig file is always run. I just found a way to potentially attach a Python debugger, but it's a bit rough: https://github.com/promptfoo/promptfoo/commit/41cc82b2489efce4b167ebb25cc8cc6bcaf667b9
Nate
06/18/2025, 3:09 PM_wutato_
06/19/2025, 7:24 AM
plugins:
  - id: debug-access # Tests for exposed debugging interfaces and commands
strategies:
  - id: basic # Original plugin tests without any additional strategies or optimizations
  - id: jailbreak:composite # Combines multiple jailbreak techniques for enhanced effectiveness
  - id: jailbreak:likert # Uses Likert scale-based prompts to bypass content filters
  - id: jailbreak # Single-shot optimization of safety bypass techniques
~~Saw a "Fail" case that made use of Base64 encoded image as shown in the uploaded picture.
Would like to check if it is possible to get the final prompt sent to the LLM or how this prompt is created.
Also unsure if the prompt was generated under DebugAccess or by a strategy.
Any help or advice on where to check is greatly appreciated. Thank you!~~
*Edit:*
Realised the UI interprets the prompt as a base64 image because of the following check in ResultsTable.tsx:
if (
  typeof value === 'string' &&
  (value.match(/^data:(image\/[a-z]+|application\/octet-stream);base64,/) ||
    value.match(/^\/[0-9A-Za-z+/]{4}.*/))
)
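To see which prompts would trip this check, the two patterns can be reproduced outside the UI. This is a rough Python port for experimentation only: the function name looks_like_base64_image and the sample strings in the usage note are made up for illustration, and only the two regular expressions come from the snippet above.

```python
import re

# Direct ports of the two patterns quoted from ResultsTable.tsx.
DATA_URI = re.compile(r'^data:(image/[a-z]+|application/octet-stream);base64,')
RAW_BASE64 = re.compile(r'^/[0-9A-Za-z+/]{4}.*')


def looks_like_base64_image(value: object) -> bool:
    """Return True when a string would satisfy the condition quoted above."""
    if not isinstance(value, str):
        return False
    return bool(DATA_URI.match(value) or RAW_BASE64.match(value))
```

Under these patterns, any string that starts with a slash followed by four characters from the base64 alphabet matches, so a plain-text prompt such as '/etc/passwd' or '/help me' would be treated as an image, while 'Show me /etc/passwd' would not.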
Still unsure about the actual prompt being sent out, as manually using the prompt listed in the image gets a PASS response.
https://cdn.discordapp.com/attachments/1385158186046324829/1385158186494984294/base64_fail.PNG?ex=68565e16&is=68550c96&hm=970229f01b0872c18132009b9e4e8014b90515de8f2aaa8fb687b76892458fb4&
Dan
06/20/2025, 8:47 AMAlba
06/20/2025, 10:43 AMmjt2007
06/20/2025, 6:21 PMGuillermoB
06/20/2025, 10:46 PMmarco.sbodio
06/22/2025, 12:35 PMahmedelbaqary.
06/24/2025, 1:55 PMGreg
06/26/2025, 11:02 PMno1care
06/27/2025, 6:56 AM