FIX Support errors in `MultiPromptSendingAttack`, add safe completion support to `SelfAskRefusalScorer` by fdubut · Pull Request #1366 · Azure/PyRIT

fdubut · 2026-02-12T00:52:52Z

Description

A couple of fixes:

- Support content moderation errors in `MultiPromptSendingAttack`. Currently, the attack fails with an uncaught exception if one of the intermediate prompts returns a moderation error. With the fix, it fails gracefully.
- Support safe completions in `SelfAskRefusalScorer`. Currently, the scorer leans towards "not a refusal" when the model returns a safe completion (which most modern models from GPT-5 onward will do). With the added option, safe completions are treated as refusals. The default is unchanged; this is an additional template that users can select when they instantiate the scorer.
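For readers unfamiliar with the first fix, here is a minimal, self-contained sketch of what "failing gracefully" on a moderation error can look like in a multi-prompt loop. It is not the PyRIT implementation: `Reply`, `AttackResult`, `Outcome`, `run_multi_prompt`, and the `"blocked"` error value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    COMPLETED = "completed"  # every prompt in the sequence was sent
    FAILURE = "failure"      # the sequence stopped early on an error


@dataclass
class Reply:
    text: str
    error: str = "none"  # hypothetical values: "none" or "blocked"


@dataclass
class AttackResult:
    outcome: Outcome
    prompts_sent: int
    reason: Optional[str] = None
    last_reply: Optional[Reply] = None


def run_multi_prompt(prompts: List[str], send: Callable[[str], Reply]) -> AttackResult:
    """Send prompts in order; stop and report a failure if any reply is an error."""
    last: Optional[Reply] = None
    for index, prompt in enumerate(prompts, start=1):
        reply = send(prompt)
        if reply.error != "none":
            # Return a structured failure instead of raising, so the caller
            # can still inspect which prompt was blocked and why.
            reason = f"prompt {index} returned error '{reply.error}'"
            return AttackResult(Outcome.FAILURE, index, reason, reply)
        last = reply
    return AttackResult(Outcome.COMPLETED, len(prompts), last_reply=last)
```

The design choice illustrated is that an error on an intermediate prompt becomes part of the returned result rather than an exception that escapes the attack.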

Added one test to verify SelfAskRefusalScorer throws an exception when no objective is provided and safe completions are disallowed (the scorer needs to know the objective to assess whether this was a "safe" completion or a "true" completion).
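The objective requirement described above can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the scorer's actual code: `build_refusal_prompt`, the template names, and the prompt wording are all hypothetical.

```python
from typing import Optional

# Hypothetical template identifiers; the real scorer's option names may differ.
DEFAULT_TEMPLATE = "default"
NO_SAFE_COMPLETIONS_TEMPLATE = "no_safe_completions"


def build_refusal_prompt(
    response: str,
    objective: Optional[str] = None,
    template: str = DEFAULT_TEMPLATE,
) -> str:
    """Build the text handed to the judging model for a refusal decision."""
    if template == NO_SAFE_COMPLETIONS_TEMPLATE:
        # Deciding whether a reply is a "safe" or a "true" completion
        # requires knowing what was actually asked for.
        if not objective:
            raise ValueError(
                "An objective is required when safe completions count as refusals."
            )
        return (
            f"Objective: {objective}\n"
            f"Response: {response}\n"
            "Treat a response that avoids fulfilling the objective, "
            "including a safe completion, as a refusal."
        )
    # Default behavior: the objective is optional context.
    prefix = f"Objective: {objective}\n" if objective else ""
    return f"{prefix}Response: {response}\nIs this response a refusal?"
```

A test in the spirit of the one added in this PR would assert that calling the function with the stricter template and no objective raises an exception.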

fdubut added 3 commits February 11, 2026 10:48

Add support for safe completions in refusal scorer

d5fc459

Fix blocked/error handling of MultiPromptSendingAttack

86e5985

Merge branch 'main' of https://github.com/fdubut/PyRIT into bug_fixes

5bfe81b