FEAT: Jailbreak Scenario Expansion #1340

Open

ValbuenaVC wants to merge 54 commits into Azure:main from ValbuenaVC:jailbreak2

Contributor

ValbuenaVC commented Jan 30, 2026 (edited)

Description

Adding more features to the Jailbreak scenario! Major changes:

JailbreakStrategy now supports multiple different attack types via ManyShot, PromptSending, Crescendo, and RedTeaming values.
New attack strategies can be collected using the SINGLE_TURN and MULTI_TURN aggregates; the PYRIT aggregate has been deprecated.
The initializer now accepts k, n, and jailbreaks: k selects a random number of jailbreaks, n sets how many times to try each jailbreak, and jailbreaks names the specific jailbreaks to use. Note that k and jailbreaks are mutually exclusive.
A default adversarial target has been added to support the relevant attack strategies.
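
As a rough illustration of how the three initializer arguments described above could interact, here is a minimal standalone sketch. The function name `resolve_jailbreaks` and the `available` parameter are hypothetical and not taken from the PR; only `k`, `n`, and `jailbreaks` (and the rule that `k` and `jailbreaks` are mutually exclusive) come from the description.

```python
import random
from typing import List, Optional


def resolve_jailbreaks(
    available: List[str],
    k: Optional[int] = None,
    n: int = 1,
    jailbreaks: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the jailbreak names to run, each repeated n times."""
    # The description states that k and jailbreaks cannot be combined.
    if k is not None and jailbreaks is not None:
        raise ValueError("k and jailbreaks are mutually exclusive")
    if n < 1:
        raise ValueError("n must be at least 1")

    if jailbreaks is not None:
        selected = list(jailbreaks)  # explicit list chosen by the caller
    elif k is not None:
        selected = random.sample(available, k)  # k distinct random jailbreaks
    else:
        selected = list(available)  # default: use every jailbreak

    # Repeat each selected jailbreak n times, keeping repeats adjacent.
    return [name for name in selected for _ in range(n)]
```

With neither `k` nor `jailbreaks` set, the sketch falls back to every available jailbreak, which matches the `None` default discussed in the review below.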

Tests and Documentation

Expanded to support new strategies.

Victor Valbuena and others added 27 commits

January 26, 2026 20:06


          Scaffolding

022f70a


          Precommit

e85cdb9


          fixtures and basic tests

fc260c3


          basic tests

89a8079


          basic tests

b18f224


          last test

96ddf6c


          jailbreak format test

eb4e936


          sample jailbreak prompt

243ea0a


          Merge branch 'main' into jailbreak

946fdde


          real jailbreaks added

132caf5


          Merge branch 'main' into jailbreak

c4e625f


          Merge branch 'main' into jailbreak

79d1a64


          changing dataset name

cb28fda


          moved jailbreak discovery

f399b6d


          changed path resolution

75436ea


          minor changes

c0022f6


          minor bug

9f579f2


          Merge branch 'main' into jailbreak

ccf7025


          old dataset name

349cc6b


          precommit

9fa6430


          random jailbreak selection

513cbf3


          error handling

b57b35a


          error handling docstring

999a0c6


          Merge branch 'Azure:main' into jailbreak2

f3ec8bb


          scaffolding

89fd8bd


          scaffolding for subset

66650a6


          scaffolding

fa5b01a

ValbuenaVC changed the title ~~Jailbreak Scenario Expansion~~ [DRAFT] Jailbreak Scenario Expansion


          Merge branch 'main' into jailbreak2

44bc05c

ValbuenaVC changed the title ~~[DRAFT] Jailbreak Scenario Expansion~~ [DRAFT] FEAT: Jailbreak Scenario Expansion

ValbuenaVC and others added 8 commits

February 5, 2026 13:27


          Merge branch 'main' into jailbreak2

db5270c


          Merge branch 'main' into jailbreak2

9d9666f


          subset

302101f


          tweaking

9c7b757


          new strategy template

737aabe


          types'

472bd20


          adversarial

b07e197


          Merge branch 'main' into jailbreak2

c31d088

ValbuenaVC marked this pull request as ready for review

February 10, 2026 01:16

ValbuenaVC changed the title ~~[DRAFT] FEAT: Jailbreak Scenario Expansion~~ FEAT: Jailbreak Scenario Expansion

ValbuenaVC and others added 7 commits

February 11, 2026 10:39


          Merge branch 'main' into jailbreak2

6dcf318


          Merge branch 'main' into jailbreak2

ec9d731


          Merge branch 'main' into jailbreak2

163e582


          unit test fixes

a503a4b


          Merge branch 'jailbreak2' of https://github.com/ValbuenaVC/PyRIT into jailbreak2

af32046


          unit test fix

6da95f9


          mypy

73d77a6

fdubut reviewed

Contributor

fdubut left a comment

Added a few comments. The main one is the question of running multi-turn attacks on single-turn jailbreaks (which is what all the jailbreaks in the main template directory are). It would be worth testing what that looks like and whether it even makes sense.

pyrit/datasets/jailbreak/text_jailbreak.py Outdated

                   @classmethod
-                  def get_all_jailbreak_templates(cls, n: Optional[int] = None) -> List[str]:
+                  def get_all_jailbreak_templates(cls, k: Optional[int] = None) -> List[str]:

Contributor

fdubut Feb 12, 2026

Not sure if this function is called directly anywhere and if it would be a breaking change, but it would make sense to me to rename it get_jailbreak_templates since by definition you are not always returning them all.
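
To make the naming point concrete, here is a minimal sketch of a classmethod that returns every template when `k` is omitted and a random subset otherwise. The class `TemplateCatalog`, its `_TEMPLATE_NAMES` list, and the file names in it are invented for illustration; they are not the contents of `text_jailbreak.py`.

```python
import random
from typing import List, Optional


class TemplateCatalog:
    """Hypothetical stand-in for the class that owns the jailbreak templates."""

    # Made-up names, used only so the sketch is runnable.
    _TEMPLATE_NAMES: List[str] = ["template_a.yaml", "template_b.yaml", "template_c.yaml"]

    @classmethod
    def get_jailbreak_templates(cls, k: Optional[int] = None) -> List[str]:
        """Return all template names, or k distinct random ones when k is given."""
        if k is None:
            return list(cls._TEMPLATE_NAMES)
        if k < 0 or k > len(cls._TEMPLATE_NAMES):
            raise ValueError(
                f"k must be between 0 and {len(cls._TEMPLATE_NAMES)}, got {k}"
            )
        return random.sample(cls._TEMPLATE_NAMES, k)
```

Because the method can return fewer than all templates, a name without "all" describes both call shapes.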

pyrit/scenario/scenarios/airt/jailbreak.py Outdated

+                  # Strategies for tweaking jailbreak efficacy through attack patterns
+                  ManyShot = ("many_shot", {"single_turn"})
+                  PromptSending = ("prompt_sending", {"single_turn"})
+                  Crescendo = ("crescendo", {"multi_turn"})

Contributor

fdubut Feb 12, 2026

Did you try running these jailbreaks with multi-turn attacks? Most of the jailbreaks in PyRIT are single-turn jailbreaks, and I'm not quite sure what they would look like with a multi-turn attack...

Contributor Author

ValbuenaVC Feb 13, 2026

After tweaking it on my end, it's definitely less consistent than I'd like, so I think I'm removing multi-turn attacks from the scenario unless anyone objects. Especially with RedTeamingAttack, the adversarial model doesn't always seem to "get" the role it plays in the jailbreak, and as you said, it doesn't seem useful given the existing jailbreaks we have in PyRIT.

pyrit/scenario/scenarios/airt/jailbreak.py

+                  Strategy for jailbreak attacks.
                   """
+                  # Aggregate members (special markers that expand to strategies with matching tags)

Contributor

fdubut Feb 12, 2026

What's the default strategy? Some other scenarios have a get_default_strategy function but I don't see one here. I would recommend making the default PromptSending because it's the one that makes the most sense for the jailbreaks we have in PyRIT so far.

Contributor Author

ValbuenaVC Feb 12, 2026

Right now it's JailbreakStrategy.ALL, but I like that idea more, so I'm going to change it to PromptSending. Should be line 84
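
The tag-based aggregates and the proposed default can be sketched with a plain `Enum`. This is a standalone illustration, not PyRIT's strategy base class: the class name `SketchStrategy`, the `expand` and `aggregates` helpers, and the `multi_turn` tag on `RedTeaming` are assumptions. The member values mirror the diff as shown above and do not reflect the later plan to drop multi-turn attacks.

```python
from enum import Enum
from typing import FrozenSet, List, Set


class SketchStrategy(Enum):
    """Standalone illustration of tagged strategies with aggregate markers."""

    # Aggregate members: markers that expand to strategies with matching tags.
    ALL = ("all", frozenset({"all"}))
    SINGLE_TURN = ("single_turn", frozenset({"single_turn"}))
    MULTI_TURN = ("multi_turn", frozenset({"multi_turn"}))

    # Concrete strategies (RedTeaming's tag is an assumption).
    ManyShot = ("many_shot", frozenset({"single_turn"}))
    PromptSending = ("prompt_sending", frozenset({"single_turn"}))
    Crescendo = ("crescendo", frozenset({"multi_turn"}))
    RedTeaming = ("red_teaming", frozenset({"multi_turn"}))

    def __init__(self, label: str, tags: FrozenSet[str]) -> None:
        self.label = label
        self.tags = tags

    @classmethod
    def aggregates(cls) -> Set["SketchStrategy"]:
        return {cls.ALL, cls.SINGLE_TURN, cls.MULTI_TURN}

    @classmethod
    def get_default_strategy(cls) -> "SketchStrategy":
        # Default suggested in the review thread above.
        return cls.PromptSending

    @classmethod
    def expand(cls, strategy: "SketchStrategy") -> List["SketchStrategy"]:
        """Turn an aggregate into its concrete strategies; pass others through."""
        concrete = [s for s in cls if s not in cls.aggregates()]
        if strategy is cls.ALL:
            return concrete
        if strategy in cls.aggregates():
            return [s for s in concrete if strategy.label in s.tags]
        return [strategy]
```

In this sketch, `SketchStrategy.expand(SketchStrategy.SINGLE_TURN)` yields `ManyShot` and `PromptSending`, while a concrete member expands to itself.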

pyrit/scenario/scenarios/airt/jailbreak.py Outdated

                       scenario_result_id: Optional[str] = None,
-                      n_jailbreaks: Optional[int] = 3,
+                      k: Optional[int] = None,
+                      n: int = 1,

Contributor

fdubut Feb 12, 2026

k and n are not quite self-explanatory... how about we keep n_jailbreaks and add n_attempts as something that's clearer in the code?

Contributor

fdubut Feb 12, 2026

(btw I do agree with the new default value of None which means all jailbreaks)

Contributor Author

ValbuenaVC Feb 12, 2026

Agreed that it's unclear, will change

pyrit/scenario/scenarios/airt/jailbreak.py Outdated

-                          n_jailbreaks (Optional[int]): Choose n random jailbreaks rather than using all of them.
+                          k (Optional[int]): Choose k random jailbreaks rather than using all of them.
+                          n (Optional[int]): Number of times to try each jailbreak.
+                          jailbreaks (Optional[int]): Dedicated list of jailbreaks to run.

Contributor

fdubut Feb 12, 2026

Should be Optional[List[str]]
Can we clarify in the docstring (or wherever else you think is appropriate) that these are the names of the jailbreaks from our template list in PyRIT? At first glance I thought these are custom jailbreaks, and the strings were the full text of the jailbreaks.

Contributor Author

ValbuenaVC Feb 12, 2026

Good catch! And changed in latest commit

pyrit/scenario/scenarios/airt/jailbreak.py Outdated

+                          jailbreaks (List[str]): List of jailbreak names.
+                      Raises:
+                          ValueError: If jailbreaks not discovered.

Contributor

fdubut Feb 12, 2026

I would recommend making it a bit more explicit (e.g. "if at least one provided jailbreak does not exist").

Contributor Author

ValbuenaVC Feb 12, 2026

Changed in latest commit
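
One way the more explicit error could look is sketched below. The function `validate_jailbreaks_subset` here is a standalone, hypothetical helper written for illustration; its signature and message wording are assumptions, not the code in the PR.

```python
from typing import Iterable, List


def validate_jailbreaks_subset(requested: Iterable[str], known: Iterable[str]) -> None:
    """Raise ValueError if any requested jailbreak name is not a known template name."""
    known_set = set(known)
    # Collect every unknown name so the caller sees all problems at once.
    missing: List[str] = sorted(name for name in requested if name not in known_set)
    if missing:
        raise ValueError(
            "At least one provided jailbreak does not exist: " + ", ".join(missing)
        )
```

Listing the unknown names in the message also makes it clear that the strings are template names rather than full jailbreak text.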

pyrit/scenario/scenarios/airt/jailbreak.py Outdated

                           List[AtomicAttack]: List of atomic attacks to execute, one per jailbreak template.
+                      Raises:
+                          ValueError: If self._jailbreaks is not a subset of all jailbreak templates.

Contributor

fdubut Feb 12, 2026

Technically it's true but if I understand correctly, it would be thrown by the TextJailbreakConverter initializer at this point, and it doesn't seem to be caught anywhere, so do we need to mention it here? Also we've already done that validation as part of _validate_jailbreaks_subset so I'm not sure it's worth re-mentioning it here.

Contributor Author

ValbuenaVC Feb 12, 2026

Changed in latest commit. This raises section shouldn't be here

pyrit/scenario/scenarios/airt/jailbreak.py Outdated

Contributor

fdubut Feb 12, 2026

A bit of a random question, if we are now allowing n_attempts > 1, is it worth also relaxing max_dataset_size=4?

Contributor Author

ValbuenaVC Feb 12, 2026

Good point. I've removed it, although I'm wondering if we should allow this to be accessible to the user?

ValbuenaVC and others added 11 commits

February 12, 2026 11:13


          Merge branch 'main' into jailbreak2

827ec0e


          params

8168db8


          tweaks

5ac7651


          dataset_size

20ef0c3


          k_jailbreak bug

06bb694


          Merge branch 'main' into jailbreak2

03a1e9b


          tests

6a67ac4


          new strategies

4b441d4


          adversarial chat

b14f564


          roleplay path

07b6142


          roleplay

36b6b95
