Conversation
Pull request overview
This PR adds comprehensive backend REST APIs to support upcoming frontend development, implementing an attack-centric design where all user interactions (including manual conversations) are modeled as "attacks". The implementation includes service layers, API routes, Pydantic models, error handling middleware, registries for managing target/converter instances, a CLI tool, and comprehensive test coverage.
Changes:
- Adds three main services (AttackService, ConverterService, TargetService) for managing attacks, converters, and targets
- Implements REST API routes for attacks, targets, converters, labels, health, and version endpoints
- Creates Pydantic models for request/response validation with RFC 7807 error handling
- Adds instance registries for targets and converters with singleton pattern
- Implements `pyrit_backend` CLI command for starting the server with initializer support
- Includes 900+ lines of comprehensive unit tests for all services and routes
Reviewed changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| pyrit/backend/services/attack_service.py | Service layer for managing attack lifecycle, messages, and scoring (586 lines) |
| pyrit/backend/services/converter_service.py | Service for managing converter instances and previewing conversions (303 lines) |
| pyrit/backend/services/target_service.py | Service for managing target instance creation and retrieval (187 lines) |
| pyrit/backend/routes/attacks.py | REST API endpoints for attack CRUD operations and messaging (249 lines) |
| pyrit/backend/routes/targets.py | REST API endpoints for target instance management (101 lines) |
| pyrit/backend/routes/converters.py | REST API endpoints for converter instances and preview (134 lines) |
| pyrit/backend/routes/labels.py | REST API endpoint for retrieving filter label options (88 lines) |
| pyrit/backend/models/attacks.py | Pydantic models for attack requests/responses (201 lines) |
| pyrit/backend/models/targets.py | Pydantic models for target instances (52 lines) |
| pyrit/backend/models/converters.py | Pydantic models for converter instances and preview (98 lines) |
| pyrit/backend/models/common.py | Common models including RFC 7807 error responses and sensitive field filtering (93 lines) |
| pyrit/backend/middleware/error_handlers.py | RFC 7807 compliant error handler middleware (182 lines) |
| pyrit/registry/instance_registries/target_registry.py | Registry for managing target instances (88 lines) |
| pyrit/registry/instance_registries/converter_registry.py | Registry for managing converter instances (108 lines) |
| pyrit/cli/pyrit_backend.py | CLI command for starting the backend server with initialization support (217 lines) |
| tests/unit/backend/*.py | Comprehensive unit tests for all services, routes, models, and error handlers (2000+ lines) |
| pyrit/backend/main.py | Updated to register new routes and error handlers |
| pyproject.toml | Adds pyrit_backend CLI entry point and documentation exemption |
| frontend/dev.py | Updated to use pyrit_backend CLI instead of direct uvicorn |
```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""
```
Concern: Model Duplication/Drift and Confusion
A lot of these are really similar to our core models (MessagePiece, Message, Score, AttackSummary). Additionally, converters and targets here are really similar to the corresponding Identifiers.
Right now, the translation logic (in attack_service.py) is fragile. A name change breaks things, and types could differ too (e.g., score_value is a float in the backend but a string in the model).
I also think this would be easier to use if all the same fields are available. It could be confusing to program here and not have access to converted_data_type. As a new user I'd be asking myself "what is converted_type and how does it map?". I think it would be really useful to be able to access all the model pieces in the same ways.
Proposal
Could we define the Pydantic models with the same field names and drift detection? E.g. use from_attributes=True
```python
# pyrit/backend/models/message_piece.py
class MessagePieceSchema(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    # Same names as domain MessagePiece
    id: Optional[str] = None  # serialized from UUID
    role: Literal["system", "user", "assistant", "simulated_assistant"]
    original_value: str
    converted_value: Optional[str] = None
    original_value_data_type: str = "text"
    converted_value_data_type: Optional[str] = None
    response_error: str = "none"
    sequence: int = -1
    conversation_id: Optional[str] = None
    # ... all 25 fields with same names ...

    # API-only extras
    friendly_name: Optional[str] = None

    _EXTRA_FIELDS = {"friendly_name"}

    # Potentially add this as an abstract method in a base class
    # to ensure we add it in all backend models
    @classmethod
    def from_model(cls, obj, **extras):
        instance = cls.model_validate(obj)
        for k, v in extras.items():
            setattr(instance, k, v)
        return instance
```
Apply the same paradigm to identifiers:
```python
# pyrit/backend/models/converters.py
class ConverterIdentifierSchema(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    # Same fields as domain ConverterIdentifier
    __type__: str
    __module__: str
    id: Optional[str] = None
    supported_input_types: Optional[tuple] = None
    supported_output_types: Optional[tuple] = None
    sub_converter_identifiers: Optional[List["ConverterIdentifierSchema"]] = None
    # ... etc

    # API extras
    display_name: Optional[str] = None
```
Add some kind of drift detection. I'd be happy with either runtime checks or tests:
```python
# tests/unit/backend/test_model_mirrors.py
class TestMessagePieceMirror:
    def test_all_domain_fields_present(self):
        domain_fields = get_init_param_names(MessagePiece)
        schema_fields = set(MessagePieceSchema.model_fields.keys())
        extra_fields = MessagePieceSchema._EXTRA_FIELDS
        missing = domain_fields - (schema_fields - extra_fields)
        assert not missing, f"Schema missing domain fields: {missing}"
```
Then, instead of the hand-written translation in attack_service, translate with from_model:
```python
# Before (in attack_service.py)
pieces = [MessagePiece(piece_id=str(p.id), data_type=p.converted_value_data_type, ...)]

# After
pieces = [MessagePieceSchema.from_model(p) for p in msg.message_pieces]
```
I want to add some thoughts here since this is going to set precedent for the rest of the backend module.
I think keeping separate API models is the right call. MessagePiece is huge (we have around 25 fields, converter_identifiers, scorer_identifier, attack_identifier, original_value_sha256, prompt_metadata, originator, etc) and most of that is internal plumbing that an API consumer shouldn't ever see. I think exposing these 8 fields in the API isn't really duplication. It's the whole point of DTOs to keep the public contract clean and let the domain evolve without dragging the API around with it (here is a great article on this).
The from_attributes you suggested to mirror everything pushes us in the opposite direction. It basically couples the API schema to domain internals. Any refactor of MessagePiece becomes an API change. Also, we end up with a bunch of Optional fields that are never actually populated, which is misleading for the users.
Even with the current setup we have today (backend and frontend live together and run locally), this is open source. People will deploy this differently (host backend separately, build their own frontend, etc.). Keeping the API contract independent now would save us a lot of painful untangling later.
I know you mentioned the current mapping is fragile, but I'm not too worried because our mapping code is explicit attribute access. If a domain field gets renamed, lint/type checking should catch it (or even mypy). That said, I do agree the translation code could be organized better, e.g pulling it into dedicated mapper functions (e.g. pyrit/backend/mappers/) would centralize the changes and make it easy to add a few mapping tests.
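To make the dedicated-mapper idea concrete, here's a sketch with stand-in types (all names here are illustrative, not the actual PR code): explicit attribute access means a domain rename shows up immediately under lint/mypy, and each mapper is trivially unit-testable.

```python
# Hypothetical pyrit/backend/mappers/message_mappers.py -- a sketch only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainPiece:  # stand-in for the real MessagePiece domain model
    id: str
    role: str
    original_value: str
    converted_value: Optional[str] = None


@dataclass
class MessagePieceDto:  # stand-in for the API schema (DTO)
    id: str
    role: str
    value: str


def message_piece_to_dto(piece: DomainPiece) -> MessagePieceDto:
    """Explicit field-by-field mapping: the whole API contract lives in one place."""
    return MessagePieceDto(
        id=piece.id,
        role=piece.role,
        # Prefer the converted value; fall back to the original
        value=piece.converted_value or piece.original_value,
    )
```

The point of the sketch is just the shape: one module per domain area, pure functions in and out, no reflection.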
I think the approach in this PR is correct, let's just clean up the translation layer a bit and I think we'd be in great shape.
Thanks for chiming in Bashir! This is exactly why I wanted you looking at it also because it's the conversation I want to have.
I don't think most (any?) of the base model properties are internal implementation. If I did think that, I'd agree with what you're saying here. And IMO if we decide fields are internal implementation, perhaps they should move to a different layer (e.g. the database model, etc)
Examples:
Think of something like MessagePiece.attack_identifier. I can imagine adding functionality where we can create an attack based on that, and it would be really handy to use in the display/queries/etc in the UI.
MessagePiece.originator I can also see scenarios. When we have a memory view, users likely want to see conversations that happened with the scorer, converters, adversarial attack; all of which can be distinguished using originator. And if, for example, we decide originator shouldn't be part of the base MessagePiece, I think that's a legitimate choice, but we may want another object or another way, and I like all different places where we want this to be defined consistent.
MessagePiece.original_value_sha256. This is one of the most borderline cases I can think of. If you're a UI user, you may want to look in memory for identical pieces sent. Technically you don't need the sha256. But if we decide that, it may make sense to do that universally, and have the hash only as part of the database.MessagePiece, and only allow users to query using MessagePiece.original_value. So if we decide that separation is nice, it might be good to do it universally.
The architecture I like is something like:
- MessagePiece (base)
- MessagePiece (subclass for database)
- MessagePiece (subclass for backend REST)
- MessagePiece (subclass for analytics)
- ...
Ok maybe if I walk through them it makes more sense because I think they actually reinforce the DTO approach.
I think the core disagreement is whether MessagePiece fields are internal implementation (or with any pyrit.models). My take is that it depends on who's consuming them. A Python developer writing attack scripts needs all 22+ fields, converter_identifiers with its nested sub_identifier, target_info, converter_specific_params, the scorer_identifier with system_prompt_template and recursive sub-identifiers, original_value_sha256, prompt_metadata, originator, etc. A GUI user clicking through a chat interface needs message content, its type, scores, and error status. That's about 8 fields. The library and the UI just don't have the same needs here, and that's ok, it's exactly why DTO separation exists.
With that in mind, going through your specific examples:
attack_identifier - the backend already exposes this information, just in a cleaner way. AttackSummary has attack_id, target_id, target_type, name as typed top-level fields instead of a raw Dict[str, str]. If we want users to create attacks from a previous identifier, that's a new endpoint with a clean request model, not a reason to mirror the internal dict onto every message piece.
originator - totally agree it'll be useful for a memory view. When that feature lands, we add it to the relevant DTO. I know the counterargument is that we'll just keep adding fields one by one, but that's actually the point. Each addition is a deliberate API design decision where we control the shape, naming, and type. Not every field makes it through; something like converter_identifiers (a recursive tree of __type__, __module__, sub_converter_identifiers with supported input/output types) or original_prompt_id (an internal lineage tracking UUID) are really library-level orchestration details that a chat UI would never surface. Bulk-mirroring 22 fields upfront means the API carries all of that regardless.
original_value_sha256 - "find identical messages" is a query for a GUI user, not a field you render. The backend can expose a search/dedup endpoint that uses the hash internally. Exposing a raw SHA256 in a REST response is not really useful and the frontend would never display it.
On the subclass idea (base > DB subclass > REST subclass > analytics subclass), this assumes consumers start with the same core and add extras on top. But the API relationship goes the other direction, it wants fewer fields, not more. A REST subclass of MessagePiece would need to hide ~15 fields, which is fighting against what inheritance naturally does.
I also want to flag some concrete cases where the API should diverge from the domain today. score_value is a str internally (stores "true"/"false" or float strings) but the DTO presents it as float, the UI gets a clean typed value. TargetIdentifier contains endpoint and model configs that go through filter_sensitive_fields(), the API strips keys/tokens/credentials. MessagePiece.role allows simulated_assistant in the domain (internal orchestration concept) but the DTO maps it to just "assistant". These are deliberate transformations that from_attributes would bypass.
On the drift detection tests, I think the mapper unit tests already cover this (@romanlutz , it needs some changes which I missed in my original review) since they break on any field rename. But if we want an extra safety net, a lightweight check that the specific fields the mappers touch still exist on the domain model would be totally fine. I just wouldn't go as far as enforcing full field parity between the schema and the domain.
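A lightweight version of that safety net could look something like this (the field names and classes here are illustrative stand-ins, not the real mapper's full set): check only the fields the mappers actually read, not full parity.

```python
# Sketch: verify the fields the mappers touch still exist on the domain model.
# The MessagePiece stand-in and the field set below are examples only.
from dataclasses import dataclass, fields


@dataclass
class MessagePiece:  # stand-in for the real domain model
    id: str
    role: str
    original_value: str
    converted_value: str
    sequence: int


# Only the fields the mappers read -- deliberately NOT full schema parity
MAPPER_ACCESSED_FIELDS = {"id", "role", "original_value", "converted_value"}


def test_mapper_fields_exist_on_domain_model():
    domain_fields = {f.name for f in fields(MessagePiece)}
    missing = MAPPER_ACCESSED_FIELDS - domain_fields
    assert not missing, f"Mappers reference fields missing on MessagePiece: {missing}"
```

A rename of any accessed field fails this test, without coupling the schema to the whole domain model.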
I do think you're raising a fair point about consistency though, if we keep deciding certain fields don't belong in the API, maybe some of them don't belong on the base MessagePiece either. Worth a separate discussion about the domain model itself.
For this PR though, I think the separate DTOs with the mapper module is the right approach. The mapper functions are explicit, testable, and if a domain field renames, lint and type checkers catch it at the attribute access.
Having worked with this the entire week, I tend to agree with @bashirpartovi. Any change we make to the PyRIT core could potentially break the GUI, and even if we noticed, we'd suddenly have to fix up GUI code after making seemingly unrelated changes. @bashirpartovi explains it in more words than I could have, but I agree with everything. The points I want to add are below:
> MessagePiece.role allows simulated_assistant in the domain (internal orchestration concept) but the DTO maps it to just "assistant". These are deliberate transformations that from_attributes would bypass.
We actually do want simulated_assistant so that we can show in the GUI that it's not a message that was a real response from the assistant but rather just prepended by the user. This distinction could be quite important in some cases. Imagine having to revisit conversations from your op and you can't tell what you wrote vs what the model actually produced.
For drift detection tests see the end of test_mappers.py and comment to let me know if that's what you meant.
bashirpartovi
left a comment
Great job Roman! This is my first pass at your PR; I do have a few more comments but need to think about them more.
bashirpartovi
left a comment
Looks great Roman, just a couple of questions and a call-out :)
```python
# Get existing messages to determine sequence
existing = self._memory.get_message_pieces(conversation_id=attack_id)
sequence = max((p.sequence for p in existing), default=-1) + 1
```
This is not an issue for now since we have a single user manually sending messages against attacks through the UI, but it could be a problem in the future if the use case ever changes (e.g., the API is called programmatically). You could have a race condition here (classic TOCTOU), e.g.:
Thread A reads max sequence = 5
Thread B reads max sequence = 5
Thread A writes message with sequence = 6
Thread B writes message with sequence = 6 <--- Duplicate
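If that use case does arrive, one minimal mitigation is to hold a per-conversation lock across the read and the write. This is a sketch only, with stand-in memory/piece types, and it assumes a single backend process; multi-process deployments would want a unique constraint on (conversation_id, sequence) or a DB transaction instead.

```python
# Sketch: serialize the max(sequence) read and the write per conversation.
# FakeMemory and Piece are stand-ins, not the real PyRIT memory interface.
import threading
from collections import defaultdict
from dataclasses import dataclass

_locks: dict[str, threading.Lock] = defaultdict(threading.Lock)


@dataclass
class Piece:  # stand-in for MessagePiece
    conversation_id: str
    sequence: int = -1


class FakeMemory:  # stand-in for the real memory interface
    def __init__(self) -> None:
        self._pieces: list[Piece] = []

    def get_message_pieces(self, *, conversation_id: str) -> list[Piece]:
        return [p for p in self._pieces if p.conversation_id == conversation_id]

    def add(self, piece: Piece) -> None:
        self._pieces.append(piece)


def add_message(memory: FakeMemory, attack_id: str) -> Piece:
    # Hold the lock across BOTH the read and the write, so two callers
    # can never observe the same max sequence (the TOCTOU above).
    with _locks[attack_id]:
        existing = memory.get_message_pieces(conversation_id=attack_id)
        piece = Piece(attack_id, max((p.sequence for p in existing), default=-1) + 1)
        memory.add(piece)
        return piece
```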
This is a good point! The way I've been thinking about this is as follows:
- When an operator starts using the GUI, they will be prompted to enter an `operator_name` (e.g., "romanlutz") that we store as a label and apply to all pieces (and eventually attack results when we migrate labels there...).
- When another operator tries to add to a conversation that already has an `operator_name`, we can catch that and suggest branching the conversation. This could be done purely in the frontend, but that wouldn't prevent the problem if someone just builds their own frontend on top of the backend, of course.
The problem I see with this is that collaborative red teaming could be tricky, e.g., if multiple people want to pair red team and alternate on who writes the next prompt. That feels like a very niche case, though, and they could still do it by having one person share their screen. That's how we currently do it, too.
Yeah, as I mentioned, this is not an issue while things run locally with a single user, but it's something to keep in mind in case that changes.
…acking

Labels
------
- Source labels from PromptMemoryEntry.labels instead of AttackResult.metadata
- Add _collect_labels_from_pieces helper to derive labels from pieces
- Add get_unique_attack_labels() to MemoryInterface using JOIN+DISTINCT
- Simplify /labels route to delegate to get_unique_attack_labels()
- Remove labels from attack metadata in create_attack
- Forward labels to prepended pieces and add_message pieces
- Inherit labels from existing pieces when adding new messages

Pagination
----------
- Refactor list_attacks_async into two phases:
  Phase 1: query + filter + sort on lightweight AttackResult objects
  Phase 2: fetch pieces only for the final paginated page
- Rename _paginate to _paginate_attack_results (operates on AttackResult)

Lineage tracking
----------------
- Add original_prompt_id to MessagePieceRequest DTO
- Forward original_prompt_id through mapper to MessagePiece domain object

Validation & style
------------------
- Add max_length=50 on PrependedMessageRequest.pieces and AddMessageRequest.pieces
- Add max_length=200 on CreateAttackRequest.prepended_conversation
- Add TOCTOU comment on sequence read-then-write in add_message_async
- Add import placement rule to style guide
- Move contextlib.closing import to top of memory_interface.py
- Remove unused TYPE_CHECKING import from labels route

Tests
-----
- Add get_unique_attack_labels tests (empty, single, merge, no pieces, no labels, non-attack pieces, non-string values, sorted keys)
- Add pagination test verifying pieces fetched only for page
- Add original_prompt_id forwarding and default tests in mappers
- Add labels stamping tests for prepended pieces and add_message
- Add prepend ordering + lineage preservation test
- Add _collect_labels_from_pieces tests
- Update existing tests to match new label source (pieces, not metadata)
Replace opaque Dict[str, Any] fields in API DTOs with explicit typed fields to prevent internal PyRIT core structures from leaking through the REST API boundary.

Models:
- AttackSummary: replace attack_identifier dict with attack_type, attack_specific_params, target_unique_name, target_type, converters
- TargetInstance: replace identifier dict with target_type, endpoint, model_name, temperature, top_p, max_requests_per_minute, target_specific_params
- ConverterInstance: replace type/params with converter_type, supported_input_types, supported_output_types, converter_specific_params, sub_converter_ids
- CreateConverterResponse: use converter_type/display_name

Mappers:
- Extract specific fields from identifier objects via attribute access instead of calling .to_dict() + filter_sensitive_fields()
- Add dedicated mapper modules for targets and converters

API enhancements:
- Add attack list filtering by attack_class, converter_classes, outcome, labels, min/max turns with SQL-level filtering
- Add /attacks/attack-options and /attacks/converter-options endpoints
- Support three-state converter_classes: None=no filter, []=no converters, non-empty=must have all listed
- Stamp source:gui label via labels.setdefault() in create_attack_async
- Add get_unique_attack_labels endpoint for label discovery

Memory layer:
- Add get_unique_attack_class_names, get_unique_converter_class_names, get_unique_attack_labels to MemoryInterface, SQLiteMemory, and AzureSQLMemory
- SQL-level filtering for attack results by labels and harm categories

Tests:
- Expand mapper tests with coverage for no-target, converters extraction, attack_specific_params passthrough, None input/output types
- Expand attack service tests for filtering, options, pagination
- Update all test mocks to use attribute-based identifiers
- Fix mypy errors (Sequence[Any] for pieces, ChatMessageRole for role)
- Widen role fields in DTOs from Literal["user", "assistant", "system"] to ChatMessageRole so simulated_assistant (and other roles like tool, developer) flow through to the frontend without remapping
- Switch mapper from deprecated .role property (which collapses simulated_assistant → assistant) to get_role_for_storage() which preserves the actual stored role
- Add TestDomainModelFieldsExist: 16 parametrized tests that verify every field the mappers access still exists on AttackIdentifier, TargetIdentifier, and ConverterIdentifier dataclasses
- Update mock pieces in test_mappers.py and test_attack_service.py to configure get_role_for_storage()

Files changed:
- pyrit/backend/models/attacks.py
- pyrit/backend/mappers/attack_mappers.py
- tests/unit/backend/test_mappers.py
- tests/unit/backend/test_attack_service.py
- tests/unit/backend/test_converter_service.py
```python
def map_outcome(outcome: AttackOutcome) -> Optional[Literal["undetermined", "success", "failure"]]:
    """
    Map AttackOutcome enum to API outcome string.

    Returns:
        Outcome string ('success', 'failure', 'undetermined') or None.
    """
    if outcome == AttackOutcome.SUCCESS:
        return "success"
    elif outcome == AttackOutcome.FAILURE:
        return "failure"
    else:
        return "undetermined"
```
I think you can completely eliminate this by doing two things:

1. Change the AttackOutcome class to also inherit from str, i.e. `class AttackOutcome(str, Enum):`
2. Change AttackSummary.outcome's type to AttackOutcome

That's it. You just eliminated this mapping function completely. The AttackOutcome values directly translate to what we have in the UI.
Or if you want to make it more modernized, you could change the AttackOutcome class a little bit, with no impact to the rest of the code, as follows:
class AttackOutcome(StrEnum):
"""
Enum representing the possible outcomes of an attack.
"""
# The attack was successful in achieving its objective
SUCCESS = auto()
# The attack failed to achieve its objective
FAILURE = auto()
# The outcome of the attack is unknown or could not be determined
UNDETERMINED = auto()Pydantic would be able to serialize this as well to a string value, the default is the lower-case of the enums, e.g. "success", "failure", and "undetermined".
You could also map a string back to AttackOutcome, e.g. AttackOutcome("undetermined")
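As a stdlib-only illustration of the str-mixin suggestion (a hypothetical snippet, not PR code), the round trip works in both directions without any mapping function:

```python
# str-mixin enum: members compare equal to their string values and
# parse back from strings, so no map_outcome() is needed.
from enum import Enum


class AttackOutcome(str, Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNDETERMINED = "undetermined"


# Serializes as a plain string, parses back to the same singleton member
assert AttackOutcome.SUCCESS == "success"
assert AttackOutcome("undetermined") is AttackOutcome.UNDETERMINED
```

The StrEnum variant (Python 3.11+) behaves the same way with auto() supplying the lower-cased names.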
```python
def attack_result_to_summary(
    ar: AttackResult,
    *,
    pieces: Sequence[Any],
```
Shouldn't pieces have MessagePiece type instead of Any?
```python
)


def pyrit_scores_to_dto(scores: List[Any]) -> List[Score]:
```
This should not be Any, please use the actual type. You can always alias it at import time using `as`.
```python
]


def _infer_mime_type(*, value: Optional[str], data_type: str) -> Optional[str]:
```
Shouldn't data_type be a PromptDataType? (Defined in PyRIT/pyrit/models/literals.py, line 6 in c0ce883.)
```python
    return mime_type


def pyrit_messages_to_dto(pyrit_messages: List[Any]) -> List[Message]:
```
Please add proper type hints. You can resolve any conflicts with PyRIT messages by using an import alias or a more specific name for the DTO.
```python
    return text[:100] + "..." if len(text) > 100 else text


def _collect_labels_from_pieces(pieces: Sequence[Any]) -> Dict[str, str]:
```
Same as before, we need proper typing here
```python
        Label dict, or empty dict if no pieces have labels.
    """
    for p in pieces:
        labels = getattr(p, "labels", None)
```
please use direct access instead of getattr
```python
from pyrit.backend.models.targets import TargetInstance


def target_object_to_instance(target_unique_name: str, target_obj: Any) -> TargetInstance:
```
same thing here, we need proper typing for target_obj.
```python
    Returns:
        TargetInstance DTO with metadata derived from the object.
    """
    identifier = target_obj.get_identifier() if hasattr(target_obj, "get_identifier") else None
```
This should be a direct access instead of hasattr
```python
    return TargetInstance(
        target_unique_name=target_unique_name,
        target_type=identifier.class_name if identifier else target_obj.__class__.__name__,
        endpoint=getattr(identifier, "endpoint", None) if identifier else None,
```
can we use direct access here?
Description
Adding backend APIs to support upcoming frontend development. This is based on an initial proposal and review.
Tests and Documentation
Includes tests for all APIs.