FEAT: Add Agent Security Evaluation for Sensitive Data Access#1367
FEAT: Add Agent Security Evaluation for Sensitive Data Access#1367amiteliahu wants to merge 1 commit intoAzure:mainfrom
Conversation
|
@microsoft-github-policy-service agree |
| @@ -0,0 +1,186 @@ | |||
| # Copyright (c) Microsoft Corporation. | |||
| # Licensed under the MIT license. | |||
There was a problem hiding this comment.
Let's contain all these utilities into the notebook (sensitive_data_access_attack.ipynb) since this isn't a new attack but rather a utilization of the attack. I'm thinking this PR should be showcasing your POC in a notebook and helping users adapt that POC
|
|
||
| import pytest | ||
|
|
||
| from pyrit.executor.attack.single_turn.agent_security import ( |
There was a problem hiding this comment.
maybe obvious but we can remove this file when we consolidate the agent_security code into the notebook
| @@ -0,0 +1,215 @@ | |||
| dataset_name: agent_security_sensitive_data_access | |||
There was a problem hiding this comment.
this dataset is great! small nit: I would rename it to seed_datasets/local/agentic/sensitive_data_access.prompt
|
Nice work, excited to try this out! Left a few comments and am happy to hop on a call to discuss :) |
| # --------------------------------------------------------------------------- | ||
| # 5. Print and save | ||
| # --------------------------------------------------------------------------- | ||
| SEPARATOR = "=" * 80 |
There was a problem hiding this comment.
we generally don't given advice / next steps based on the results of an attack
@hannahwestra25
Description
Adds an Agent Security Evaluation feature that tests whether AI agents with tool access can be manipulated into reading sensitive files (including SSH and API keys, command history, environment variables, users information) through adversarial prompt injection.
How it works:
An optional (recommended) Docker sandbox is provided for isolated testing, but the evaluation works with any HTTP-exposed agent, users can plant canaries via plant_canaries.py or Dockerfile.canary-template in their own environment.
What's included:
agent_security.py — canary markers, content, scorer factories
sensitive_data_access.prompt — 30 attack prompts
agent-sandbox — Docker sandbox with LangChain example agent
sensitive_data_access_attack.ipynb — end-to-end notebook
0_agent_security.md — setup docs and API reference
Platform-agnostic — works with any HTTP-exposed agent (LangChain, Semantic Kernel, AutoGen, etc.). Users can plug in their own agent via Dockerfile.canary-template.
plant_canaries.py - non-Docker alternative
Tests and Documentation
Unit tests: test_agent_security.py
Overview doc added to _toc.yml