feat: Enhanced Tool Annotations Intelligence (Spec 035) #342

Open
Dumbris wants to merge 9 commits into main from 035-enhanced-annotations

Dumbris commented Mar 13, 2026

Summary

Implements all 5 features from Spec 035 — leveraging MCP tool annotations for security-aware routing, quarantine protection, session-level risk analysis, and smarter tool discovery.

F1: Annotation Change Detection in Quarantine

  • Tool annotations included in SHA-256 hash for quarantine change detection
  • Detects "annotation rug-pulls" (e.g., server flipping destructiveHint from true→false)
  • Backward compatible: nil annotations → empty string (won't invalidate existing approvals)
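
A minimal sketch of the F1 idea, not the PR's actual code: `approvalHash` and `legacyHash` are hypothetical stand-ins for `calculateToolApprovalHash`, and the NUL-separated input layout and JSON serialization of annotations are assumptions. It shows why appending an empty string for nil annotations leaves old hashes unchanged while a flipped hint produces a new hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// ToolAnnotations mirrors the MCP behavioral hints. Pointer fields keep
// "unset" distinct from an explicit false.
type ToolAnnotations struct {
	Title           string `json:"title,omitempty"`
	ReadOnlyHint    *bool  `json:"readOnlyHint,omitempty"`
	DestructiveHint *bool  `json:"destructiveHint,omitempty"`
	IdempotentHint  *bool  `json:"idempotentHint,omitempty"`
	OpenWorldHint   *bool  `json:"openWorldHint,omitempty"`
}

// legacyInput is the assumed pre-annotation hash input (layout is hypothetical).
func legacyInput(name, description, schema string) string {
	return name + "\x00" + description + "\x00" + schema
}

// sum returns the hex-encoded SHA-256 digest of s.
func sum(s string) string {
	d := sha256.Sum256([]byte(s))
	return hex.EncodeToString(d[:])
}

// legacyHash models a hash computed before annotations were tracked.
func legacyHash(name, description, schema string) string {
	return sum(legacyInput(name, description, schema))
}

// approvalHash appends the serialized annotations to the legacy input.
// Nil annotations contribute an empty string, so the result equals legacyHash.
func approvalHash(name, description, schema string, ann *ToolAnnotations) string {
	serialized := ""
	if ann != nil {
		b, _ := json.Marshal(ann) // cannot fail: the struct holds only strings and *bool
		serialized = string(b)
	}
	return sum(legacyInput(name, description, schema) + serialized)
}

func main() {
	yes, no := true, false
	name, desc, schema := "delete_file", "Deletes a file", `{"type":"object"}`

	approved := approvalHash(name, desc, schema, &ToolAnnotations{DestructiveHint: &yes})
	current := approvalHash(name, desc, schema, &ToolAnnotations{DestructiveHint: &no})

	fmt.Println("rug-pull detected:", approved != current)
	fmt.Println("nil keeps legacy hash:", approvalHash(name, desc, schema, nil) == legacyHash(name, desc, schema))
}
```

Under this layout, an empty but non-nil annotations struct serializes to `{}` and therefore does change the hash; whether the PR treats that case like nil is not stated.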

F2: Lethal Trifecta Session Risk Analysis

  • retrieve_tools response includes session_risk field analyzing all connected servers
  • Detects Simon Willison's "lethal trifecta": sensitive data + untrusted content + destructive capabilities
  • Risk levels: high (all 3 trifecta elements present), medium (2 of 3), low (0 or 1)
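
The level mapping above can be sketched as follows. `SessionSignals` and `riskLevel` are hypothetical names, and how each signal is derived from the connected servers' annotations is left out because it is not spelled out here; only the counting rule (3 → high, 2 → medium, otherwise low) comes from the description.

```go
package main

import "fmt"

// SessionSignals records which trifecta elements were seen across all
// connected servers. Field names are illustrative assumptions.
type SessionSignals struct {
	SensitiveData    bool
	UntrustedContent bool
	Destructive      bool
}

// riskLevel maps the number of elements present to a risk level.
func riskLevel(s SessionSignals) string {
	present := 0
	for _, seen := range []bool{s.SensitiveData, s.UntrustedContent, s.Destructive} {
		if seen {
			present++
		}
	}
	switch present {
	case 3:
		return "high"
	case 2:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	fmt.Println(riskLevel(SessionSignals{SensitiveData: true, UntrustedContent: true, Destructive: true}))
	fmt.Println(riskLevel(SessionSignals{UntrustedContent: true, Destructive: true}))
	fmt.Println(riskLevel(SessionSignals{}))
}
```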

F3: openWorldHint Enhanced Scanning

  • Tool calls from openWorldHint=true tools tagged with content_trust: untrusted in activity metadata
  • Works for both direct tool calls and tools called within code_execution sandbox
  • Follows MCP spec defaults: nil annotations → open world (untrusted)
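
A rough sketch of the tagging rule, assuming simplified signatures: the PR names the helpers `IsOpenWorldTool` and `ContentTrustForTool`, but the lowercase versions below, their parameters, and `executionTrust` are illustrative guesses rather than the real API. An unset hint is treated as open world, and a sandbox execution is marked untrusted as soon as one open-world tool is called.

```go
package main

import "fmt"

// ToolAnnotations is trimmed to the single hint this sketch needs.
type ToolAnnotations struct {
	OpenWorldHint *bool
}

// isOpenWorldTool treats missing annotations or an unset hint as open world.
func isOpenWorldTool(ann *ToolAnnotations) bool {
	if ann == nil || ann.OpenWorldHint == nil {
		return true
	}
	return *ann.OpenWorldHint
}

// contentTrustForTool returns the content_trust value for one tool call.
func contentTrustForTool(ann *ToolAnnotations) string {
	if isOpenWorldTool(ann) {
		return "untrusted"
	}
	return "trusted"
}

// executionTrust marks a whole execution untrusted if any call was open world.
// Returning "trusted" for an execution with no tool calls is an assumption.
func executionTrust(calls []*ToolAnnotations) string {
	for _, ann := range calls {
		if isOpenWorldTool(ann) {
			return "untrusted"
		}
	}
	return "trusted"
}

func main() {
	closed := false
	local := &ToolAnnotations{OpenWorldHint: &closed}

	fmt.Println(contentTrustForTool(nil))
	fmt.Println(contentTrustForTool(local))
	fmt.Println(executionTrust([]*ToolAnnotations{local, nil}))
}
```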

F4: Annotation-Based Filtering in retrieve_tools

  • New optional parameters: read_only_only, exclude_destructive, exclude_open_world
  • Agents can self-restrict tool discovery for safer operation
  • Filters applied after BM25 search, before response building
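
One way the post-search filtering step might look. `Tool`, `Filter`, and `filterTools` are hypothetical names; unset hints fall back to the MCP defaults (not read-only, destructive, open world), and treating `destructiveHint` as meaningful only for non-read-only tools is this sketch's reading of the spec, not something the PR confirms.

```go
package main

import "fmt"

// ToolAnnotations holds the hints the three filters look at.
type ToolAnnotations struct {
	ReadOnlyHint    *bool
	DestructiveHint *bool
	OpenWorldHint   *bool
}

// Tool is a search result; Annotations may be nil.
type Tool struct {
	Name        string
	Annotations *ToolAnnotations
}

// Filter mirrors read_only_only, exclude_destructive, exclude_open_world.
type Filter struct {
	ReadOnlyOnly       bool
	ExcludeDestructive bool
	ExcludeOpenWorld   bool
}

// hint returns *p, or def when the hint is unset.
func hint(p *bool, def bool) bool {
	if p == nil {
		return def
	}
	return *p
}

// filterTools drops ranked search results that violate the requested limits.
func filterTools(ranked []Tool, f Filter) []Tool {
	kept := make([]Tool, 0, len(ranked))
	for _, tool := range ranked {
		var a ToolAnnotations // zero value: every hint unset
		if tool.Annotations != nil {
			a = *tool.Annotations
		}
		readOnly := hint(a.ReadOnlyHint, false)
		destructive := !readOnly && hint(a.DestructiveHint, true)
		openWorld := hint(a.OpenWorldHint, true)

		if f.ReadOnlyOnly && !readOnly {
			continue
		}
		if f.ExcludeDestructive && destructive {
			continue
		}
		if f.ExcludeOpenWorld && openWorld {
			continue
		}
		kept = append(kept, tool)
	}
	return kept
}

func main() {
	yes, no := true, false
	ranked := []Tool{
		{Name: "read_file", Annotations: &ToolAnnotations{ReadOnlyHint: &yes, OpenWorldHint: &no}},
		{Name: "delete_file", Annotations: &ToolAnnotations{DestructiveHint: &yes, OpenWorldHint: &no}},
		{Name: "unannotated"},
	}
	for _, tool := range filterTools(ranked, Filter{ExcludeDestructive: true}) {
		fmt.Println(tool.Name)
	}
}
```

Because unset hints default to the least safe values, an unannotated tool is removed by every one of the three filters in this sketch.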

F5: Annotation Coverage Reporting

  • New REST endpoint: GET /api/v1/annotations/coverage
  • Reports per-server and total annotation adoption metrics
  • Title-only annotations not counted (behavioral hints required)
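
The counting rule can be illustrated with a small sketch. `Coverage`, `isAnnotated`, and `coverage` are hypothetical names and the endpoint's actual response shape is not shown here; the only rule taken from the description is that a tool counts as annotated when at least one behavioral hint is set, so a title alone does not count.

```go
package main

import "fmt"

// ToolAnnotations carries an optional title plus the four behavioral hints.
type ToolAnnotations struct {
	Title           string
	ReadOnlyHint    *bool
	DestructiveHint *bool
	IdempotentHint  *bool
	OpenWorldHint   *bool
}

// Coverage counts tools and how many of them carry behavioral hints.
type Coverage struct {
	Total     int
	Annotated int
}

// Percent returns annotated/total as a percentage, or 0 for an empty set.
func (c Coverage) Percent() float64 {
	if c.Total == 0 {
		return 0
	}
	return 100 * float64(c.Annotated) / float64(c.Total)
}

// isAnnotated reports whether at least one behavioral hint is set.
func isAnnotated(a *ToolAnnotations) bool {
	if a == nil {
		return false
	}
	return a.ReadOnlyHint != nil || a.DestructiveHint != nil ||
		a.IdempotentHint != nil || a.OpenWorldHint != nil
}

// coverage computes per-server and overall counts.
func coverage(byServer map[string][]*ToolAnnotations) (map[string]Coverage, Coverage) {
	perServer := make(map[string]Coverage, len(byServer))
	var total Coverage
	for server, tools := range byServer {
		var c Coverage
		for _, ann := range tools {
			c.Total++
			if isAnnotated(ann) {
				c.Annotated++
			}
		}
		perServer[server] = c
		total.Total += c.Total
		total.Annotated += c.Annotated
	}
	return perServer, total
}

func main() {
	yes := true
	perServer, total := coverage(map[string][]*ToolAnnotations{
		"files": {
			&ToolAnnotations{ReadOnlyHint: &yes},
			&ToolAnnotations{Title: "Pretty name only"},
			nil,
		},
	})
	fmt.Printf("files: %d/%d annotated\n", perServer["files"].Annotated, perServer["files"].Total)
	fmt.Printf("total: %.1f%%\n", total.Percent())
}
```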

Test plan

  • F1: 3 tests — hash includes annotations, nil backward compat, rug-pull detection
  • F2: 6 tests — trifecta, low/medium risk, nil defaults, disconnected servers, empty snapshot
  • F3: 10 tests — IsOpenWorldTool (6 cases), ContentTrustForTool (4 cases), activity service integration (4 cases)
  • F4: 5 tests — read-only filter, exclude destructive, exclude open-world, combined, no filters
  • F5: 3 tests — mixed coverage, empty servers, title-only not counted
  • All existing tests pass (no regressions)

🤖 Generated with Claude Code

claude added 9 commits March 13, 2026 09:51
Extend calculateToolApprovalHash to include serialized tool annotations
in the SHA-256 hash. This detects "annotation rug-pulls" where a
malicious server flips behavioral hints (e.g., destructiveHint from
true to false) without changing the tool description or schema.

Nil annotations contribute an empty string to maintain backward
compatibility with tools approved before annotation tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add annotation coverage reporting endpoint that shows how many upstream
tools have MCP annotations (hint booleans) vs those that don't, broken
down by server. A tool counts as annotated only if at least one of
ReadOnlyHint, DestructiveHint, IdempotentHint, or OpenWorldHint is set
(Title alone does not count).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…filtering to retrieve_tools (Spec 035 F2+F4)

F2: Session risk analysis examines all connected servers' tool annotations
to detect the "lethal trifecta" — open-world access + destructive capabilities
+ write access. Returns risk level (high/medium/low) in every retrieve_tools
response as session_risk, with a warning when the trifecta is present.

F4: Three new optional boolean parameters (read_only_only, exclude_destructive,
exclude_open_world) allow agents to self-restrict tool discovery scope based on
MCP annotation hints. Nil annotations are treated as most permissive per spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…35 F3)

When a tool call is made via call_tool_read/write/destructive, code_execution,
or direct routing mode, check if the called tool has openWorldHint=true (or nil,
which defaults to true per MCP spec). Tag the activity record metadata with
"content_trust": "untrusted" for open-world tools, or "trusted" for closed-world
tools (openWorldHint=false). This enables downstream security review of tool
outputs that may contain untrusted external data.

Changes:
- Add IsOpenWorldTool() and ContentTrustForTool() helpers in contracts/intent.go
- Add content_trust field to EmitActivityToolCallCompleted event payload
- Add content_trust extraction in handleToolCallCompleted and
  handleInternalToolCall activity service handlers
- Add EmitActivityInternalToolCallWithContentTrust for code_execution path
- Compute content trust in handleCallToolVariant, makeDirectModeHandler, and
  code_execution handler (any open-world tool call marks entire execution)
- Add comprehensive tests: TestIsOpenWorldTool, TestContentTrustForTool,
  TestHandleToolCallCompleted_ContentTrust,
  TestHandleInternalToolCall_ContentTrust

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying mcpproxy-docs with Cloudflare Pages

Latest commit: a6fd18b
Status: ✅  Deploy successful!
Preview URL: https://7349c162.mcpproxy-docs.pages.dev
Branch Preview URL: https://035-enhanced-annotations.mcpproxy-docs.pages.dev

@github-actions

📦 Build Artifacts

Workflow Run: View Run
Branch: 035-enhanced-annotations

Available Artifacts

  • archive-darwin-amd64 (25 MB)
  • archive-darwin-arm64 (23 MB)
  • archive-linux-amd64 (14 MB)
  • archive-linux-arm64 (13 MB)
  • archive-windows-amd64 (25 MB)
  • archive-windows-arm64 (22 MB)
  • frontend-dist-pr (0 MB)
  • installer-dmg-darwin-amd64 (28 MB)
  • installer-dmg-darwin-arm64 (25 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page linked above
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 23042267110 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.
