fix: validate tool args are dict in ToolEnv.env_response by zamal-db · Pull Request #960 · PrimeIntellect-ai/verifiers

zamal-db · 2026-02-25T10:39:24Z

While training a tool-calling agent with ToolEnv, I hit a crash mid-rollout caused by a model (Qwen3-4B) producing double-encoded JSON tool arguments. json.loads succeeds but returns a str instead of a dict, which then crashes when unpacked as **kwargs in call_tool.
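The failure mode can be reproduced without ToolEnv at all. The sketch below is a standalone illustration, not code from the repository: `search` is a hypothetical tool and the argument values are made up. It shows that decoding a double-encoded payload yields a `str`, and that unpacking that `str` with `**` raises a `TypeError`.

```python
import json


def search(query: str, limit: int = 5) -> str:
    """Hypothetical tool, used only for illustration."""
    return f"searched {query!r} (limit={limit})"


args = {"query": "verifiers", "limit": 3}

well_formed = json.dumps(args)            # a JSON object, as expected
double_encoded = json.dumps(well_formed)  # a JSON string that contains JSON

parsed_ok = json.loads(well_formed)
parsed_bad = json.loads(double_encoded)   # succeeds, but returns a str

print(type(parsed_ok).__name__)
print(type(parsed_bad).__name__)

print(search(**parsed_ok))

try:
    search(**parsed_bad)  # ** requires a mapping, so this fails
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Decoding the payload a second time would recover the dict, but this PR does not attempt that repair; it treats the non-dict result as a parse error.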

StatefulToolEnv.env_response already guards against this (lines 141-147), but ToolEnv.env_response was missing the same check. This aligns both methods.

Changes:

Add isinstance(parsed_args, dict) validation in ToolEnv.env_response, matching the existing StatefulToolEnv pattern
Non-dict args raise ValueError, caught by the existing error handler (returned to model or stops rollout via stop_errors)
Add two tests: stop-on-error path and graceful-fallback path
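A minimal sketch of the kind of check described above, assuming the raw arguments arrive as a JSON string. `parse_tool_args` is a hypothetical helper written for this illustration; the actual change lives inline in ToolEnv.env_response and may be worded differently.

```python
import json
from typing import Any


def parse_tool_args(raw_arguments: str) -> dict[str, Any]:
    """Hypothetical helper: decode tool arguments and require a dict."""
    parsed = json.loads(raw_arguments)
    if not isinstance(parsed, dict):
        # Covers double-encoded JSON (str) as well as lists, numbers, and null.
        raise ValueError(
            f"Expected tool arguments to be a dict, got {type(parsed).__name__}"
        )
    return parsed


print(parse_tool_args('{"query": "verifiers"}'))

try:
    parse_tool_args(json.dumps('{"query": "verifiers"}'))
except ValueError as exc:
    print(f"ValueError: {exc}")
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, raising `ValueError` for the non-dict case lets a single handler cover both malformed JSON and well-formed JSON of the wrong type.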

Closes #562

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
Any dependent changes have been merged and published

Note

Low Risk
Small, localized input-validation change in tool-call parsing plus tests; behavior only differs for malformed/non-dict tool arguments.

Overview
Prevents ToolEnv.env_response from crashing when a model returns double-encoded JSON tool arguments by validating that json.loads(tool_call.arguments) produces a dict and treating non-dicts as a parse error.

Adds coverage for this case with two new tests: one verifying rollouts stop with ToolParseError when ValueError is in stop_errors, and another ensuring the env returns a tool error message and continues when it is not.
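The two paths those tests exercise can be sketched roughly as follows. This is an assumption-laden illustration rather than the project's code: `respond_to_tool_call` is a hypothetical function, `ToolParseError` here is a local stand-in class rather than the library's own, and the error message text is invented.

```python
import json


class ToolParseError(Exception):
    """Local stand-in for illustration; not the library's class."""


def respond_to_tool_call(raw_arguments: str, stop_errors: tuple = ()) -> str:
    """Hypothetical handler showing the stop path and the fallback path."""
    try:
        parsed = json.loads(raw_arguments)
        if not isinstance(parsed, dict):
            raise ValueError(f"expected dict, got {type(parsed).__name__}")
    except Exception as exc:
        # Stop path: the error type was listed in stop_errors.
        if any(isinstance(exc, err) for err in stop_errors):
            raise ToolParseError(str(exc)) from exc
        # Fallback path: report the problem back to the model and continue.
        return f"Error parsing tool arguments: {exc}"
    return f"ok: {sorted(parsed)}"


double_encoded = json.dumps(json.dumps({"x": 1}))

# Fallback: an error message is returned instead of crashing.
print(respond_to_tool_call(double_encoded))

# Stop: the same input raises once ValueError is in stop_errors.
try:
    respond_to_tool_call(double_encoded, stop_errors=(ValueError,))
except ToolParseError as exc:
    print(f"ToolParseError: {exc}")
```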

Written by Cursor Bugbot for commit 0d5da8c. This will update automatically on new commits.

When a model produces double-encoded JSON, json.loads succeeds but returns a str instead of a dict. This crashes the RL training run when the str is unpacked as **kwargs. StatefulToolEnv already has this validation (lines 141-147), but ToolEnv was missing it. This aligns both env_response methods. Closes PrimeIntellect-ai#562

zamal-db wants to merge 1 commit into PrimeIntellect-ai:main from zamal-db:fix/tool-env-non-dict-args-fallback
