
fix: validate tool args are dict in ToolEnv.env_response #960

Open
zamal-db wants to merge 1 commit into PrimeIntellect-ai:main from zamal-db:fix/tool-env-non-dict-args-fallback

Conversation

zamal-db commented Feb 25, 2026

While training a tool-calling agent with ToolEnv, I hit a crash mid-rollout caused by a model (Qwen3-4B) producing double-encoded JSON tool arguments. json.loads succeeds but returns a str instead of a dict, and unpacking that str as **kwargs in call_tool then raises a TypeError.
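A minimal reproduction of this failure mode, assuming a hypothetical `call_tool` stand-in (not the library's actual function) that forwards parsed arguments as keyword arguments:

```python
import json


def call_tool(name: str, **kwargs) -> str:
    """Hypothetical stand-in for a tool dispatcher that unpacks args as **kwargs."""
    return f"{name} called with {kwargs}"


# Well-formed tool arguments: a JSON object encoded once.
args = {"city": "Paris"}
single = json.dumps(args)           # the text: {"city": "Paris"}

# Double-encoded: the JSON text is itself encoded again as a JSON string.
double = json.dumps(single)         # the text: "{\"city\": \"Paris\"}"

parsed_single = json.loads(single)  # a dict
parsed_double = json.loads(double)  # a str: json.loads still succeeds

print(type(parsed_single).__name__, type(parsed_double).__name__)  # dict str

try:
    call_tool("get_weather", **parsed_double)
except TypeError as exc:
    # Message is along the lines of "argument after ** must be a mapping, not str"
    print(f"TypeError: {exc}")
```

Because the double-encoded payload is still valid JSON, a try/except around json.loads alone does not catch it; only a type check on the parsed result does.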

StatefulToolEnv.env_response already guards against this (lines 141-147), but ToolEnv.env_response was missing the same check. This aligns both methods.

Changes:

  • Add isinstance(parsed_args, dict) validation in ToolEnv.env_response, matching the existing StatefulToolEnv pattern
  • Non-dict args raise ValueError, which the existing error handler catches: the error is either returned to the model as a tool message or, if ValueError is listed in stop_errors, stops the rollout
  • Add two tests: stop-on-error path and graceful-fallback path
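The validation can be sketched as a small helper. This is an illustration under assumptions, not the PR's actual diff: `parse_tool_args` is a hypothetical name and the error message wording is assumed.

```python
import json
from typing import Any


def parse_tool_args(raw: str) -> dict[str, Any]:
    """Parse a tool call's JSON arguments, requiring a JSON object.

    Hypothetical helper mirroring the isinstance(parsed_args, dict) check.
    Malformed JSON raises json.JSONDecodeError (a ValueError subclass);
    valid JSON that is not an object raises an explicit ValueError.
    """
    parsed_args = json.loads(raw)
    if not isinstance(parsed_args, dict):
        raise ValueError(
            f"Expected tool arguments to be a JSON object, got {type(parsed_args).__name__}"
        )
    return parsed_args


print(parse_tool_args('{"city": "Paris"}'))  # {'city': 'Paris'}

try:
    parse_tool_args('"{\\"city\\": \\"Paris\\"}"')  # double-encoded arguments
except ValueError as exc:
    print(exc)  # Expected tool arguments to be a JSON object, got str
```

Raising ValueError (rather than a new exception type) means that, in this sketch, malformed JSON and non-object JSON surface through the same exception class, so a single handler can route both.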

Closes #562

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Low Risk
Small, localized input-validation change in tool-call parsing plus tests; behavior only differs for malformed/non-dict tool arguments.

Overview
Prevents ToolEnv.env_response from crashing when a model returns double-encoded JSON tool arguments by validating that json.loads(tool_call.arguments) produces a dict and treating non-dicts as a parse error.

Adds coverage for this case with two new tests: one verifying rollouts stop with ToolParseError when ValueError is in stop_errors, and another ensuring the env returns a tool error message and continues when it is not.

Written by Cursor Bugbot for commit 0d5da8c.
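The two behaviors the new tests cover (stop on error vs. report and continue) can be modeled with a self-contained sketch. Everything here is hypothetical: `ToolParseError` is a local stand-in class (the library's real exception may differ in constructor and attributes), `env_response` is a simplified function rather than the actual `ToolEnv` method, and the tool-message dict shape is assumed.

```python
import json
from typing import Any, Callable


class ToolParseError(Exception):
    """Local stand-in for the library's ToolParseError; not its real definition."""


def env_response(
    tool_name: str,
    raw_args: str,
    tools: dict[str, Callable[..., Any]],
    stop_errors: tuple[type[BaseException], ...] = (),
) -> dict[str, str]:
    """Simplified model of the error routing: stop the rollout or report back."""
    try:
        parsed_args = json.loads(raw_args)
        if not isinstance(parsed_args, dict):
            raise ValueError(
                f"tool arguments must be a JSON object, got {type(parsed_args).__name__}"
            )
    except ValueError as exc:
        if isinstance(exc, stop_errors):
            # Stop-on-error path: abort the rollout with a parse error.
            raise ToolParseError(str(exc)) from exc
        # Graceful-fallback path: return the error to the model as a tool message.
        return {"role": "tool", "content": f"Error: {exc}"}
    result = tools[tool_name](**parsed_args)
    return {"role": "tool", "content": str(result)}


tools = {"get_weather": lambda city: f"sunny in {city}"}
double = json.dumps(json.dumps({"city": "Paris"}))

# ValueError not in stop_errors: the env reports the error and the rollout continues.
msg = env_response("get_weather", double, tools)
print(msg["content"])  # Error: tool arguments must be a JSON object, got str

# ValueError in stop_errors: the rollout is aborted with ToolParseError.
try:
    env_response("get_weather", double, tools, stop_errors=(ValueError,))
except ToolParseError as exc:
    print("rollout stopped:", exc)
```

Wrapping the original ValueError with `raise ... from exc` keeps the underlying cause in the traceback, which helps when diagnosing which model output stopped a rollout.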

When a model produces double-encoded JSON, json.loads succeeds but
returns a str instead of a dict. This crashes the RL training run
when the str is unpacked as **kwargs.

StatefulToolEnv already has this validation (lines 141-147), but
ToolEnv was missing it. This aligns both env_response methods.

Closes PrimeIntellect-ai#562

Development

Successfully merging this pull request may close these issues.

Proposal for default fallback for invalid tool call args in env_response