Skip to content

Conversation

@mkmeral
Copy link
Contributor

@mkmeral mkmeral commented Feb 9, 2026

Description

When SummarizingConversationManager is used without a separate summarization_agent and a ContextWindowOverflowException occurs, the agent crashes with:

ConcurrencyException: Agent is already processing a request. Concurrent invocations are not supported.

Root cause: _generate_summary() calls agent("Please summarize this conversation."), which re-enters stream_async() on the same agent instance. The _invocation_lock (threading.Lock, non-reentrant) is already held by the outer stream_async() call, so the inner call fails immediately.

The fix: Split _generate_summary() into two code paths:

  1. When a separate summarization_agent IS provided — keep existing behavior, call summarization_agent(...). This is safe because it's a different agent instance with its own lock.

  2. When NO separate agent is provided (the bug case) — call agent.model.stream() directly instead of agent(...). Summarization only needs the model to produce text; it doesn't need tools, callback handlers, tracing, or any other agent machinery. The response stream is processed via process_stream() (the same function the event loop uses internally).

This approach:

  • Zero changes to agent.py — the core stays simple
  • No concurrency mechanism changes (no RLock, no lock release/reacquire hacks)
  • No agent state corruption (metrics, traces, interrupt state are untouched)
  • Fix is scoped exactly where the problem is

Related Issues

Fixes the re-entrant lock deadlock in SummarizingConversationManager when no dedicated summarization_agent is configured.

Documentation PR

N/A — no public API changes; behavior is now correct where it previously crashed.

Type of Change

Bug fix

Testing

How have you tested the change?

37 tests total (28 existing updated + 9 new), all passing:

  • Unit tests cover both code paths (model-direct vs. dedicated agent), system prompt handling, message construction, error propagation, tool registry behavior, and agent state isolation.
  • Integration test (test_full_agent_pipeline_no_reentrant_lock_on_context_overflow) reproduces the exact bug scenario end-to-end: a real Agent instance calls agent("prompt") → model raises ContextWindowOverflowExceptionreduce_context() fires while the lock is held → summary is generated via model.stream() → event loop retries → agent returns successfully. This test would have caught the original bug.
$ python -m pytest tests/strands/agent/test_summarizing_conversation_manager.py -v
37 passed in 0.31s

$ python -m pytest tests/strands/agent/test_agent.py -v -k "concurren"
2 passed in 0.18s

$ python -m pytest tests/strands/agent/test_agent.py -v
100 passed in 3.28s

$ python -m pytest tests/strands/agent/test_conversation_manager.py -v
26 passed in 0.89s
  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mkmeral
Copy link
Contributor Author

mkmeral commented Feb 9, 2026

/strands review

@codecov
Copy link

codecov bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...sation_manager/summarizing_conversation_manager.py 90.90% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

Code Review Summary

Assessment: ✅ Approve

Key Themes:

  • Clean separation of summarization paths that correctly addresses the re-entrant lock deadlock
  • Excellent test coverage with 37 unit tests covering both code paths, edge cases, and the core bug fix scenario
  • Minimal, targeted changes that don't touch agent.py core logic
  • Good use of existing infrastructure (process_stream from event_loop)

Strengths:

  1. Well-scoped fix: The solution addresses exactly the problem without over-engineering
  2. Comprehensive tests: The test_reduce_context_succeeds_with_held_lock test directly validates the bug fix by acquiring the lock before calling reduce_context
  3. Clear documentation: Docstrings explain the rationale for each code path
  4. Integration testing: The test_context_overflow_triggers_summarization_and_recovery validates the end-to-end scenario

Minor Notes:

  • Codecov reports 94.73% patch coverage with 1 partial line - this is acceptable for branch coverage

This is a solid bug fix that follows the project's patterns and maintains code quality. Nice work! 🎉

When no dedicated summarization_agent is provided (the default),
_generate_summary() was calling agent('Please summarize...') which
re-enters the full agent pipeline.  Because the outer __call__ already
holds _invocation_lock (a threading.Lock), the inner call deadlocks.
Even with a separate event loop thread the lock is not re-entrant.

The fix splits _generate_summary() into two paths:

1. _generate_summary_with_agent() – used when a dedicated
   summarization_agent was supplied at init time.  Preserves the
   existing behaviour (full agent pipeline, tool execution, etc.).

2. _generate_summary_with_model() – the new default path.  Calls
   agent.model.stream() directly with the summarization system prompt
   and processes the response via process_stream().  This bypasses the
   agent pipeline entirely, avoiding the lock, metrics reset, trace
   span creation and tool-execution overhead.

Tests are updated accordingly:
- MockAgent now wires model.stream() to return proper async stream
  events so the default path works in tests.
- New tests verify model.stream() is called directly, with correct
  system prompt, with tool_specs=None, and that agent state is never
  mutated in the default path.
- Noop-tool and state-restoration tests are moved to the agent path
  (summarization_agent provided) where they remain relevant.
@github-actions github-actions bot added size/m and removed size/m labels Feb 9, 2026
@mkmeral mkmeral force-pushed the fix/summarizing-use-model-stream branch from b619f0c to 43563cc Compare February 9, 2026 21:07
@github-actions github-actions bot added size/xs and removed size/m labels Feb 9, 2026
@mkmeral mkmeral enabled auto-merge (squash) February 9, 2026 22:17
@afarntrog
Copy link
Contributor

Just noting that we need to remember to add tracing/metrics to this.

Copy link
Contributor

@afarntrog afarntrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to see an integration test for this as well. Testing both cases - agent and direct call.

@mkmeral mkmeral merged commit 18a349c into strands-agents:main Feb 11, 2026
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants