Skip to content

Comments

Fix: cancellation wiring#2772

Open
tempusfrangit wants to merge 9 commits intomainfrom
fix/cancellation-wiring
Open

Fix: cancellation wiring#2772
tempusfrangit wants to merge 9 commits intomainfrom
fix/cancellation-wiring

Conversation

@tempusfrangit
Copy link
Member

@tempusfrangit tempusfrangit commented Feb 24, 2026

Summary

Wire cancellation end-to-end from HTTP to worker subprocess, and make CancelationException a proper public API.

Cancel was accepted by the HTTP layer but never reached the worker. This wires the full path: HTTP cancel/connection-drop → supervisor → orchestrator → ControlRequest::Cancel → worker → PyThreadState_SetAsyncExc.

Key changes

Cancellation wiring (existing commits)

  • Orchestrator: add cancel_by_prediction_id (resolves pred ID → slot ID)
  • Supervisor: cancel() delegates to orchestrator via spawned task
  • SyncPredictionGuard: calls supervisor.cancel() on drop (not just token)
  • Sync HTTP handler: spawns prediction in background task so slot lifetime is not tied to the HTTP connection (fixes permit leak on disconnect)
  • worker_bridge: capture py_thread_id at prediction start, use it for cancel_sync_thread(); async cancel still uses future.cancel()
  • Fix sync cancel status: upgrade PyErr to Cancelled when slot is marked cancelled

CancelationException as BaseException + public SDK export

  • CancelationException now derives from BaseException (not Exception) so bare except Exception blocks cannot swallow cancellation — matching KeyboardInterrupt and asyncio.CancelledError
  • Defined as a static type via pyo3_stub_gen::create_exception! (replaces dynamic builtins.type() approach)
  • Exported via coglet._implcogletcog.exceptionscog.CancelationException
  • Auto-generated type stubs include proper class CancelationException(BaseException) definition

Dead code removal

  • Removed SIGUSR1 signal handler, CANCELABLE flag, CancelableGuard, enter_cancelable() — all dead since PyThreadState_SetAsyncExc replaced SIGUSR1
  • Removed _is_cancelable Python getter from Server
  • Updated is_cancelation_exception to use the static type instead of dynamic import from nonexistent cog.server.exceptions

Documentation

  • New Cancellation section in docs/python.md documenting CancelationException usage, import paths, BaseException semantics, and re-raise requirement
  • Updated docs/http.md cancel endpoint docs to use cog.CancelationException import path
  • Regenerated docs/llms.txt

Tests

  • 5 new Python cancel tests (explicit cancel sync/async, connection-drop)
  • 2 txtar integration tests

closes: #2748

Cancel was accepted by the HTTP layer but never reached the worker.
This wires the full path: HTTP cancel/connection-drop → supervisor →
orchestrator → ControlRequest::Cancel → worker → PyThreadState_SetAsyncExc.

Key changes:
- Orchestrator: add cancel_by_prediction_id (resolves pred ID → slot ID)
- Supervisor: cancel() delegates to orchestrator via spawned task
- SyncPredictionGuard: calls supervisor.cancel() on drop (not just token)
- Sync HTTP handler: spawns prediction in background task so slot lifetime
  is not tied to the HTTP connection (fixes permit leak on disconnect)
- cancel.rs: replace SIGUSR1 with PyThreadState_SetAsyncExc for sync
  cancel (works on any thread, not just main); fix fallback exception
  to create a proper class, not an instance
- worker_bridge: capture py_thread_id at prediction start, use it for
  cancel_sync_thread(); async cancel still uses future.cancel()
- Tests: 5 new Python cancel tests (explicit cancel sync/async,
  connection-drop), 2 txtar integration tests
# Conflicts:
#	crates/coglet/src/supervisor.rs
#	crates/coglet/src/transport/http/routes.rs
@tempusfrangit tempusfrangit requested a review from a team as a code owner February 24, 2026 01:02
@tempusfrangit tempusfrangit added this to the 0.17.0 Release milestone Feb 24, 2026
…d cancelled

PyThreadState_SetAsyncExc injects CancelationException into the Python
thread, but predict_worker catches it as a generic PyErr and returns
PredictionError::Failed. After predict_worker returns, check if the
slot was marked cancelled and upgrade to PredictionError::Cancelled.

Also applies the same fix to the sync train path.
…g SDK

CancelationException now derives from BaseException (not Exception) so
that bare `except Exception` blocks in user predict code cannot
swallow cancellation — matching KeyboardInterrupt and CancelledError.

Key changes:
- cancel.rs: use pyo3_stub_gen::create_exception! to define a static
  BaseException subclass, replacing the dynamic builtins.type() approach
  and the OnceLock<Py<PyAny>> storage
- lib.rs: register CancelationException on the coglet._impl module
- coglet/__init__.py: re-export CancelationException
- cog/exceptions.py: new module re-exporting from coglet with a
  pure-Python fallback for SDK-only environments
- cog/__init__.py: expose as cog.CancelationException
- stub_gen.rs: include CancelationException in public re-exports
- Regenerated stubs via mise run generate:stubs
PyThreadState_SetAsyncExc replaced SIGUSR1 as the sync cancel mechanism
but the old handler, CANCELABLE flag, CancelableGuard, and
enter_cancelable() were left behind. Nothing sends SIGUSR1 and nothing
reads the CANCELABLE flag on the cancel path.

Removed:
- _sigusr1_handler, install_signal_handler, CANCELABLE, CancelableGuard,
  enter_cancelable, is_cancelable from cancel.rs
- enter_cancelable() calls from worker_bridge.rs and predictor.rs
- install_signal_handler() call from lib.rs worker init
- _is_cancelable Python getter from Server

Also updated is_cancelation_exception in predictor.rs to use the static
cancel::CancelationException type instead of a dynamic import from the
nonexistent cog.server.exceptions module.
…glet

The pure-Python fallback CancelationException conflicted with pyright
when coglet was installed (two incompatible types with the same name).
Since coglet is always present at runtime, just re-export directly.
- python.md: add Cancellation section with CancelationException usage,
  import paths, BaseException semantics, and re-raise requirement
- http.md: update cancel endpoint docs to use cog.CancelationException
  import path (was cog.server.exceptions) and link to new SDK docs
- Regenerated llms.txt
Async predictors receive asyncio.CancelledError instead. Added a table
to python.md making the distinction clear, and updated http.md to
mention both exception types.
The typecheck in lint-python needs the coglet wheel to resolve
CancelationException (now exported from coglet). Without the local
wheel, nox falls back to PyPI which doesn't have unreleased changes.

Adds build-rust as an optional dependency and downloads the
CogletRustWheel artifact when the rust build succeeded, matching
the pattern already used by test-coglet-python.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements end-to-end cancellation wiring from the HTTP layer to the worker subprocess, addressing issue #2748. The implementation replaces the previous SIGUSR1 signal-based approach with PyThreadState_SetAsyncExc for sync predictions, and makes CancelationException a proper public SDK export as a BaseException subclass.

Changes:

  • Wires cancellation through supervisor → orchestrator → worker using cancel_by_prediction_id() and ControlRequest::Cancel
  • Implements CancelationException as a static BaseException type (not Exception) exported through the public SDK
  • Refactors sync HTTP prediction handling to spawn work in background tasks, using SyncPredictionGuard for connection-drop cancellation
  • Removes dead SIGUSR1 signal handler code, CANCELABLE flag, and CancelableGuard
  • Adds comprehensive documentation and integration/unit tests

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
python/cog/exceptions.py New public exceptions module exporting CancelationException
python/cog/__init__.py Adds CancelationException to public API exports
integration-tests/tests/cancel_sync_prediction.txtar Integration test for sync prediction cancellation via HTTP endpoint
integration-tests/tests/cancel_async_prediction.txtar Integration test for async prediction cancellation via HTTP endpoint
docs/python.md New Cancellation section documenting CancelationException usage and semantics
docs/llms.txt Regenerated documentation with cancellation updates
docs/http.md Updated cancel endpoint documentation with correct import paths
crates/coglet/src/transport/http/routes.rs Refactors sync predictions to spawn in background tasks with SyncPredictionGuard
crates/coglet/src/supervisor.rs Implements cancel() method that delegates to orchestrator
crates/coglet/src/service.rs Sets orchestrator on supervisor for cancel delegation
crates/coglet/src/orchestrator.rs Adds cancel_by_prediction_id() trait method and event loop cancel handler
crates/coglet-python/tests/test_coglet.py Adds Python unit tests for sync/async cancellation and connection drops
crates/coglet-python/src/worker_bridge.rs Captures py_thread_id at prediction start, implements cancel via PyThreadState_SetAsyncExc
crates/coglet-python/src/predictor.rs Updates is_cancelation_exception to use static type
crates/coglet-python/src/lib.rs Exports CancelationException and removes dead signal handler code
crates/coglet-python/src/cancel.rs Complete rewrite: implements PyThreadState_SetAsyncExc-based cancellation, removes SIGUSR1
crates/coglet-python/src/bin/stub_gen.rs Adds CancelationException to public re-exports
crates/coglet-python/coglet/_impl.pyi Generated stub with CancelationException class definition
crates/coglet-python/coglet/__init__.pyi Generated stub re-exporting CancelationException
crates/coglet-python/coglet/__init__.py Exports CancelationException from _impl
.github/workflows/ci.yaml Adds coglet wheel download to lint-python job with proper dependencies

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +3
from coglet._impl import CancelationException as CancelationException, __build__ as __build__, __version__ as __version__, server as server

from coglet._impl import __build__ as __build__, __version__ as __version__, server as server, _sdk as _sdk

__all__ = ['__build__', '__version__', 'server']
__all__ = ['__version__', '__build__', 'server', 'CancelationException']
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated stub file is missing the header comments that stub_gen.rs claims to add. According to stub_gen.rs lines 54-56, the generated file should start with:

# This file is automatically generated by stub_gen
# ruff: noqa: E501, F401

However, the actual file doesn't have these comments. This suggests either the stub generation wasn't run after the changes, or there's an issue with the generation logic. This file should be regenerated to match the stub_gen configuration.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cancellation is not wired up in the new cog implementation

1 participant