Conversation
Cancel was accepted by the HTTP layer but never reached the worker. This wires the full path: HTTP cancel/connection-drop → supervisor → orchestrator → ControlRequest::Cancel → worker → PyThreadState_SetAsyncExc. Key changes: - Orchestrator: add cancel_by_prediction_id (resolves pred ID → slot ID) - Supervisor: cancel() delegates to orchestrator via spawned task - SyncPredictionGuard: calls supervisor.cancel() on drop (not just token) - Sync HTTP handler: spawns prediction in background task so slot lifetime is not tied to the HTTP connection (fixes permit leak on disconnect) - cancel.rs: replace SIGUSR1 with PyThreadState_SetAsyncExc for sync cancel (works on any thread, not just main); fix fallback exception to create a proper class, not an instance - worker_bridge: capture py_thread_id at prediction start, use it for cancel_sync_thread(); async cancel still uses future.cancel() - Tests: 5 new Python cancel tests (explicit cancel sync/async, connection-drop), 2 txtar integration tests
# Conflicts: # crates/coglet/src/supervisor.rs # crates/coglet/src/transport/http/routes.rs
…d cancelled PyThreadState_SetAsyncExc injects CancelationException into the Python thread, but predict_worker catches it as a generic PyErr and returns PredictionError::Failed. After predict_worker returns, check if the slot was marked cancelled and upgrade to PredictionError::Cancelled. Also applies the same fix to the sync train path.
…g SDK CancelationException now derives from BaseException (not Exception) so that bare `except Exception` blocks in user predict code cannot swallow cancellation — matching KeyboardInterrupt and CancelledError. Key changes: - cancel.rs: use pyo3_stub_gen::create_exception! to define a static BaseException subclass, replacing the dynamic builtins.type() approach and the OnceLock<Py<PyAny>> storage - lib.rs: register CancelationException on the coglet._impl module - coglet/__init__.py: re-export CancelationException - cog/exceptions.py: new module re-exporting from coglet with a pure-Python fallback for SDK-only environments - cog/__init__.py: expose as cog.CancelationException - stub_gen.rs: include CancelationException in public re-exports - Regenerated stubs via mise run generate:stubs
PyThreadState_SetAsyncExc replaced SIGUSR1 as the sync cancel mechanism but the old handler, CANCELABLE flag, CancelableGuard, and enter_cancelable() were left behind. Nothing sends SIGUSR1 and nothing reads the CANCELABLE flag on the cancel path. Removed: - _sigusr1_handler, install_signal_handler, CANCELABLE, CancelableGuard, enter_cancelable, is_cancelable from cancel.rs - enter_cancelable() calls from worker_bridge.rs and predictor.rs - install_signal_handler() call from lib.rs worker init - _is_cancelable Python getter from Server Also updated is_cancelation_exception in predictor.rs to use the static cancel::CancelationException type instead of a dynamic import from the nonexistent cog.server.exceptions module.
…glet The pure-Python fallback CancelationException conflicted with pyright when coglet was installed (two incompatible types with the same name). Since coglet is always present at runtime, just re-export directly.
- python.md: add Cancellation section with CancelationException usage, import paths, BaseException semantics, and re-raise requirement - http.md: update cancel endpoint docs to use cog.CancelationException import path (was cog.server.exceptions) and link to new SDK docs - Regenerated llms.txt
Async predictors receive asyncio.CancelledError instead. Added a table to python.md making the distinction clear, and updated http.md to mention both exception types.
The typecheck in lint-python needs the coglet wheel to resolve CancelationException (now exported from coglet). Without the local wheel, nox falls back to PyPI which doesn't have unreleased changes. Adds build-rust as an optional dependency and downloads the CogletRustWheel artifact when the rust build succeeded, matching the pattern already used by test-coglet-python.
There was a problem hiding this comment.
Pull request overview
This PR implements end-to-end cancellation wiring from the HTTP layer to the worker subprocess, addressing issue #2748. The implementation replaces the previous SIGUSR1 signal-based approach with PyThreadState_SetAsyncExc for sync predictions, and makes CancelationException a proper public SDK export as a BaseException subclass.
Changes:
- Wires cancellation through supervisor → orchestrator → worker using
cancel_by_prediction_id()andControlRequest::Cancel - Implements
CancelationExceptionas a staticBaseExceptiontype (notException) exported through the public SDK - Refactors sync HTTP prediction handling to spawn work in background tasks, using
SyncPredictionGuardfor connection-drop cancellation - Removes dead SIGUSR1 signal handler code,
CANCELABLEflag, andCancelableGuard - Adds comprehensive documentation and integration/unit tests
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
python/cog/exceptions.py |
New public exceptions module exporting CancelationException |
python/cog/__init__.py |
Adds CancelationException to public API exports |
integration-tests/tests/cancel_sync_prediction.txtar |
Integration test for sync prediction cancellation via HTTP endpoint |
integration-tests/tests/cancel_async_prediction.txtar |
Integration test for async prediction cancellation via HTTP endpoint |
docs/python.md |
New Cancellation section documenting CancelationException usage and semantics |
docs/llms.txt |
Regenerated documentation with cancellation updates |
docs/http.md |
Updated cancel endpoint documentation with correct import paths |
crates/coglet/src/transport/http/routes.rs |
Refactors sync predictions to spawn in background tasks with SyncPredictionGuard |
crates/coglet/src/supervisor.rs |
Implements cancel() method that delegates to orchestrator |
crates/coglet/src/service.rs |
Sets orchestrator on supervisor for cancel delegation |
crates/coglet/src/orchestrator.rs |
Adds cancel_by_prediction_id() trait method and event loop cancel handler |
crates/coglet-python/tests/test_coglet.py |
Adds Python unit tests for sync/async cancellation and connection drops |
crates/coglet-python/src/worker_bridge.rs |
Captures py_thread_id at prediction start, implements cancel via PyThreadState_SetAsyncExc |
crates/coglet-python/src/predictor.rs |
Updates is_cancelation_exception to use static type |
crates/coglet-python/src/lib.rs |
Exports CancelationException and removes dead signal handler code |
crates/coglet-python/src/cancel.rs |
Complete rewrite: implements PyThreadState_SetAsyncExc-based cancellation, removes SIGUSR1 |
crates/coglet-python/src/bin/stub_gen.rs |
Adds CancelationException to public re-exports |
crates/coglet-python/coglet/_impl.pyi |
Generated stub with CancelationException class definition |
crates/coglet-python/coglet/__init__.pyi |
Generated stub re-exporting CancelationException |
crates/coglet-python/coglet/__init__.py |
Exports CancelationException from _impl |
.github/workflows/ci.yaml |
Adds coglet wheel download to lint-python job with proper dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from coglet._impl import CancelationException as CancelationException, __build__ as __build__, __version__ as __version__, server as server | ||
|
|
||
| from coglet._impl import __build__ as __build__, __version__ as __version__, server as server, _sdk as _sdk | ||
|
|
||
| __all__ = ['__build__', '__version__', 'server'] | ||
| __all__ = ['__version__', '__build__', 'server', 'CancelationException'] |
There was a problem hiding this comment.
The generated stub file is missing the header comments that stub_gen.rs claims to add. According to stub_gen.rs lines 54-56, the generated file should start with:
# This file is automatically generated by stub_gen
# ruff: noqa: E501, F401
However, the actual file doesn't have these comments. This suggests either the stub generation wasn't run after the changes, or there's an issue with the generation logic. This file should be regenerated to match the stub_gen configuration.
Summary
Wire cancellation end-to-end from HTTP to worker subprocess, and make
CancelationExceptiona proper public API.Cancel was accepted by the HTTP layer but never reached the worker. This wires the full path: HTTP cancel/connection-drop → supervisor → orchestrator → ControlRequest::Cancel → worker → PyThreadState_SetAsyncExc.
Key changes
Cancellation wiring (existing commits)
CancelationException as BaseException + public SDK export
CancelationExceptionnow derives fromBaseException(notException) so bareexcept Exceptionblocks cannot swallow cancellation — matchingKeyboardInterruptandasyncio.CancelledErrorpyo3_stub_gen::create_exception!(replaces dynamicbuiltins.type()approach)coglet._impl→coglet→cog.exceptions→cog.CancelationExceptionclass CancelationException(BaseException)definitionDead code removal
CANCELABLEflag,CancelableGuard,enter_cancelable()— all dead sincePyThreadState_SetAsyncExcreplaced SIGUSR1_is_cancelablePython getter from Serveris_cancelation_exceptionto use the static type instead of dynamic import from nonexistentcog.server.exceptionsDocumentation
docs/python.mddocumentingCancelationExceptionusage, import paths,BaseExceptionsemantics, and re-raise requirementdocs/http.mdcancel endpoint docs to usecog.CancelationExceptionimport pathdocs/llms.txtTests
closes: #2748