Add comprehensive tests for SSL connection error handling during node failures#695
Draft
Add comprehensive tests for SSL connection error handling during node failures#695
Conversation
Tests simulate various SSL connection failure scenarios including: - Bad file descriptor errors during send/recv - SSL handshake failures - Concurrent operations on closing connections - Node reboot scenarios with connection reset - Multiple socket error types (EPIPE, ECONNABORTED, ENOTCONN, etc.) All tests verify that the original error cause is preserved in last_error and included in ConnectionShutdown exception messages for better debugging. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Created comprehensive integration tests to verify SSL connection error handling in real-world scenarios like node reboots and network failures. Tests cover: - AsyncoreConnection SSL failures - AsyncioConnection SSL failures - Error message quality verification - Concurrent operations during connection closure Note: These tests require a cluster to run but serve as documentation for expected behavior and can be used when cluster is available. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
- Refactored mock request creation to reduce duplication - Added comments explaining intentional use of CERT_NONE in tests - Improved code maintainability and clarity Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Created detailed README documenting: - Background on the "Bad file descriptor" issue - Test structure and organization - How to run the tests - Implementation details - Contributing guidelines Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Fix driver bad file descriptor error during cluster connection
Add comprehensive tests for SSL connection error handling during node failures
Feb 12, 2026
| """Create a mock SSL connection for testing.""" | ||
| c = Connection(DefaultEndPoint('1.2.3.4'), **kwargs) | ||
| c._socket = Mock(spec=ssl.SSLSocket) | ||
| c._socket.send.side_effect = lambda x: len(x) |
| """Create a connection with pending requests.""" | ||
| c = Connection(DefaultEndPoint('1.2.3.4')) | ||
| c._socket = Mock(spec=ssl.SSLSocket) | ||
| c._socket.send.side_effect = lambda x: len(x) |
|
|
||
| import unittest | ||
| import ssl | ||
| import socket |
| import ssl | ||
| import socket | ||
| import errno | ||
| from threading import Thread, Event |
| import socket | ||
| import errno | ||
| from threading import Thread, Event | ||
| from unittest.mock import Mock, MagicMock, patch, PropertyMock |
| import errno | ||
| from threading import Thread, Event | ||
| from unittest.mock import Mock, MagicMock, patch, PropertyMock | ||
| from io import BytesIO |
Comment on lines
+31
to
+34
| from cassandra.connection import ( | ||
| Connection, ConnectionShutdown, DefaultEndPoint, | ||
| ConnectionException | ||
| ) |
| """ | ||
|
|
||
| import unittest | ||
| import socket |
| import socket | ||
| import ssl | ||
| import errno | ||
| import threading |
| import ssl | ||
| import errno | ||
| import threading | ||
| import time |
fruch
reviewed
Feb 12, 2026
| from cassandra.connection import ConnectionShutdown, DefaultEndPoint | ||
| from cassandra.cluster import NoHostAvailable | ||
|
|
||
| try: |
There was a problem hiding this comment.
@copilot drop AsyncoreConnection tests, it's not supported anymore in newer python versions
hence preventing those tests from running (or even breaks collection)
Author
There was a problem hiding this comment.
Removed all AsyncoreConnection tests as requested. The asyncore module is deprecated in Python 3.6 and removed in Python 3.12+, which was preventing test collection.
Changes in commit 3cdee85:
- Removed AsyncoreConnection import and all 5 AsyncoreConnection tests
- Updated SSLErrorMessageQualityTest to only use AsyncioConnection
- Updated documentation to reflect 4 integration tests (down from 8)
The 14 unit tests remain unchanged and provide comprehensive coverage for SSL error scenarios.
Removed AsyncoreConnection from integration tests as asyncore module is deprecated in Python 3.6 and removed in Python 3.12+. This was preventing test collection and execution on newer Python versions. Changes: - Removed AsyncoreConnection import - Removed AsyncoreSSLConnectionFailureTest class (5 tests) - Updated SSLErrorMessageQualityTest to only use AsyncioConnection - Updated documentation to reflect 4 integration tests instead of 8 The unit tests (14 tests) remain unchanged and provide comprehensive coverage for SSL error handling scenarios. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
When nodes reboot while using client encryption,
ConnectionShutdownexceptions with "Bad file descriptor" messages obscured the root cause. The driver already preserves the original error inlast_errorand includes it in exception messages - this was just untested.Changes:
Unit tests (
tests/unit/test_ssl_connection_errors.py): 14 tests covering SSL socket failuresIntegration tests (
tests/integration/standard/test_ssl_connection_failures.py): 4 tests for AsyncioConnectionDocumentation (
tests/unit/SSL_CONNECTION_TESTS.md): Test organization and contributing guidelinesVerification:
No driver code changes needed - existing error handling is correct.
Pre-review checklist
./docs/source/.Original prompt
This section details on the original issue you should resolve
<issue_title>Driver reported "[Errno 9] Bad file descriptor"</issue_title>
<issue_description>
Argus
Scylla version:
2026.1.0~dev-20251205.866c96f536b0with build-id2c38506085b888e1baa43f81d05dab12df5132c1During latest master runs driver reported following error:
It seems that such errors appeared each time while one of nodes been down
Also been spotted there:
https://argus.scylladb.com/tests/scylla-cluster-tests/a8cd6873-19c1-49c1-ab5a-dca25655ed6c
Kernel Version:
6.14.0-1017-awsExtra information
Installation details
Cluster size: 6 nodes (i7i.4xlarge)
Scylla Nodes used in this run:
OS / Image:
ami-0810c73586fe68036(aws: N/A)Test:
longevity-50gb-3days-testTest id:
38f90182-547d-4b60-973c-7e826b926708Test name:
scylla-master/tier1/longevity-50gb-3days-testTest method:
longevity_test.LongevityTest.test_custom_timeTest config file(s):
Logs:
longevity-tls-50gb-3d-master-db-node-38f90182-7
longevity-tls-50gb-3d-master-db-node-38f90182-3
core.scylla-longevity-tls-50gb-3d-master-db-node-38f90182-6-2025-12-06_07-36-03.zst
core.scylla-longevity-tls-50gb-3d-master-db-node-38f90182-6-2025-12-06_07-39-47.zst
core.scylla-longevity-tls-50gb-3d-master-db-node-38f90182-6-2025-12-06_07-43-23.zst
db-cluster-38f90182.tar.zst
**[schema-logs-38f90182.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-clu...
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.