Summary
The Audit Review of PR #1414 detected an inconsistency in object-storage cluster task identity handling.
Normal scheduling derives task identity using getIdentifier() (or archive-aware path), while StorageObjectStorageStableTaskDistributor::rescheduleTasksFromReplica() derives task identity only from getPath().
This mismatch can collapse distinct tasks (especially bucket-split tasks) into the same key and lead to task loss or incorrect reassignment after replica failure.
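The collapse described above can be illustrated with a minimal, self-contained sketch (the `TaskInfo` struct and `distinctKeys` helper are hypothetical stand-ins, not the ClickHouse types): keying a map by `getPath()` alone merges bucket-split tasks for the same object, while keying by the full identifier keeps them distinct.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal model of the bug: two bucket-split tasks for the
// same object differ only in their bucket suffix.
struct TaskInfo
{
    std::string path;    // e.g. "s3://bucket/data.parquet"
    std::string bucket;  // illustrative bucket suffix, e.g. ":b0", ":b1"

    std::string getPath() const { return path; }
    std::string getIdentifier() const { return path + bucket; }
};

// Count how many distinct map entries survive for a given key function.
template <typename KeyFn>
size_t distinctKeys(const std::vector<TaskInfo> & tasks, KeyFn key)
{
    std::map<std::string, TaskInfo> m;
    for (const auto & t : tasks)
        m.emplace(key(t), t);  // emplace is a no-op for duplicate keys
    return m.size();
}
```

With two bucket tasks for the same file, path-based keying leaves one surviving entry (one task silently lost), while identifier-based keying preserves both.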
Affected area
- File: src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp
- Function: StorageObjectStorageStableTaskDistributor::rescheduleTasksFromReplica()
- Observed during QA of PR #1414: 26.1 Antalya port - improvements for cluster requests
Impact
High (correctness/reliability in the failure path):
- Potential task drop when a replica fails before producing data.
- Potential incomplete reads or incorrect task distribution for s3Cluster/icebergCluster-style workflows.
- Higher risk when file-bucket splitting is enabled or archive distribution modes are used.
Code evidence
Reschedule path (task identity derived only from getPath()):

```cpp
for (const auto & file : processed_file_list_ptr->second)
{
    auto file_replica_idx = getReplicaForFile(file->getPath());
    unprocessed_files.emplace(file->getPath(), std::make_pair(file, file_replica_idx));
    connection_to_files[file_replica_idx].push_back(file);
}
```

Normal scheduling path (identifier-aware identity):
```cpp
String file_identifier;
if (send_over_whole_archive && object_info->isArchive())
    file_identifier = object_info->getPathOrPathToArchiveIfArchive();
else
    file_identifier = object_info->getIdentifier();
```

And getIdentifier() includes bucket identity:
```cpp
String ObjectInfo::getIdentifier() const
{
    String result = getPath();
    if (file_bucket_info)
        result += file_bucket_info->getIdentifier();
    return result;
}
```

Reproduction sketch
- Run a distributed object-storage read where a single file is split into multiple bucket tasks.
- Let one replica receive tasks and fail before emitting any data packet.
- Trigger the rescheduling path.
- Observe that multiple bucket tasks sharing the same getPath() can collide in unprocessed_files.emplace(...), resulting in dropped tasks and/or incorrect redistribution.
Expected behavior
Rescheduling should use the same task identity logic as initial scheduling (same identifier granularity), so no task is collapsed or lost.
Suggested fix direction
In rescheduleTasksFromReplica(), derive the key and hashing input using the same rules as the enqueue/scheduling path:
- getIdentifier() for normal mode
- archive-aware identifier logic (getPathOrPathToArchiveIfArchive()) where applicable
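One way to keep both paths in sync is to factor the identity rule into a single helper that scheduling and rescheduling both call. The sketch below models this on the snippets quoted above; `ObjectInfoModel` and `taskIdentifier` are illustrative names, not the actual ClickHouse API.

```cpp
#include <string>

// Illustrative stand-in for ObjectInfo, with just the members the
// identity rule needs (names modeled on the code evidence above).
struct ObjectInfoModel
{
    std::string path;
    std::string bucket_suffix;    // empty when bucket splitting is off
    bool is_archive = false;
    std::string path_to_archive;  // used in send_over_whole_archive mode

    std::string getPath() const { return path; }
    std::string getIdentifier() const { return path + bucket_suffix; }
    std::string getPathOrPathToArchiveIfArchive() const
    {
        return is_archive ? path_to_archive : path;
    }
    bool isArchive() const { return is_archive; }
};

// Single source of truth: the same rule as the normal scheduling path,
// reusable from rescheduleTasksFromReplica().
std::string taskIdentifier(const ObjectInfoModel & info, bool send_over_whole_archive)
{
    if (send_over_whole_archive && info.isArchive())
        return info.getPathOrPathToArchiveIfArchive();
    return info.getIdentifier();
}
```

The reschedule loop would then key unprocessed_files (and the replica hash) on taskIdentifier(...) instead of getPath(), so bucket-split tasks stay distinct.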
Suggested regression test
Add a failure-injection test for object-storage cluster reads:
- enable file-bucket splitting,
- fail a replica before first data packet,
- verify all bucket tasks are eventually processed exactly once (no missing/duplicate buckets).
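The exactly-once property the test should check can be sketched as a toy simulation (plain C++, not the ClickHouse test framework; all names are illustrative): all tasks land on a replica that fails before emitting data, the tasks are rescheduled through a key-indexed map, and we count how many survive under each keying scheme.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simulates rescheduling after a replica failure. Each task is a
// (path, bucket-suffix) pair; three tasks start on the failing replica.
std::multiset<std::string> processedAfterReschedule(bool key_by_identifier)
{
    std::vector<std::pair<std::string, std::string>> failed_replica_tasks = {
        {"f.parquet", ":b0"}, {"f.parquet", ":b1"}, {"g.parquet", ":b0"}};

    // Mimics unprocessed_files: duplicate keys are silently dropped.
    std::map<std::string, std::pair<std::string, std::string>> unprocessed;
    for (const auto & t : failed_replica_tasks)
    {
        std::string key = key_by_identifier ? t.first + t.second : t.first;
        unprocessed.emplace(key, t);
    }

    // Everything that survived rescheduling gets processed.
    std::multiset<std::string> processed;
    for (const auto & [key, t] : unprocessed)
        processed.insert(t.first + t.second);
    return processed;
}
```

Keying by identifier processes all three bucket tasks; keying by path alone loses one of the two f.parquet buckets, which is exactly what the regression test should catch.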