Object-storage task rescheduling uses inconsistent task identity, causing task loss/misrouting after replica failure #1486

@CarlosFelipeOR

Description

Summary

An audit review of PR #1414 found an inconsistency in how the object-storage cluster code derives task identity.

Normal scheduling derives task identity via getIdentifier() (or the archive-aware path), while StorageObjectStorageStableTaskDistributor::rescheduleTasksFromReplica() derives it from getPath() alone.

This mismatch can collapse distinct tasks (especially bucket-split tasks of the same file) onto a single key, leading to task loss or incorrect reassignment after a replica failure.

Affected area

StorageObjectStorageStableTaskDistributor::rescheduleTasksFromReplica() (object-storage cluster task distribution).

Impact

High (correctness/reliability in failure path):

  • Potential task drop when replica fails before producing data.
  • Potential incomplete reads or incorrect task distribution for s3Cluster / icebergCluster style workflows.
  • Higher risk when file-bucket splitting is enabled or archive distribution modes are used.

Code evidence

Reschedule path (task identity derived only from getPath()):

for (const auto & file : processed_file_list_ptr->second)
{
    auto file_replica_idx = getReplicaForFile(file->getPath());
    unprocessed_files.emplace(file->getPath(), std::make_pair(file, file_replica_idx));
    connection_to_files[file_replica_idx].push_back(file);
}

Normal scheduling path (identifier-aware identity):

String file_identifier;
if (send_over_whole_archive && object_info->isArchive())
    file_identifier = object_info->getPathOrPathToArchiveIfArchive();
else
    file_identifier = object_info->getIdentifier();

And getIdentifier() includes bucket identity:

String ObjectInfo::getIdentifier() const
{
    String result = getPath();
    if (file_bucket_info)
        result += file_bucket_info->getIdentifier();
    return result;
}

Reproduction sketch

  1. Run a distributed object-storage read where a single file is split into multiple bucket tasks.
  2. Let one replica receive tasks and fail before emitting any data packet.
  3. Trigger the rescheduling path.
  4. Observe that multiple bucket tasks sharing the same getPath() can collide in unprocessed_files.emplace(...), resulting in dropped tasks and/or incorrect redistribution.

Expected behavior

Rescheduling should use the same task identity logic as initial scheduling (same identifier granularity), so no task is collapsed or lost.

Suggested fix direction

In rescheduleTasksFromReplica(), derive the key and hashing input using the same rules as enqueue/scheduling path:

  • getIdentifier() for normal mode
  • archive-aware identifier logic where applicable

Suggested regression test

Add a failure-injection test for object-storage cluster reads:

  • enable file-bucket splitting,
  • fail a replica before first data packet,
  • verify all bucket tasks are eventually processed exactly once (no missing/duplicate buckets).
