.changeset/fix-batch-duplicate-idempotency.md: 7 changes (7 additions, 0 deletions)
@@ -0,0 +1,7 @@
---
"@trigger.dev/webapp": patch
---

Fix batchTriggerAndWait running forever when a duplicate idempotencyKey is provided in the same batch

When batchTriggerAndWait was called with duplicate idempotencyKeys in the same batch, the batch never completed because completedCount and expectedCount could never match. This fix ensures that cached runs (runs returned for a duplicate idempotencyKey) are tracked as items of the batch, and that completedCount is incremented immediately when the cached run is already in a final status.
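The cached-run accounting described in this changeset can be modeled in memory. This is a hypothetical sketch, not the webapp's code: `trackCachedRun` and `BatchCounters` are illustrative names, the `Set` of `batchId:runId` keys stands in for the `(batchTaskRunId, taskRunId)` unique constraint, and the run status names are a simplified, partly assumed subset.

```typescript
// Hypothetical in-memory model of the cached-run branch in the diff below.
// Simplified run statuses; names other than COMPLETED_WITH_ERRORS and CRASHED are assumed.
type RunStatus = "EXECUTING" | "COMPLETED_SUCCESSFULLY" | "COMPLETED_WITH_ERRORS" | "CRASHED";

interface BatchCounters {
  expectedCount: number;
  completedCount: number;
}

// Stand-in for isFinalRunStatus().
const FINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "COMPLETED_SUCCESSFULLY",
  "COMPLETED_WITH_ERRORS",
  "CRASHED",
]);

// Returns true when a new tracking item was created, i.e. when the caller
// should increment expectedCount (mirrors `return true` / `return false`).
function trackCachedRun(
  trackedItems: Set<string>, // stands in for the (batchTaskRunId, taskRunId) unique constraint
  counters: BatchCounters,
  batchId: string,
  runId: string,
  runStatus: RunStatus
): boolean {
  const key = `${batchId}:${runId}`;
  if (trackedItems.has(key)) {
    // Same as the unique-constraint catch: already tracked, do not count it again.
    return false;
  }
  trackedItems.add(key);
  if (FINAL_STATUSES.has(runStatus)) {
    // The cached run is already finished, so it counts as completed right away.
    counters.completedCount += 1;
  }
  return true;
}

// Two entries in one batch whose idempotencyKey resolves to the same finished run.
const trackedItems = new Set<string>();
const counters: BatchCounters = { expectedCount: 0, completedCount: 0 };
for (const runId of ["run_a", "run_a"]) {
  if (trackCachedRun(trackedItems, counters, "batch_1", runId, "COMPLETED_SUCCESSFULLY")) {
    counters.expectedCount += 1;
  }
}
console.log(counters); // { expectedCount: 1, completedCount: 1 }
```

Here both entries resolve to a run created earlier (for example by another batch) that has already finished: the first call creates the tracking item and counts it as completed, while the second hits the simulated unique constraint and is not counted again, so the two counters stay equal.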
apps/webapp/app/v3/services/batchTriggerV3.server.ts: 93 changes (79 additions, 14 deletions)
@@ -124,11 +124,11 @@ export class BatchTriggerV3Service extends BaseService {

const existingBatch = options.idempotencyKey
? await this._prisma.batchTaskRun.findFirst({
-where: {
-runtimeEnvironmentId: environment.id,
-idempotencyKey: options.idempotencyKey,
-},
-})
+where: {
+runtimeEnvironmentId: environment.id,
+idempotencyKey: options.idempotencyKey,
+},
+})
: undefined;

if (existingBatch) {
@@ -167,16 +167,16 @@ export class BatchTriggerV3Service extends BaseService {

const dependentAttempt = body?.dependentAttempt
? await this._prisma.taskRunAttempt.findFirst({
-where: { friendlyId: body.dependentAttempt },
-include: {
-taskRun: {
-select: {
-id: true,
-status: true,
-},
+where: { friendlyId: body.dependentAttempt },
+include: {
+taskRun: {
+select: {
+id: true,
+status: true,
},
},
-})
+},
+})
: undefined;

if (
@@ -890,7 +890,72 @@ export class BatchTriggerV3Service extends BaseService {
}
}

-return false;
+// FIX for Issue #2965: When a run is cached (duplicate idempotencyKey),
+// we need to ALWAYS create a BatchTaskRunItem to properly track it.
+// This handles cases where cached run may originate from another batch.
+// Use unique constraint (batchTaskRunId, taskRunId) to prevent duplicates.
+const isAlreadyComplete = isFinalRunStatus(result.run.status);
+
+logger.debug(
+"[BatchTriggerV2][processBatchTaskRunItem] Cached run detected, creating batch item",
+{
+batchId: batch.friendlyId,
+runId: task.runId,
+cachedRunId: result.run.id,
+cachedRunStatus: result.run.status,
+isAlreadyComplete,
+currentIndex,
+}
+);
+
+// Always create BatchTaskRunItem for cached runs
+// This ensures proper tracking even for cross-batch scenarios
+try {
+await this._prisma.batchTaskRunItem.create({
+data: {
+batchTaskRunId: batch.id,
+taskRunId: result.run.id,
+// Use batchTaskRunItemStatusForRunStatus() for all cases
+// This correctly maps both successful (COMPLETED) and failed (FAILED) statuses
+status: batchTaskRunItemStatusForRunStatus(result.run.status),

🔴 Cached runs with failed status are not counted for batch completion

When a cached run has a failed status (e.g., COMPLETED_WITH_ERRORS, CRASHED, SYSTEM_FAILURE), the batch will never complete.

Root Cause

The fix creates a BatchTaskRunItem with status based on batchTaskRunItemStatusForRunStatus(result.run.status) (line 920). For failed run statuses, this returns FAILED (see taskRun.server.ts:119-126):

case TaskRunStatus.COMPLETED_WITH_ERRORS:
case TaskRunStatus.SYSTEM_FAILURE:
case TaskRunStatus.CRASHED:
  return BatchTaskRunItemStatus.FAILED;

However, tryCompleteBatchV3 only counts items with status: "COMPLETED" (lines 1034-1035):

const completedCount = await tx.batchTaskRunItem.count({
  where: { batchTaskRunId: batchId, status: "COMPLETED" },
});

Actual vs Expected

Actual: For a cached run with COMPLETED_WITH_ERRORS status:

  1. BatchTaskRunItem is created with status FAILED
  2. isAlreadyComplete is true (line 897) since it's a final status
  3. But tryCompleteBatchV3 only counts COMPLETED items, missing this item
  4. The batch never completes because count won't reach expectedCount

Expected: The batch should complete when all items have finished, regardless of whether the cached runs succeeded or failed.

Impact

This partially defeats the purpose of the fix: batchTriggerAndWait will still run forever if duplicate idempotency keys reference runs that have already failed.

Recommendation: For cached runs that are already complete (regardless of success/failure), create the BatchTaskRunItem with status COMPLETED instead of using batchTaskRunItemStatusForRunStatus(). This aligns with how completeBatchTaskRunItemV3 works - it always sets status to COMPLETED when a run finishes (line 1088).
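A minimal in-memory sketch of this recommendation, assuming the completion check behaves like the quoted count query (only COMPLETED items count toward expectedCount). `itemStatusForRunStatus` is a simplified, hypothetical stand-in for batchTaskRunItemStatusForRunStatus() based on the cases quoted above; `cachedItemStatus` and `isBatchComplete` are illustrative names, and run status names not quoted in the review are assumed.

```typescript
// Hypothetical sketch of the "normalize to COMPLETED" recommendation.
type RunStatus =
  | "EXECUTING"
  | "COMPLETED_SUCCESSFULLY"
  | "COMPLETED_WITH_ERRORS"
  | "SYSTEM_FAILURE"
  | "CRASHED";
type BatchItemStatus = "PENDING" | "COMPLETED" | "FAILED";

const FINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "COMPLETED_SUCCESSFULLY",
  "COMPLETED_WITH_ERRORS",
  "SYSTEM_FAILURE",
  "CRASHED",
]);

// Simplified stand-in for batchTaskRunItemStatusForRunStatus().
function itemStatusForRunStatus(status: RunStatus): BatchItemStatus {
  switch (status) {
    case "COMPLETED_SUCCESSFULLY":
      return "COMPLETED";
    case "COMPLETED_WITH_ERRORS":
    case "SYSTEM_FAILURE":
    case "CRASHED":
      return "FAILED";
    default:
      return "PENDING";
  }
}

// Recommended mapping for cached runs: any final status is tracked as COMPLETED.
function cachedItemStatus(status: RunStatus): BatchItemStatus {
  return FINAL_STATUSES.has(status) ? "COMPLETED" : itemStatusForRunStatus(status);
}

// Completion check that, like the quoted tryCompleteBatchV3 query, counts only COMPLETED items.
function isBatchComplete(items: BatchItemStatus[], expectedCount: number): boolean {
  return items.filter((s) => s === "COMPLETED").length >= expectedCount;
}

const cachedStatuses: RunStatus[] = ["COMPLETED_SUCCESSFULLY", "CRASHED"];
console.log(isBatchComplete(cachedStatuses.map(itemStatusForRunStatus), 2)); // false
console.log(isBatchComplete(cachedStatuses.map(cachedItemStatus), 2)); // true
```

With the current mapping the CRASHED run becomes a FAILED item and the batch stalls one short of expectedCount; normalizing final statuses at write time lets the unchanged COMPLETED-only count reach it.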

+},
+});
+
+// Only increment completedCount if the cached run is already finished
+// For in-progress runs, completedCount will be incremented when the run completes
+if (isAlreadyComplete) {
+await this._prisma.batchTaskRun.update({
+where: { id: batch.id },
+data: {
+completedCount: {
+increment: 1,
+},
+},
+});
+}
+
+// Return true so expectedCount is incremented
+return true;
+} catch (error) {
+if (isUniqueConstraintError(error, ["batchTaskRunId", "taskRunId"])) {
+// BatchTaskRunItem already exists for this batch and cached run
+// This can happen if the same idempotencyKey is used multiple times in the same batch
+logger.debug(
+"[BatchTriggerV2][processBatchTaskRunItem] BatchTaskRunItem already exists for cached run",
+{
+batchId: batch.friendlyId,
+runId: task.runId,
+cachedRunId: result.run.id,
+currentIndex,
+}
+);
+
+// Don't increment expectedCount since this item is already tracked
+return false;
+}
+
+throw error;
+}
Comment on lines +893 to +958

⚠️ Potential issue | 🟠 Major

Account for FAILED cached runs in batch completion.
Cached runs in a final failed status now create items with status FAILED (line 920), and those items never transition to COMPLETED because completeBatchTaskRunItemV3 only updates items that are still PENDING. Since tryCompleteBatchV3 only counts COMPLETED items, a cached failed run can keep batchTriggerAndWait from ever completing.

Consider counting FAILED as completed (or normalizing cached final statuses to COMPLETED for completion tracking).

🔧 Suggested fix (count FAILED as completed)
-  const completedCount = await tx.batchTaskRunItem.count({
-    where: { batchTaskRunId: batchId, status: "COMPLETED" },
-  });
+  const completedCount = await tx.batchTaskRunItem.count({
+    where: {
+      batchTaskRunId: batchId,
+      status: { in: ["COMPLETED", "FAILED"] },
+    },
+  });
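The suggested query change can be checked with an in-memory equivalent of the two `count` calls. This is a hypothetical sketch: `BatchItem` and `countItems` are illustrative names, and the array filter only mirrors the `where: { batchTaskRunId, status: { in: [...] } }` shape, not Prisma's actual execution.

```typescript
// Hypothetical in-memory equivalent of the two count queries.
type BatchItemStatus = "PENDING" | "COMPLETED" | "FAILED";

interface BatchItem {
  batchTaskRunId: string;
  status: BatchItemStatus;
}

// Mirrors `where: { batchTaskRunId: batchId, status: { in: statuses } }`.
function countItems(items: BatchItem[], batchId: string, statuses: readonly BatchItemStatus[]): number {
  return items.filter((item) => item.batchTaskRunId === batchId && statuses.includes(item.status)).length;
}

const items: BatchItem[] = [
  { batchTaskRunId: "batch_1", status: "COMPLETED" },
  { batchTaskRunId: "batch_1", status: "FAILED" }, // cached run that had already failed
  { batchTaskRunId: "batch_2", status: "COMPLETED" }, // different batch, ignored
];

const expectedCount = 2;
console.log(countItems(items, "batch_1", ["COMPLETED"]) === expectedCount); // false
console.log(countItems(items, "batch_1", ["COMPLETED", "FAILED"]) === expectedCount); // true
```

Counting FAILED at read time keeps each item's own status accurate, whereas normalizing at write time leaves the count query untouched; which is preferable presumably depends on whether other code reads BatchTaskRunItem.status.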
🤖 Prompt for AI Agents
In `@apps/webapp/app/v3/services/batchTriggerV3.server.ts` around lines 893 - 958,
The cached-run handling creates BatchTaskRunItem with status from
batchTaskRunItemStatusForRunStatus(result.run.status) which can be FAILED and
never transitions to COMPLETED (completeBatchTaskRunItemV3 only updates PENDING)
so tryCompleteBatchV3 that counts only COMPLETED will never finish; fix by
treating final FAILED runs as completed for batch completion: either normalize
final cached statuses to COMPLETED (map FAILED -> COMPLETED) when creating the
item via batchTaskRunItemStatusForRunStatus(result.run.status) or update the
creation/aggregation logic to increment completedCount (or have
tryCompleteBatchV3 count FAILED alongside COMPLETED). Adjust code paths around
this._prisma.batchTaskRunItem.create, isFinalRunStatus,
batchTaskRunItemStatusForRunStatus, completeBatchTaskRunItemV3 and
tryCompleteBatchV3 accordingly and preserve unique-constraint handling
(isUniqueConstraintError).

}

async #enqueueBatchTaskRun(options: BatchProcessingOptions) {