
fix: check for existing buildkitd before mounting sticky disk #65

Open
chadxz wants to merge 1 commit into useblacksmith:main from chadxz:fix-double-setup-error


@chadxz commented Mar 10, 2026

When setup-docker-builder is invoked twice in the same job (e.g. via a composite action called twice), the second invocation was calling setupStickyDisk() before detecting the already-running buildkitd. This caused a new sticky disk to be mounted on top of /var/lib/buildkit while buildkitd was still running with in-memory metadata referencing snapshot directories from the original disk. The subsequent build then failed with:

  ERROR: failed to solve: failed to read dockerfile: failed to walk:
  resolve: lstat /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/N:
  no such file or directory

Fix: move the buildkitd process check to the very beginning of startBlacksmithBuilder(), before any sticky disk setup. If buildkitd is already running, log an informational message and return immediately so the fallback path reuses the existing configured builder (from the first invocation) without corrupting its overlayfs snapshot state.
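A minimal TypeScript sketch of the reordered control flow follows. It is not the action's real implementation: the injected helpers (`findBuildkitdPid`, `startBuildkitd`, `info`) are hypothetical stand-ins, and `setupStickyDisk` is modeled as returning an expose ID purely for illustration. The early-return value matches the `{ addr: null, exposeId: "" }` shown in the diff.

```typescript
// Sketch only: every dependency below is a hypothetical stand-in for the
// action's real helpers, injected so the ordering can be exercised in isolation.

interface BuilderResult {
  addr: string | null; // null means "setup skipped", per the diff's early return
  exposeId: string;
}

interface StartupDeps {
  findBuildkitdPid(): Promise<string | null>; // hypothetical; e.g. could wrap `pgrep buildkitd`
  setupStickyDisk(): Promise<string>; // modeled as returning an expose ID (assumption)
  startBuildkitd(): Promise<string>; // hypothetical; returns the buildkitd address
  info(message: string): void;
}

async function startBlacksmithBuilder(deps: StartupDeps): Promise<BuilderResult> {
  // 1. Look for a running buildkitd BEFORE touching /var/lib/buildkit.
  const pid = await deps.findBuildkitdPid();
  if (pid !== null) {
    deps.info(
      `Detected existing buildkitd process (PID: ${pid.trim()}). ` +
        `Skipping builder setup - builder is already initialized.`,
    );
    // Return early: no new sticky disk is mounted over the live buildkitd state.
    return { addr: null, exposeId: "" };
  }

  // 2. Only a first invocation reaches this point, so mounting is safe.
  const exposeId = await deps.setupStickyDisk();
  const addr = await deps.startBuildkitd();
  return { addr, exposeId };
}
```

Injecting the side effects keeps the ordering testable: when the fake PID lookup returns a value, `setupStickyDisk` is never called, which is the property the fix relies on.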


Note

Medium Risk
Changes builder initialization and sticky-disk mounting order, which can impact build reliability in CI if the buildkitd detection behaves unexpectedly in edge cases.

Overview
Prevents double-invocation breakage by moving the buildkitd process check to the start of startBlacksmithBuilder() and skipping all builder/sticky-disk setup when a buildkitd PID is already present.

Updates the fallback path to treat “setup skipped” the same as “setup failed”, attempting to reuse an already configured buildx builder (and only creating a local builder if none exists).

Written by Cursor Bugbot for commit 4e5e494. This will update automatically on new commits.
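The fallback behavior summarized in the overview can be sketched as below, under stated assumptions: `inspectBuilder` stands in for `toolkit.builder.inspect()`, while `createLocalBuilder`, `resolveBuilder`, and the `Outcome` type are hypothetical names introduced only for this illustration.

```typescript
// Sketch only: `inspectBuilder` stands in for toolkit.builder.inspect();
// `createLocalBuilder`, `resolveBuilder`, and `Outcome` are hypothetical.

interface BuilderResult {
  addr: string | null; // null = setup skipped (buildkitd already running)
  exposeId: string;
}

interface BuilderInfo {
  name: string;
}

interface FallbackDeps {
  inspectBuilder(): Promise<BuilderInfo | null>;
  createLocalBuilder(): Promise<BuilderInfo>;
  info(message: string): void;
}

type Outcome =
  | { kind: "blacksmith"; addr: string } // fresh Blacksmith builder from this run
  | { kind: "reused"; name: string } // builder configured by an earlier invocation
  | { kind: "local"; name: string }; // newly created local builder

async function resolveBuilder(
  setup: () => Promise<BuilderResult>,
  deps: FallbackDeps,
): Promise<Outcome> {
  let result: BuilderResult | null = null;
  try {
    result = await setup();
  } catch (err) {
    deps.info(`Blacksmith builder setup failed: ${String(err)}`);
  }

  if (result !== null && result.addr !== null) {
    return { kind: "blacksmith", addr: result.addr };
  }

  // "Setup skipped" (addr === null) and "setup failed" (result === null)
  // share one path: prefer an already configured buildx builder.
  const existing = await deps.inspectBuilder();
  if (existing !== null) {
    return { kind: "reused", name: existing.name };
  }

  // Create a local builder only when nothing is configured yet.
  const local = await deps.createLocalBuilder();
  return { kind: "local", name: local.name };
}
```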

@chadxz (Author) commented Mar 10, 2026

I don't have a buf token, so I wasn't able to run pnpm install to rebuild dist/index.ts. If someone could help with that, that would be great 👍

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

        `Detected existing buildkitd process (PID: ${stdout.trim()}). ` +
        `Skipping builder setup - builder is already initialized.`,
    );
    return { addr: null, exposeId: "" };

nofallback check bypassed when buildkitd already running

Low Severity

When buildkitd is already running, the early return { addr: null, exposeId: "" } bypasses the nofallback check in the catch block. The fallback path at line 643+ has no nofallback guard of its own, so if toolkit.builder.inspect() returns null, a local docker-container builder is silently created even when nofallback is true. Previously, detecting a running buildkitd threw an error that respected the nofallback flag.
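One way to address the issue Bugbot describes is to put a `nofallback` guard inside the fallback path itself, so it no longer depends on the catch block. A sketch, assuming `inspectBuilder` stands in for `toolkit.builder.inspect()` and `createLocalBuilder` and `fallbackBuilder` are hypothetical names:

```typescript
// Sketch only: `inspectBuilder` stands in for toolkit.builder.inspect();
// `createLocalBuilder` and `fallbackBuilder` are hypothetical.

interface BuilderInfo {
  name: string;
}

interface FallbackDeps {
  inspectBuilder(): Promise<BuilderInfo | null>;
  createLocalBuilder(): Promise<BuilderInfo>;
}

async function fallbackBuilder(
  nofallback: boolean,
  deps: FallbackDeps,
): Promise<BuilderInfo> {
  const existing = await deps.inspectBuilder();
  if (existing !== null) {
    // An already configured builder is reused; nothing new is created.
    return existing;
  }
  // The case flagged above: no builder exists, so a local one would be created.
  if (nofallback) {
    throw new Error(
      "No configured buildx builder found and nofallback is set; not creating a local builder.",
    );
  }
  return deps.createLocalBuilder();
}
```

Reusing an existing builder is left unguarded here; only the branch that would create a new local builder checks `nofallback`.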


@chadxz (Author) replied Mar 10, 2026

In this scenario, buildkitd is already running from a prior invocation in the same job - the Blacksmith builder setup already succeeded. There's nothing to "fall back" from, so I don't think nofallback applies here. We're reusing the existing (already working) builder, not necessarily falling back to a local one.

If we later enrich the toolkit.builder.inspect() check to validate that the existing builder is specifically a Blacksmith builder, it may make sense to wire in specific nofallback handling at that point.
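If the inspect check is later enriched as described, the decision might look like the sketch below. How a Blacksmith builder would be recognized is not specified in this thread, so `isBlacksmithBuilder` is a hypothetical caller-supplied predicate, and `classifyExistingBuilder` and `ReuseDecision` are illustrative names.

```typescript
// Sketch only: the criterion for "is a Blacksmith builder" is not defined in
// the thread, so it is passed in as a hypothetical predicate.

interface BuilderInfo {
  name: string;
}

type ReuseDecision =
  | { action: "reuse"; builder: BuilderInfo } // Blacksmith builder from an earlier run
  | { action: "fallback"; builder: BuilderInfo } // non-Blacksmith builder used as a fallback
  | { action: "none" }; // nothing configured yet

function classifyExistingBuilder(
  existing: BuilderInfo | null,
  nofallback: boolean,
  isBlacksmithBuilder: (b: BuilderInfo) => boolean,
): ReuseDecision {
  if (existing === null) {
    return { action: "none" };
  }
  if (isBlacksmithBuilder(existing)) {
    // Reuse of the builder from the first invocation: not a fallback.
    return { action: "reuse", builder: existing };
  }
  // Only now is this a genuine fallback, so nofallback applies.
  if (nofallback) {
    throw new Error(
      `Existing builder "${existing.name}" is not a Blacksmith builder and nofallback is set.`,
    );
  }
  return { action: "fallback", builder: existing };
}
```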
