Closed
*: initial scaffolding for the stickydisk action
README: update README
main: maintain runner perms when mounting a stickydisk
README: update README with use cases and arch diagram
README: fix nits
Update README.md - use blacksmith cache action
src: add sync before umount
src: increase timeout to the same as build-push-action
* src: explicitly flush journal before umounting
* *: generated code
* *: upgrade @buf/blacksmith_vm-agent.connectrpc_es@latest
src: print who we are establishing client with
src: use BLACKSMITH prefixed env vars to work inside job containers
Fixes critical bug where filesystem usage was collected after unmounting, causing df to report incorrect data.
Changes:
- Moved df command execution to before unmount
- Added proper path escaping to prevent shell injection
- Fixed log message syntax error and unit label (GB -> GiB)
- Added prettier config for consistent formatting
- Removed unsupported eslint comma-dangle rule
Co-authored-by: Aayush Shah <aayushshah15@users.noreply.github.com>
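A minimal sketch of the two fixes described in this commit, assuming a POSIX shell. The helper names `shellQuote` and `cleanupCommands` and the exact `df` invocation are hypothetical and are not taken from the action's source; the sketch only illustrates quoting the mount path and measuring usage before the unmount.

```typescript
// Hypothetical helper: wrap an argument in single quotes for a POSIX shell.
// An embedded single quote is emitted as '\'' (close quote, escaped quote, reopen).
function shellQuote(arg: string): string {
  return `'${arg.replace(/'/g, "'\\''")}'`;
}

// Hypothetical cleanup plan: df is listed before umount so that it still
// reports on the sticky disk's filesystem rather than the parent filesystem.
function cleanupCommands(mountPath: string): string[] {
  const quoted = shellQuote(mountPath);
  return [`df -- ${quoted}`, "sync", `umount -- ${quoted}`];
}
```

Passing the path as a separate argument to `execFile` instead of building a shell string would avoid the need for quoting altogether; the quoting helper is shown only because the commit message mentions path escaping.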
- Add comprehensive stickydisk-delete documentation section
- Include basic usage and cleanup workflow examples
- Add pattern matching and use case documentation
- Maintain consistent formatting with Blacksmith logo
- Remove pattern matching (not supported)
- Remove logo and overview fluff
- Focus on two supported methods: delete by key and Docker cache
- Simplify examples
Rebuild dist files with Node v23.2.0 to match .nvmrc version. Build artifacts may differ slightly between macOS and the Linux CI environment.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
This reverts commit 00dbeb0.
Update @buf/blacksmith_vm-agent.connectrpc_es version to match CI requirements
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
docs: add stickydisk-delete section to README
…on errors
Remove the lost+found directory created by mkfs.ext4 to prevent EACCES permission errors when tools recursively scan sticky disk mount paths.
Problem:
mkfs.ext4 always creates lost+found with root:root 0700 permissions for fsck recovery. When sticky disks are mounted to paths that tools scan recursively (e.g., ./node_modules, ./build-cache), permission errors occur:
- pnpm/npm/yarn: Scan node_modules for packages, hit lost+found
  Error: EACCES: permission denied, open '.../lost+found/package.json'
- Docker buildx: Scans build context, hits lost+found
  Error: error from sender: open build-cache/lost+found: permission denied
Impact Analysis:
- 4 customer installations affected in past 11 days
- Pattern 1: pnpm/npm/yarn with node_modules mount
- Pattern 2: Docker buildx with build-cache mount
- Error is intermittent for some tools (buildx)
- Persistent for others (pnpm/yarn)
Evidence:
- Direct snapshot inspection shows empty lost+found (no corruption)
- This is standard mkfs.ext4 behavior, not a bug
- Workarounds already implemented by affected users
Solution:
After mkfs.ext4, mount temporarily and remove lost+found directory. This is safe because:
- Sticky disks are ephemeral CI caches
- If corruption occurs, cache can be rebuilt
- lost+found only needed for fsck recovery of critical filesystems
- Standard practice in Docker/K8s for cache volumes
Prevents need for per-repo workarounds.
Fixes BLA-2150
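The solution in this commit message (format, mount temporarily, remove lost+found, unmount) can be sketched as an ordered command plan. This is an illustration under assumptions, not the action's actual code: the `Command` type, the `formatPlan` name, and the choice of `rm -rf` for the removal step are hypothetical.

```typescript
// Hypothetical representation of one command and its argument vector.
type Command = { cmd: string; args: string[] };

// Build the ordered steps described above for a freshly created sticky disk.
// device:   block device to format (for example a path under /dev)
// tmpMount: an existing empty directory used only for the temporary mount
function formatPlan(device: string, tmpMount: string): Command[] {
  return [
    // mkfs.ext4 creates lost+found owned by root with mode 0700.
    { cmd: "mkfs.ext4", args: [device] },
    // Mount temporarily so the directory can be reached.
    { cmd: "mount", args: [device, tmpMount] },
    // Remove lost+found so recursive scanners do not hit EACCES.
    { cmd: "rm", args: ["-rf", `${tmpMount}/lost+found`] },
    // Unmount again; the disk is later mounted at the user's path.
    { cmd: "umount", args: [tmpMount] },
  ];
}
```

Keeping the arguments as arrays means each step can be handed to a process-spawning API without going through a shell.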
Add minimal placeholder test since full integration testing of block device operations requires actual hardware and is done manually.
…g diffs
The CI was installing @buf/blacksmith_vm-agent.connectrpc_es@latest which would update package-lock.json, causing prettier to detect formatting changes. Since npm ci already installs the correct version from package-lock.json, the explicit install is unnecessary and causes issues.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…-mkfs fix: remove lost+found directory after formatting to prevent permissi…
Replace deprecated Blacksmith-specific actions with upstream equivalents and add real-world use case examples based on production data from 58+ installations with 2,000+ active sticky disk entities.
Changes to existing examples:
- NPM caching: Use actions/setup-node@v4 (not deprecated useblacksmith/setup-node@v5)
- Updated runner labels to specific images (blacksmith-4vcpu-ubuntu-2204)
- Removed node_modules mount (causes lost+found permission issues)
- Focus on npm cache (~/.npm) which is the safer pattern
New use cases added:
- Go Build and Module Cache (4 installations, ~1TB)
- Turborepo Cache (4 installations, ~610GB)
- Python Virtual Environments (2 installations, ~566GB)
- Nix Package Cache (2 installations, ~29GB)
- Playwright Browser Binaries (5 installations, ~36GB)
All examples:
- Use upstream actions (actions/setup-node, actions/setup-go, etc.)
- Based on real customer workflows
- Anonymized for privacy
- Production-validated patterns
Replace deprecated Blacksmith-specific actions with upstream equivalents and add real-world use case examples based on production data from 58+ installations with 2,000+ active sticky disk entities.
Changes to existing examples:
- NPM: Use actions/setup-node@v4 (not deprecated useblacksmith/setup-node@v5)
- NPM: Added node_modules mount back (safe after lost+found fix)
- Updated all runner labels to specific images (blacksmith-4vcpu-ubuntu-2204)
- Bazel: Updated runner label for consistency
New use cases added:
- Go Build and Module Cache (4 installations, ~1TB)
- Turborepo Cache (4 installations, ~610GB)
- Python Virtual Environments (2 installations, ~566GB)
- Nix Package Cache (2 installations, ~29GB)
- Playwright Browser Binaries (5 installations, ~36GB)
All examples:
- Use upstream actions (no deprecated Blacksmith actions)
- Based on real customer workflows
- Anonymized for privacy
- Production-validated patterns
Fix Nix example to match production workflow pattern:
1. Create /nix directories first
2. Mount sticky disk to /nix
3. THEN install Nix (which populates the mounted directory)
Previous order (mount after install) would cause the sticky disk to overwrite the Nix installation.
Also updated to use nixbuild/nix-quick-install-action@v30 which is what production workflows actually use.
…production-data docs/update use cases with examples from production uses
Update status from Alpha to Beta to reflect:
- 58+ installations in production
- 2,000+ active sticky disk entities
- Stable API with no breaking changes planned
- Recent fixes for edge cases (lost+found issue)
- Production-validated across diverse use cases
The action is stable and ready for broader adoption.
chore: promote stickydisk action from Alpha to Beta
BLA-2526: This change updates the documentation to reflect the new default limit of 10 sticky disks per GitHub Action job (increased from 5).
Changes:
- Updated README.md line 29 from 'up to 5 sticky disks' to 'up to 10 sticky disks'
Note: When Docker pull caching is enabled, it reserves one slot, so customers will effectively have 9 stickydisk slots available in that configuration.
Co-Authored-By: maru@blacksmith.sh <adityamaru@gmail.com>
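The slot arithmetic in the note above can be written out as a small function. The names `DEFAULT_STICKYDISK_LIMIT` and `availableSlots` are hypothetical; only the numbers (a default limit of 10, one slot reserved by Docker pull caching) come from the commit message.

```typescript
// Default per-job limit stated in the commit message above.
const DEFAULT_STICKYDISK_LIMIT = 10;

// Number of sticky disk slots a job can still use, given whether
// Docker pull caching has reserved one slot for itself.
function availableSlots(
  dockerPullCacheEnabled: boolean,
  limit: number = DEFAULT_STICKYDISK_LIMIT
): number {
  return dockerPullCacheEnabled ? limit - 1 : limit;
}
```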
Add blockdev --flushbufs operation on guest side after unmounting the sticky
disk to ensure data durability before Ceph RBD snapshots are taken.
Changes:
- Add getDeviceFromMount() to extract device path from mount point
- Add flushBlockDevice() that runs blockdev --flushbufs with stats logging
- Log I/O stats from /sys/block/{device}/stat before and after flush
- Add ENABLE_DURABILITY_FLUSH env var for feature flag (defaults to enabled)
- Handle errors gracefully - log warnings but don't fail the cleanup flow
Co-Authored-By: maru@blacksmith.sh <adityamaru@gmail.com>
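The pieces listed in this commit can be sketched as pure helpers. This is a sketch under stated assumptions: the source names `getDeviceFromMount()` and `flushBlockDevice()` but does not show them, so the functions below (`deviceFromMounts`, `statPath`, `flushCommand`, `durabilityFlushEnabled`) are hypothetical stand-ins, the use of `/proc/mounts` as the lookup source is an assumption, and so is the set of values that disable the flag.

```typescript
// Find the device backing a mount point in text formatted like /proc/mounts
// (fields separated by single spaces: device, mount point, type, options...).
// The last matching entry wins because later mounts shadow earlier ones.
// Note: /proc/mounts encodes spaces in paths as \040; that is not decoded here.
function deviceFromMounts(mountsText: string, mountPoint: string): string | undefined {
  let found: string | undefined = undefined;
  for (const line of mountsText.split("\n")) {
    const fields = line.split(" ");
    if (fields.length >= 2 && fields[1] === mountPoint) {
      found = fields[0];
    }
  }
  return found;
}

// Path of the sysfs I/O statistics file for a whole-disk device such as /dev/vdb.
function statPath(device: string): string {
  return `/sys/block/${device.replace(/^\/dev\//, "")}/stat`;
}

// The flush itself: blockdev --flushbufs <device>, as an argument vector.
function flushCommand(device: string): { cmd: string; args: string[] } {
  return { cmd: "blockdev", args: ["--flushbufs", device] };
}

// Feature flag: enabled unless ENABLE_DURABILITY_FLUSH is explicitly
// "false" or "0" (the accepted values are an assumption).
function durabilityFlushEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env["ENABLE_DURABILITY_FLUSH"];
  if (raw === undefined) return true;
  const value = raw.trim().toLowerCase();
  return value !== "false" && value !== "0";
}
```

A caller would typically pass `process.env` to `durabilityFlushEnabled`, read the stat file before and after running the flush command, and wrap the whole sequence in a try/catch that logs a warning, matching the "don't fail the cleanup flow" behavior described above.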
docs: BLA-2526 - Update stickydisk limit from 5 to 10 disks
…urability-flush feat(post): add explicit durability flush after unmount (BLA-3202)
Note
Medium Risk
Medium risk because it changes the published action interface in `action.yml` (new required inputs and adds a `post` script), which can break existing consumers; the rest is CI/docs scaffolding.
Overview
This PR repackages the repository from a cache-delete action into the `useblacksmith/stickydisk` action by updating `action.yml` (required `key`/`path` inputs and a `post` step) and rewriting the README to document sticky disk behavior and common caching use cases.
It also adds CI/ops scaffolding: a `basic.yaml` workflow to exercise mounting multiple sticky disks, a `bump-tag.yaml` workflow to force-update the `v1` tag, and tighter build CI checks (Buf setup/registry auth plus enforced Prettier + “no uncommitted build output” gating).
Written by Cursor Bugbot for commit 94119f5.