
feat: block Pruning #2984

Open

pthmas wants to merge 26 commits into main from pierrick/prunning

Conversation

@pthmas
Contributor

@pthmas pthmas commented Jan 16, 2026

Overview

  • Adds a height-based pruning mechanism to the ev-node store:
    • Prunes /h/{height}, /d/{height}, /c/{height}, /i/{hash} up to a target height.
    • Also prunes per-height DA metadata (height → DA height mapping) for pruned heights.
    • Leaves /s/{height} (state), /t (current height), and global metadata keys untouched.
    • Tracks progress via a last-pruned-block-height metadata key so pruning is monotonic and idempotent.
  • Integrates pruning into the DA inclusion loop:
    • As DA inclusion advances, at configurable intervals, the submitter computes a prune target and calls store.PruneBlocks.
    • The prune target is capped by both:
      • the DA-included height (DAIncludedHeightKey), so only DA-included blocks are ever pruned, and
      • the current store height, to avoid pruning beyond what we actually have locally.
    • If there are not enough fully DA-included blocks beyond PruningKeepRecent, pruning is skipped.
  • Adds pruning for execution metadata:
    • Execution stores (EVM, ev-abci, etc.) expose PruneExecMeta to delete per-height ExecMeta entries up to the same target height.
    • The executor implements an optional ExecMetaPruner interface, and the submitter calls it from the DA inclusion pruning hook.
    • EV-ABCI PR: feat: pruning ev-abci#332
  • Optimizes header/data tail lookups:
    • The StoreAdapter’s Tail() now uses the last-pruned-block-height metadata to skip fully pruned ranges instead of linearly scanning from genesis.
    • This keeps tail lookups efficient even after long-term pruning.

Config

New node config fields / flags:

  • --evnode.node.pruning_enabled
  • --evnode.node.pruning_keep_recent
  • --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).
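
Concretely, the prune target combines these settings with the DA-included height and the store height. A minimal sketch (names are illustrative rather than the PR's exact code; the PruningInterval rate limiting is omitted here):

func pruneTarget(enabled bool, keepRecent, daIncluded, storeHeight, lastPruned uint64) (uint64, bool) {
    if !enabled || keepRecent == 0 {
        return 0, false // pruning disabled or archive mode
    }
    // Never prune past what is DA-included or what we actually hold locally.
    upperBound := min(daIncluded, storeHeight)
    if upperBound <= keepRecent {
        return 0, false // not enough DA-included blocks beyond keepRecent
    }
    target := upperBound - keepRecent
    if target <= lastPruned {
        return 0, false // nothing new to prune yet
    }
    return target, true
}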

Design trade-offs

  • Best-effort pruning:
    If pruning fails (either store or exec metadata), we log the error but do not halt DA inclusion. This avoids stopping the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.

  • Scope:
    This PR prunes:

    • ev-node’s own block store (headers, data, signatures, hash index, and per-height DA metadata),
    • execution metadata via the ExecMetaPruner hook when available.
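
For context, the exec-metadata hook is an optional-interface pattern; a minimal sketch of how the submitter can call it (the interface name matches this PR, the surrounding field names are assumed):

type ExecMetaPruner interface {
    PruneExecMeta(ctx context.Context, height uint64) error
}

// Illustrative call site: exec metadata is pruned only if the executor opts in.
if pruner, ok := s.exec.(ExecMetaPruner); ok {
    if err := pruner.PruneExecMeta(ctx, targetHeight); err != nil {
        // best-effort: log and keep the DA inclusion loop running
        s.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune exec metadata")
    }
}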

@github-actions
Contributor

github-actions bot commented Jan 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Feb 4, 2026, 7:21 PM

@claude
Contributor

claude bot commented Jan 16, 2026

Claude finished @pthmas's task in 2m 33s.


Comprehensive Code Review Complete ✅

I've performed a comprehensive code review of the block pruning implementation focusing on code quality, security, error handling, performance, testing, and documentation.


Summary

This is a well-designed and thoughtfully implemented feature that adds height-based pruning to ev-node. The implementation follows Go best practices, includes proper error handling, and demonstrates good architectural decisions. However, there are some concerns related to error handling resilience, test coverage, and configuration validation that should be addressed.

Overall Assessment: The code is production-ready with minor improvements recommended.


Detailed Findings

1. Code Quality & Architecture ⭐⭐⭐⭐☆ (4/5)

Strengths:

  • Clean separation of concerns: Pruning logic is isolated in PruneBlocks() method with clear responsibilities
  • Idempotent design: Uses LastPrunedBlockHeightKey to track progress and prevent duplicate work
  • Modular interface design: ExecPruner interface allows optional pruning support across different execution layers
  • Consistent naming: Clear naming conventions (e.g., PruningKeepRecent, PruningInterval)
  • Proper use of constants: Well-defined keys like LastPrunedBlockHeightKey

Concerns:

  • Silent failure mode (block/internal/submitting/submitter.go:395-403): Pruning failures are logged but don't halt the node. While this is intentional ("best-effort pruning" per PR description), it could lead to silent disk space exhaustion if pruning consistently fails. Consider adding metrics or periodic alerts when pruning fails repeatedly.
  • Tight coupling in pruning trigger: Pruning is tightly coupled to the DA inclusion loop. This means pruning only runs on nodes that process DA inclusion. Consider documenting this clearly or extracting pruning to a separate goroutine for better modularity.

Recommendations:

// In submitter.go, add metrics for pruning failures:
if err := s.store.PruneBlocks(s.ctx, targetHeight); err != nil {
    s.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
    s.metrics.PruningFailures.Inc() // Add this metric
}

2. Error Handling & Resilience ⭐⭐⭐☆☆ (3/5)

Strengths:

  • Proper error wrapping: Uses fmt.Errorf with %w for error context throughout
  • Graceful handling of missing data: PruneBlocks() treats ErrNotFound as non-fatal (lines 336-337, 342-345, etc.)
  • Transaction safety: Uses batching to ensure atomic deletions

Critical Concerns:

  1. Pruning trigger check (submitter.go:369-404):

    • Line 382 guards with storeHeight <= lastPruned+s.config.Node.PruningInterval. At first glance this reads like a time-based check, but PruningInterval is measured in blocks, so the logic is correct.
    • The variable naming could be clearer (e.g. a comment noting that the interval is counted in blocks, not time).
  2. No protection against concurrent pruning:

    • Multiple goroutines could theoretically call PruneBlocks() simultaneously
    • While the implementation is idempotent, concurrent batch operations could conflict

    Recommendation:

    // Add mutex to Submitter for pruning
    type Submitter struct {
        // ... existing fields
        pruningMu sync.Mutex
    }
    
    // In processDAInclusionLoop, protect pruning:
    s.pruningMu.Lock()
    if err := s.store.PruneBlocks(s.ctx, targetHeight); err != nil {
        s.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
    }
    s.pruningMu.Unlock()
  3. Cache invalidation after pruning (cached_store.go:163-171):

    • InvalidateRange(1, height) invalidates from height 1, which could be expensive
    • Since pruning is incremental (tracked by LastPrunedBlockHeightKey), only need to invalidate newly pruned range

    Recommendation:

    func (cs *CachedStore) PruneBlocks(ctx context.Context, height uint64) error {
        // Get last pruned height before pruning
        var lastPruned uint64
        if bz, err := cs.GetMetadata(ctx, LastPrunedBlockHeightKey); err == nil && len(bz) == 8 {
            lastPruned = binary.LittleEndian.Uint64(bz)
        }
        
        if err := cs.Store.PruneBlocks(ctx, height); err != nil {
            return err
        }
        
        // Only invalidate newly pruned range
        cs.InvalidateRange(lastPruned+1, height)
        return nil
    }

3. Security ⭐⭐⭐⭐⭐ (5/5)

Strengths:

  • No exposure of sensitive data in logs or errors
  • Proper bounds checking: Configuration validation prevents PruningKeepRecent = 0 when enabled
  • DA inclusion safety: Only prunes blocks that are DA-included (upperBound := min(storeHeight, currentDAIncluded))
  • State preservation: Correctly avoids pruning state snapshots (/s/{height}), current height (/t), and global metadata

No security concerns identified. The implementation is secure.


4. Performance & Resource Efficiency ⭐⭐⭐⭐☆ (4/5)

Strengths:

  • Efficient batch operations: Uses batch.Delete() to minimize I/O overhead
  • Incremental pruning: Tracks LastPrunedBlockHeightKey to skip already-pruned ranges
  • Optimized tail lookups (store_adapter.go:269-279): Uses LastPrunedBlockHeightKey to skip pruned ranges instead of linear scanning from genesis - excellent optimization!
  • Minimal locking: The heightSub mechanism uses atomic operations for fast-path height checks

Concerns:

  1. Linear iteration in PruneBlocks (store.go:331-378):

    • For large intervals, the loop for h := lastPruned + 1; h <= height; h++ could process thousands of heights
    • Each iteration performs multiple Delete() calls (5-7 per height)
    • For PruningInterval=10000, this could be ~50,000 delete operations

    Impact: Moderate. Batch operations amortize the cost, but this could still block the DA inclusion loop for several seconds.

    Recommendation: Consider adding progress logging or breaking pruning into smaller chunks:

    const maxPrunePerBatch = 1000 // Prune max 1000 blocks per call
    if height - lastPruned > maxPrunePerBatch {
        height = lastPruned + maxPrunePerBatch
    }
  2. Memory allocation in pruning loop (store.go:331-378):

    • Each iteration allocates: headerBlob (variable size), headerHash (32 bytes), multiple error objects
    • For 10,000 heights, this could be significant GC pressure

    Recommendation: Pre-allocate buffers outside the loop where possible.
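
    For example, a height-encoding buffer can be hoisted out of the loop (a generic illustration of the pattern, not tied to the PR's exact allocations):

    // Illustrative: reuse one 8-byte buffer for encoded heights instead of
    // allocating a fresh slice on every iteration.
    buf := make([]byte, 8)
    for h := lastPruned + 1; h <= height; h++ {
        binary.LittleEndian.PutUint64(buf, h)
        // ... use buf wherever an encoded height is needed for this iteration
    }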


5. Testing & Reliability ⭐⭐☆☆☆ (2/5)

Critical Issue: Only 23.36% patch coverage with 82 lines missing coverage (per Codecov report).

Test Coverage Analysis:

Good coverage:

  • execution/evm/store_test.go: Comprehensive tests for ExecMeta pruning
    • TestPruneExec_PrunesUpToTargetHeight: Verifies deletion of target range
    • TestPruneExec_TracksLastPrunedHeight: Verifies incremental pruning

⚠️ Missing coverage:

  • pkg/store/store.go:299-390 (PruneBlocks): 0% coverage of the main pruning logic!

    • No tests for header/data/signature deletion
    • No tests for DA metadata deletion
    • No tests for hash index deletion
    • No tests for error cases (batch creation failure, delete failures, commit failure)
  • block/internal/submitting/submitter.go:369-404: 0% coverage of pruning trigger logic

    • No integration tests verifying pruning runs periodically
    • No tests for boundary conditions (not enough blocks to prune)
    • No tests for interaction with DA inclusion
  • pkg/store/cached_store.go:163-171: 0% coverage of cache invalidation

Required Tests:

// Add to pkg/store/store_test.go:

func TestPruneBlocks_DeletesAllBlockComponents(t *testing.T) {
    // Test that pruning deletes headers, data, signatures, hash index, and DA metadata
}

func TestPruneBlocks_Idempotent(t *testing.T) {
    // Test that calling PruneBlocks(10) twice doesn't cause errors
}

func TestPruneBlocks_TracksProgress(t *testing.T) {
    // Test LastPrunedBlockHeightKey is updated correctly
}

func TestPruneBlocks_ErrorHandling(t *testing.T) {
    // Test batch creation failure, delete failure, commit failure
}

func TestPruneBlocks_PreservesStateAndMetadata(t *testing.T) {
    // Test that /s/{height}, /t, and global metadata are NOT deleted
}

func TestCachedStore_PruneInvalidatesCache(t *testing.T) {
    // Test that cache is invalidated after pruning
}

Recommendation: Add comprehensive unit tests before merging. Target: >80% coverage for pruning code paths.


6. Documentation & Developer Experience ⭐⭐⭐⭐☆ (4/5)

Strengths:

  • Excellent PR description: Clear overview, config explanation, design trade-offs
  • Good code comments: PruneBlocks() has clear docstring explaining behavior
  • Config documentation (config.go:265-270): Clear field comments with examples
  • Interface documentation (types.go:88-97): Pruner interface is well-documented

Minor gaps:

  1. No user-facing documentation: No updates to README.md or docs/ explaining the new feature

    • Users don't know how to enable pruning or what values to use
  2. Missing operational guidance:

    • What happens if pruning is enabled on an archive node with existing data?
    • How much disk space savings can users expect?
    • Are there any performance impacts during pruning?

Recommendation: Add a section to CLAUDE.md or README.md:

## Block Pruning

Ev-node supports height-based pruning to reduce disk usage while maintaining recent block history.

### Configuration

- `--evnode.node.pruning_enabled`: Enable pruning (default: false)
- `--evnode.node.pruning_keep_recent`: Number of recent blocks to keep (must be > 0)
- `--evnode.node.pruning_interval`: Run pruning every N blocks (must be >= 1)

### Example

Keep last 1000 blocks, prune every 100 blocks:
```bash
--evnode.node.pruning_enabled=true \
--evnode.node.pruning_keep_recent=1000 \
--evnode.node.pruning_interval=100
```

### What Gets Pruned

- Block headers, data, signatures
- Hash indexes
- DA metadata mappings
- Execution metadata (if supported by execution layer)

### What Is NOT Pruned

- State snapshots
- Current height marker
- Global metadata (DA included height, etc.)

Configuration Validation ⭐⭐⭐⭐☆ (4/5)

Strengths:

  • config.go:388-399: Good validation preventing contradictory configs
  • Clear error messages

Minor issue: The comment on line 269 says "Must be > 0 when pruning is enabled; set pruning_enabled=false to keep all blocks (archive mode)." The validation allows PruningKeepRecent=0 when pruning is disabled, which is correct, but the comment phrasing is confusing.

Recommendation: Clarify the comment:

PruningKeepRecent uint64 `mapstructure:"pruning_keep_recent" yaml:"pruning_keep_recent" comment:"Number of most recent blocks to retain when pruning is enabled. Must be > 0 if pruning_enabled=true. Ignored if pruning_enabled=false (archive mode keeps all blocks)."`

Concurrency & Race Conditions ⭐⭐⭐⭐☆ (4/5)

Strengths:

  • heightSub uses atomic operations correctly (store_adapter.go)
  • Batch operations are atomic
  • No obvious data races in the pruning logic itself

Concern: See "Error Handling" section - missing mutex for concurrent pruning protection.


Specific Code Comments

store.go:331-378 (PruneBlocks loop)

for h := lastPruned + 1; h <= height; h++ {
    // Get header blob to compute the hash index key. If header is already
    // missing (e.g. due to previous partial pruning), just skip this height.
    headerBlob, err := s.db.Get(ctx, ds.NewKey(getHeaderKey(h)))
    if err != nil {
        if errors.Is(err, ds.ErrNotFound) {
            continue // ✅ Good: gracefully handles missing data
        }
        return fmt.Errorf("failed to get header at height %d during pruning: %w", h, err)
    }
    // ... deletion logic
}

Issue: The loop continues on ErrNotFound, which is correct for idempotency. However, if there's a transient I/O error (not ErrNotFound), the function returns early without updating LastPrunedBlockHeightKey. This means:

  • The same range will be retried on next pruning attempt (good for correctness)
  • But errors are not persisted/counted, so repeated transient failures could delay pruning indefinitely

Recommendation: Add retry logic or at least log/metric transient errors separately from permanent errors.
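
One possible shape, treating ErrNotFound as a skip and bounding retries on other errors (the retry limit and the metric are hypothetical, not part of this PR):

const maxPruneRetries = 3 // hypothetical

var headerBlob []byte
var err error
for attempt := 0; attempt < maxPruneRetries; attempt++ {
    headerBlob, err = s.db.Get(ctx, ds.NewKey(getHeaderKey(h)))
    if err == nil || errors.Is(err, ds.ErrNotFound) {
        break
    }
    pruneTransientErrors.Inc() // hypothetical metric: count transient failures separately
}
if errors.Is(err, ds.ErrNotFound) {
    continue // already pruned / missing: skip this height
}
if err != nil {
    return fmt.Errorf("failed to get header at height %d during pruning: %w", h, err)
}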


submitter.go:369-404 (Pruning trigger)

if s.config.Node.PruningEnabled && s.config.Node.PruningKeepRecent > 0 && s.config.Node.PruningInterval > 0 {

Good: Triple-check before pruning.

if storeHeight <= lastPruned+s.config.Node.PruningInterval {
    continue
}

Issue: This check means pruning won't run until storeHeight > lastPruned + interval. For example:

  • lastPruned = 100, interval = 50
  • Pruning won't run again until storeHeight > 150

But this doesn't account for the current DA included height. If currentDAIncluded = 120 and keepRecent = 10, we could prune up to height 110, but this check prevents it until storeHeight > 150.

Recommendation: Change condition to:

if currentDAIncluded <= lastPruned+s.config.Node.PruningInterval {
    continue
}

This ensures pruning runs based on DA-included progress, not just store height.


cached_store.go:163-171 (Cache invalidation)

Already covered in "Error Handling" section - recommend optimizing to invalidate only newly pruned range.


store_adapter.go:269-279 (Tail optimization)

if meta, err := a.store.GetMetadata(ctx, LastPrunedBlockHeightKey); err == nil && len(meta) == heightLength {
    if lastPruned, err := decodeHeight(meta); err == nil {
        if candidate := lastPruned + 1; candidate > startHeight {
            startHeight = candidate
        }
    }
}

Excellent: This optimization prevents scanning from genesis height on pruned nodes. Well done!


Priority Recommendations

High Priority (Should Address Before Merge)

  1. Add unit tests for PruneBlocks() - Currently 0% coverage of main pruning logic
  2. Add integration test verifying pruning runs periodically in the DA inclusion loop
  3. Fix cache invalidation to only invalidate newly pruned range
  4. Add pruning metrics for monitoring failures

Medium Priority (Can Address Post-Merge)

  1. Add mutex to prevent concurrent pruning
  2. Optimize pruning trigger to check currentDAIncluded instead of storeHeight
  3. Add user-facing documentation for pruning configuration
  4. Consider chunking large pruning operations to avoid blocking

Low Priority (Nice to Have)

  1. Add retry logic for transient I/O errors during pruning
  2. Add progress logging for large pruning operations
  3. Pre-allocate buffers in pruning loop to reduce GC pressure

Conclusion

This is a solid, well-architected feature with good separation of concerns and proper DA-safety guarantees. The main concern is insufficient test coverage (23%), particularly for the core PruneBlocks() logic. I strongly recommend adding comprehensive unit tests before merging.

The "best-effort pruning" design (log-and-continue on errors) is a reasonable trade-off, but operators should be aware that pruning failures could lead to unbounded disk growth. Consider adding metrics/alerts for production deployments.

Approval Recommendation: Approve pending addition of tests. The code is production-ready once test coverage reaches >80% for pruning paths.


@pthmas pthmas changed the title Block Prunning feat: block Prunning Jan 26, 2026
@pthmas pthmas changed the title feat: block Prunning feat: block Pruning Jan 26, 2026
@pthmas pthmas force-pushed the pierrick/prunning branch from 3e1e8e0 to ed66fe6 on January 26, 2026 15:01
@pthmas pthmas force-pushed the pierrick/prunning branch 7 times, most recently from 26628f5 to 4d65b1e on January 28, 2026 15:41
@codecov

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 21.15385% with 82 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.08%. Comparing base (d20b1ac) to head (86d8a2d).

Files with missing lines (patch coverage · missing lines):
  • pkg/store/store.go: 20.00% (24 missing, 12 partials) ⚠️
  • block/internal/submitting/submitter.go: 0.00% (19 missing, 1 partial) ⚠️
  • pkg/store/tracing.go: 0.00% (11 missing) ⚠️
  • pkg/config/config.go: 37.50% (4 missing, 1 partial) ⚠️
  • pkg/store/cached_store.go: 0.00% (5 missing) ⚠️
  • pkg/store/store_adapter.go: 58.33% (4 missing, 1 partial) ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2984      +/-   ##
==========================================
- Coverage   56.38%   56.08%   -0.31%     
==========================================
  Files         118      118              
  Lines       12036    12134      +98     
==========================================
+ Hits         6787     6805      +18     
- Misses       4507     4569      +62     
- Partials      742      760      +18     
Flag: combined · Coverage: 56.08% <21.15%> (-0.31%) ⬇️


@pthmas pthmas force-pushed the pierrick/prunning branch 4 times, most recently from 7aef549 to 4d31269 on February 2, 2026 17:35
@pthmas pthmas marked this pull request as ready for review February 2, 2026 18:38
@pthmas pthmas force-pushed the pierrick/prunning branch from 3e07144 to 26d0327 on February 3, 2026 19:07
@pthmas pthmas force-pushed the pierrick/prunning branch from 0c346a0 to 94f37d2 on February 4, 2026 12:26
@pthmas pthmas force-pushed the pierrick/prunning branch from 94f37d2 to ddc064a on February 4, 2026 12:31
pthmas and others added 4 commits February 4, 2026 13:40
@pthmas pthmas force-pushed the pierrick/prunning branch from aef0de8 to 20d578b on February 4, 2026 14:32
Member

@julienrbrt julienrbrt left a comment


nice work! lgtm, left some nits.
Maybe @chatton knows how to properly test this on CI?

-> advance in heights, prune, verify querying those blocks returns block not found, but we are still advancing heights

@pthmas
Contributor Author

pthmas commented Feb 4, 2026

nice work! lgtm, left some nits. Maybe @chatton knows how to properly test this on CI?

-> advance in heights, prune, verify querying those blocks returns block not found, but we are still advancing heights

I ran the chaos network with different pruning settings (more or less aggressive) and everything went well. Pruning was working as expected.
