Skip to content

[DNM] feat: delete go-header store cmd#3040

Open
julienrbrt wants to merge 25 commits intomainfrom
julien/go-header-clean
Open

[DNM] feat: delete go-header store cmd#3040
julienrbrt wants to merge 25 commits intomainfrom
julien/go-header-clean

Conversation

@julienrbrt
Copy link
Member

@julienrbrt julienrbrt commented Feb 2, 2026

Overview

evm cleanup-goheader

@github-actions
Copy link
Contributor

github-actions bot commented Feb 2, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedFeb 4, 2026, 11:31 AM

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new command-line utility designed to clean up obsolete data from the application's database. The data, previously duplicated by the go-header library for P2P synchronization, is now redundant due to a recent architectural shift towards a unified data storage mechanism. The new command provides a safe and controlled way to reclaim disk space by removing these legacy entries, including a dry-run option for verification before irreversible changes are made.

Highlights

  • New Cleanup Command: A new command, cleanup-goheader, has been introduced to remove legacy go-header store data from the database.
  • Data Deletion Target: This command specifically targets and deletes data stored under the /headerSync and /dataSync prefixes, which are no longer needed after a migration to a unified store approach.
  • Dry Run Functionality: The command includes a --dry-run flag, allowing users to preview which keys would be deleted without actually performing the deletion, enhancing safety.
  • Integration: The new cleanup-goheader command is integrated into the main application's command-line interface.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude
Copy link
Contributor

claude bot commented Feb 2, 2026

Claude finished @julienrbrt's task in 1m 44s —— View job


Comprehensive Code Review

I've completed a thorough review of the cleanup-goheader command implementation. Below are my findings across all focus areas:


Critical Issues

1. Undefined Variable: evmDbName (apps/evm/cmd/cleanup_goheader.go:51)

The variable evmDbName is not defined in cleanup_goheader.go, causing a compilation error.

Issue: evmDbName is defined as a constant in run.go:36 but is not accessible from cleanup_goheader.go.

Solution: Move evmDbName to a shared location or define it in cleanup_goheader.go. Based on the pattern in other files (e.g., rollback.go:40), this should be defined as a package-level constant in a shared file or duplicated in this file.

const (
    // go-header store prefixes used prior to the unified store migration
    headerSyncPrefix = "/headerSync"
    dataSyncPrefix   = "/dataSync"
    evmDbName        = "evm-single"  // Add this
)

Fix this →


High Priority Issues

2. Memory Safety: Unbounded Batch Size (apps/evm/cmd/cleanup_goheader.go:98-134)

The deletePrefix function loads all delete operations into a single batch regardless of size. This could cause high memory consumption and potential OOM crashes if there are hundreds of thousands or millions of keys to delete.

Issue: The current implementation collects all keys in a single batch before committing:

for result := range results.Next() {
    // ... adds all keys to batch without limit
    batch.Delete(ctx, ds.NewKey(result.Key))
    count++
}
// Single commit at the end
batch.Commit(ctx)

Recommended Solution: Implement chunked batching (commit every N keys). This pattern is already used in tools/db-bench/main.go:162-173:

const batchChunkSize = 1000 // or configurable

for result := range results.Next() {
    if result.Error != nil {
        return count, fmt.Errorf("error iterating results: %w", result.Error)
    }

    if !dryRun {
        if err := batch.Delete(ctx, ds.NewKey(result.Key)); err != nil {
            return count, fmt.Errorf("failed to delete key %s: %w", result.Key, err)
        }
    }
    count++
    
    // Commit in chunks
    if !dryRun && count%batchChunkSize == 0 {
        if err := batch.Commit(ctx); err != nil {
            return count, fmt.Errorf("failed to commit batch delete: %w", err)
        }
        // Create new batch for next chunk
        batch, err = db.Batch(ctx)
        if err != nil {
            return count, fmt.Errorf("failed to create batch: %w", err)
        }
    }
}

// Commit remaining items
if !dryRun && count%batchChunkSize != 0 {
    if err := batch.Commit(ctx); err != nil {
        return count, fmt.Errorf("failed to commit final batch: %w", err)
    }
}

Fix this →


Medium Priority Issues

3. Missing Tests

There are no unit tests for the cleanup command. Given the destructive nature of this operation, comprehensive testing is critical.

Recommendations:

  • Add unit tests covering:
    • Dry-run mode (verifies no deletions occur)
    • Successful deletion of both prefixes
    • Empty database (no keys found)
    • Database errors during query/delete/commit
    • Context cancellation
    • Large datasets (validate chunked batching works)

4. No Progress Feedback for Long Operations

For large datasets, the command provides no progress indication. Users won't know if the operation is stuck or progressing.

Recommendation: Add periodic progress logging:

if count > 0 && count%10000 == 0 {
    cmd.Printf("Progress: deleted %d keys so far...\n", count)
}

5. No Atomic Rollback on Partial Failure

If deletion fails midway (e.g., after deleting headerSync but before dataSync), the database is left in an inconsistent state with no recovery mechanism.

Recommendation: Consider documenting this limitation in the Long help text, or implement a recovery mechanism (e.g., keeping a transaction log).


Code Quality & Style

6. Good Practices Observed

  • ✅ Clear documentation in command Long description
  • ✅ Proper error wrapping with context (fmt.Errorf with %w)
  • ✅ Deferred database cleanup with error checking
  • ✅ Context propagation throughout
  • ✅ Dry-run flag for safety
  • ✅ Clear warning about stopping the node first

7. Minor Improvements

a. Results.Close() Error Handling (line 106)
The results.Close() is deferred but doesn't check for errors. While this is common, it could mask issues:

defer func() {
    if err := results.Close(); err != nil {
        // Log or handle close error
    }
}()

b. Batch Cleanup on Error
If an error occurs during iteration, the batch is not explicitly cleaned up (though Go's GC will handle it). For defensive programming:

defer func() {
    if count == 0 && batch != nil {
        // Cancel/cleanup batch if nothing was done
    }
}()

Security Considerations

8. Security Assessment

  • ✅ No injection vulnerabilities (prefixes are hardcoded constants)
  • ✅ No credential exposure risks
  • ✅ Appropriate warning about irreversibility
  • ⚠️ Missing: File system permission check (command could fail with unclear error if DB is locked by running node)

Recommendation: Add a check to verify the node is not running:

// Check if database is locked (node might be running)
// This could be done by attempting to acquire an exclusive lock

Performance Considerations

9. Query Optimization

  • ✅ Uses KeysOnly: true to minimize data transfer - excellent optimization
  • ✅ Uses batching for deletions (though needs chunking as noted above)
  • ❌ No limit on query results, which ties into the memory concern

Documentation & Developer Experience

10. Documentation Quality

  • ✅ Clear command description and warnings
  • ✅ Good inline comments explaining the purpose
  • ✅ Helpful output messages distinguishing dry-run vs actual deletion
  • ⚠️ Missing:
    • No example usage in comments or help text
    • No mention of expected runtime for large datasets
    • No CLAUDE.md update to document this new command

Recommendation: Add example to help text:

Long: `Delete the legacy go-header store data...

Example:
  # Preview what would be deleted
  evm cleanup-goheader --dry-run
  
  # Actually delete the data
  evm cleanup-goheader`,

Summary

Must Fix Before Merge:

  1. ❌ Add evmDbName constant (compilation error)
  2. ⚠️ Implement chunked batching for memory safety

Strongly Recommended:
3. Add comprehensive unit tests
4. Add progress feedback for large datasets

Nice to Have:
5. Improve error handling (results.Close, batch cleanup)
6. Add usage examples to help text
7. Add node-running detection

The implementation follows good Go practices overall, but the two critical issues must be addressed before this can be merged safely.


Tasks:

  • Read and analyze changed files
  • Review code quality and architecture
  • Check error handling and resilience
  • Verify security considerations
  • Assess performance and resource efficiency
  • Review testing coverage
  • Check documentation completeness
  • Compile findings and post review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a cleanup-goheader command to remove legacy data from the store. The implementation is straightforward, but I've found a critical issue that will prevent compilation due to an undefined variable. I've also pointed out a potential memory issue when deleting a large number of keys and suggested an improvement for robustness. Once these are addressed, the command should work as intended.

}

// Open the database
rawDB, err := store.NewDefaultKVStore(nodeConfig.RootDir, nodeConfig.DBPath, evmDbName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The variable evmDbName is not defined in this file or package, which will cause a compilation error. It should be defined, likely as a constant, representing the name of the EVM database.

Comment on lines +98 to +134
func deletePrefix(ctx context.Context, db ds.Batching, prefix string, dryRun bool) (int, error) {
results, err := db.Query(ctx, dsq.Query{
Prefix: prefix,
KeysOnly: true,
})
if err != nil {
return 0, fmt.Errorf("failed to query keys with prefix %s: %w", prefix, err)
}
defer results.Close()

count := 0
batch, err := db.Batch(ctx)
if err != nil {
return 0, fmt.Errorf("failed to create batch: %w", err)
}

for result := range results.Next() {
if result.Error != nil {
return count, fmt.Errorf("error iterating results: %w", result.Error)
}

if !dryRun {
if err := batch.Delete(ctx, ds.NewKey(result.Key)); err != nil {
return count, fmt.Errorf("failed to delete key %s: %w", result.Key, err)
}
}
count++
}

if !dryRun && count > 0 {
if err := batch.Commit(ctx); err != nil {
return count, fmt.Errorf("failed to commit batch delete: %w", err)
}
}

return count, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function loads all delete operations for a given prefix into a single batch. If there are a very large number of keys to delete, this could lead to high memory consumption. Consider processing the deletions in smaller batches (e.g., committing every 1000 keys) to limit memory usage and make the operation more robust against large datasets.

@codecov
Copy link

codecov bot commented Feb 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.35%. Comparing base (d20b1ac) to head (f74fcda).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3040      +/-   ##
==========================================
- Coverage   56.38%   56.35%   -0.04%     
==========================================
  Files         118      118              
  Lines       12036    12036              
==========================================
- Hits         6787     6783       -4     
- Misses       4507     4511       +4     
  Partials      742      742              
Flag Coverage Δ
combined 56.35% <ø> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Base automatically changed from julien/simplify-store to main February 2, 2026 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant