
feat(benchmark): add fgprof, block/mutex profiling and improve profile docs #2886

Open
pdrobnjak wants to merge 6 commits into pd/benchmark-compare from pd/benchmark-profiling-improvements

Conversation

@pdrobnjak
Contributor

@pdrobnjak pdrobnjak commented Feb 13, 2026

Summary

  • Add fgprof (wall-clock profiler) to capture off-CPU time (I/O, blocking, GC pauses) that is invisible to Go's standard CPU profiler. The handler is registered on DefaultServeMux only behind the benchmark build tag, so production builds are unaffected.
  • Enable block and mutex contention profiling via runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction, also gated behind the benchmark build tag. Uses conservative sampling rates (1us block threshold, 1/5 mutex fraction) to minimize overhead on TPS measurements.
  • Update benchmark-compare.sh to auto-capture all 6 profile types (CPU, fgprof, heap, goroutine, block, mutex) midway through runs instead of just CPU + heap.
  • Add a DURATION env var to benchmark.sh (default 120s, 0 = run forever). When it is non-zero, the script runs seid in the background, auto-captures all 6 profiles midway, extracts TPS stats (median/avg/min/max), and exits cleanly, enabling fully automated single-scenario profiling.
  • Fix CPU/fgprof profile corruption: switch from parallel background captures to sequential execution. Go's CPU profiler (SIGPROF) and fgprof (runtime.GoroutineProfile) conflict when running concurrently on the same process, producing empty or corrupted profiles. Also measure the actual capture duration for an accurate remaining-time calculation, and make BASE_DIR overridable so benchmark-compare.sh routes profiles to per-label directories correctly.
  • Expand benchmark/CLAUDE.md with profile type reference table, CPU-vs-fgprof guidance, heap metric selection guide, interactive flamegraph docs, and a full optimization loop workflow (profile → analyze → discuss → implement → compare → validate → PR).
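
For readers unfamiliar with the two runtime calls named above, here is a minimal sketch of how the stated rates (1us block threshold, 1/5 mutex fraction) could be applied. The constant and function names are hypothetical and the PR's actual file is not shown here. In the PR this code is gated behind the benchmark build tag; the sketch leaves the build constraint out so it compiles on its own.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Hypothetical names; the values mirror the rates stated in the summary.
const (
	// SetBlockProfileRate takes nanoseconds: aim for one sample per 1us spent blocked.
	blockProfileRateNs = int(time.Microsecond / time.Nanosecond)
	// SetMutexProfileFraction(n) reports on average 1/n of contention events.
	mutexProfileFraction = 5
)

// enableContentionProfiling turns on block and mutex profiling and
// returns the mutex fraction that was in effect before the call.
func enableContentionProfiling() int {
	runtime.SetBlockProfileRate(blockProfileRateNs)
	return runtime.SetMutexProfileFraction(mutexProfileFraction)
}

// currentMutexFraction reads the active mutex fraction without changing it
// (a negative argument means "read only").
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	prev := enableContentionProfiling()
	fmt.Println("previous mutex fraction:", prev)
	fmt.Println("current mutex fraction:", currentMutexFraction())
}
```

A real gated file would presumably begin with a `//go:build benchmark` line so that default builds never execute these calls.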

Test plan

  • Verify go build ./sei-tendermint/node/ succeeds (no fgprof in non-benchmark build)
  • Verify go build -tags benchmark ./sei-tendermint/node/ succeeds (fgprof registered)
  • Verify go build -tags benchmark ./app/ succeeds (block/mutex profiling enabled)
  • Run DURATION=90 benchmark/benchmark.sh and confirm auto-stop, profile capture, and TPS extraction
  • Verify all 6 profile files in /tmp/sei-bench/pprof/ with non-zero sizes
  • Verify go tool pprof -top works on each captured profile type
  • Verify sequential capture produces valid CPU and fgprof profiles (previously corrupted when captured in parallel)
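
The "non-zero sizes" check in the test plan could be automated along these lines. This is only a sketch: the profile file names below are assumptions for illustration, since the PR text does not list the names the scripts actually write.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// emptyOrMissing returns the names under dir that are missing,
// are directories, or have a size of zero bytes.
func emptyOrMissing(dir string, names []string) []string {
	var bad []string
	for _, name := range names {
		info, err := os.Stat(filepath.Join(dir, name))
		if err != nil || info.IsDir() || info.Size() == 0 {
			bad = append(bad, name)
		}
	}
	return bad
}

func main() {
	// Hypothetical file names, one per profile type mentioned in the PR.
	names := []string{
		"cpu.pb.gz", "fgprof.pb.gz", "heap.pb.gz",
		"goroutine.pb.gz", "block.pb.gz", "mutex.pb.gz",
	}
	bad := emptyOrMissing("/tmp/sei-bench/pprof", names)
	if len(bad) == 0 {
		fmt.Println("all profiles present and non-empty")
		return
	}
	fmt.Println("missing or empty profiles:", bad)
}
```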

🤖 Generated with Claude Code

…e docs

Add wall-clock profiling (fgprof) alongside standard CPU profiling to
capture off-CPU time (I/O, blocking, GC pauses). Register the fgprof
handler behind the benchmark build tag so production binaries are
unaffected.

Enable block and mutex contention profiling via runtime calls, also
gated behind the benchmark build tag. Use conservative sampling rates
(1us block threshold, 1/5 mutex fraction) to minimize overhead on TPS.

Update benchmark-compare.sh to capture all 6 profile types (CPU,
fgprof, heap, goroutine, block, mutex) and report sizes for each.

Expand benchmark/CLAUDE.md with:
- Profile type reference table with when-to-use guidance
- CPU vs fgprof explanation
- Heap metric selection guide (inuse_space vs alloc_objects etc)
- Interactive flamegraph and drill-down commands
- Single-scenario manual capture examples
- Source-mapping tip for pprof

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

github-actions bot commented Feb 13, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Feb 13, 2026, 5:03 PM |

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Feb 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.19%. Comparing base (127ed99) to head (6dd8b0b).

@@                  Coverage Diff                  @@
##           pd/benchmark-compare    #2886   +/-   ##
=====================================================
  Coverage                 57.18%   57.19%           
=====================================================
  Files                      2091     2091           
  Lines                    171179   171173    -6     
=====================================================
+ Hits                      97891    97897    +6     
+ Misses                    64578    64557   -21     
- Partials                   8710     8719    +9     
Flag Coverage Δ
sei-chain 52.65% <ø> (+<0.01%) ⬆️
sei-cosmos 48.15% <ø> (+0.01%) ⬆️
sei-db 68.72% <ø> (ø)

Flags with carried forward coverage won't be shown.
see 32 files with indirect coverage changes

…extraction

benchmark.sh now runs for DURATION seconds (default 120), auto-captures
all 6 profile types midway, extracts TPS stats, and exits cleanly.
DURATION=0 preserves the original run-forever behavior. Also documents
the full optimization loop workflow in CLAUDE.md.
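
The TPS summary (median/avg/min/max) mentioned above can be illustrated with a small sketch. The script itself does this in shell; the Go version below only shows the arithmetic, and the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// tpsStats holds the four summary values the benchmark reports.
type tpsStats struct {
	Median, Avg, Min, Max float64
}

// summarizeTPS computes median, average, min, and max over the samples.
// It returns false when there are no samples to summarize.
func summarizeTPS(samples []float64) (tpsStats, bool) {
	if len(samples) == 0 {
		return tpsStats{}, false
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	var sum float64
	for _, v := range sorted {
		sum += v
	}
	n := len(sorted)
	median := sorted[n/2]
	if n%2 == 0 {
		// Even count: average the two middle values.
		median = (sorted[n/2-1] + sorted[n/2]) / 2
	}
	return tpsStats{
		Median: median,
		Avg:    sum / float64(n),
		Min:    sorted[0],
		Max:    sorted[n-1],
	}, true
}

func main() {
	stats, ok := summarizeTPS([]float64{100, 300, 200, 400})
	if !ok {
		fmt.Println("no TPS samples")
		return
	}
	fmt.Printf("median=%.1f avg=%.1f min=%.1f max=%.1f\n",
		stats.Median, stats.Avg, stats.Min, stats.Max)
}
```

Reporting the median alongside the average is useful for benchmark runs because a few slow blocks during warm-up or profile capture skew the average more than the median.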

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pdrobnjak pdrobnjak self-assigned this Feb 13, 2026
pdrobnjak and others added 2 commits February 13, 2026 14:50
…on loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion loop

Adds a Claude Code command that runs a structured optimization workflow:
profile -> analyze -> discuss -> implement -> compare -> validate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… corruption

Go's CPU profiler and fgprof conflict when running concurrently on the
same process, producing empty or corrupted profiles. Switch from parallel
background captures to sequential execution (CPU first, then fgprof),
measure actual capture duration for accurate remaining-time calculation,
and make BASE_DIR overridable so benchmark-compare.sh can route profiles
to per-label directories.
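
The sequential-capture idea can be sketched as follows. The actual change lives in the shell scripts; this Go version only illustrates the control flow, with hypothetical names and stand-in captures that sleep instead of downloading profiles. The remaining-time formula (total minus time already waited minus measured capture time) is an assumption about how the script computes it.

```go
package main

import (
	"fmt"
	"time"
)

// capture is a hypothetical wrapper around one profile download.
type capture struct {
	Name string
	Run  func() error
}

// runSequential runs each capture to completion before starting the next
// (for example CPU first, then fgprof) and returns the measured total time.
func runSequential(captures []capture) (time.Duration, error) {
	start := time.Now()
	for _, c := range captures {
		if err := c.Run(); err != nil {
			return time.Since(start), fmt.Errorf("%s capture: %w", c.Name, err)
		}
	}
	return time.Since(start), nil
}

// remainingSeconds returns how long the run should continue after the
// captures finish, clamped at zero when the captures overran the budget.
func remainingSeconds(total, waited, captureElapsed int) int {
	left := total - waited - captureElapsed
	if left < 0 {
		return 0
	}
	return left
}

func main() {
	// Stand-in captures; real ones would fetch from the pprof endpoints.
	captures := []capture{
		{Name: "cpu", Run: func() error { time.Sleep(20 * time.Millisecond); return nil }},
		{Name: "fgprof", Run: func() error { time.Sleep(20 * time.Millisecond); return nil }},
	}
	elapsed, err := runSequential(captures)
	if err != nil {
		fmt.Println("capture failed:", err)
		return
	}
	fmt.Println("captures took", elapsed.Round(time.Millisecond))
	fmt.Println("remaining seconds:", remainingSeconds(120, 60, 35))
}
```

Measuring the elapsed time rather than assuming a fixed capture length matters once captures run back to back, since the total grows with each profile added to the sequence.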

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>