feat(benchmark): add fgprof, block/mutex profiling and improve profile docs #2886
Open
pdrobnjak wants to merge 6 commits into pd/benchmark-compare
…e docs

Add wall-clock profiling (fgprof) alongside standard CPU profiling to capture off-CPU time (I/O, blocking, GC pauses). Register the fgprof handler behind the benchmark build tag so production binaries are unaffected.

Enable block and mutex contention profiling via runtime calls, also gated behind the benchmark build tag. Use conservative sampling rates (1us block threshold, 1/5 mutex fraction) to minimize overhead on TPS.

Update benchmark-compare.sh to capture all 6 profile types (CPU, fgprof, heap, goroutine, block, mutex) and report sizes for each.

Expand benchmark/CLAUDE.md with:
- Profile type reference table with when-to-use guidance
- CPU vs fgprof explanation
- Heap metric selection guide (inuse_space vs alloc_objects etc)
- Interactive flamegraph and drill-down commands
- Single-scenario manual capture examples
- Source-mapping tip for pprof

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##   pd/benchmark-compare   #2886   +/-   ##
===========================================
  Coverage      57.18%     57.19%
===========================================
  Files           2091       2091
  Lines         171179     171173      -6
===========================================
+ Hits           97891      97897      +6
+ Misses         64578      64557     -21
- Partials        8710       8719      +9
…extraction

benchmark.sh now runs for DURATION seconds (default 120), auto-captures all 6 profile types midway, extracts TPS stats, and exits cleanly. DURATION=0 preserves the original run-forever behavior. Also documents the full optimization loop workflow in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…on loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion loop

Adds a Claude Code command that runs a structured optimization workflow: profile -> analyze -> discuss -> implement -> compare -> validate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… corruption

Go's CPU profiler and fgprof conflict when running concurrently on the same process, producing empty or corrupted profiles. Switch from parallel background captures to sequential execution (CPU first, then fgprof), measure actual capture duration for accurate remaining-time calculation, and make BASE_DIR overridable so benchmark-compare.sh can route profiles to per-label directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- fgprof wall-clock profiling, registered behind the `benchmark` build tag only, so there is no production impact.
- Block and mutex contention profiling via `runtime.SetBlockProfileRate` and `runtime.SetMutexProfileFraction`, also gated behind the `benchmark` build tag. Uses conservative sampling rates (1us block threshold, 1/5 mutex fraction) to minimize TPS measurement overhead.
- `DURATION` setting for `benchmark.sh` (default 120s, 0 = run forever). When set, runs seid in the background, auto-captures all 6 profiles midway, extracts TPS stats (median/avg/min/max), and exits cleanly, enabling fully automated single-scenario profiling.
- Sequential profile capture: Go's CPU profiler (`SIGPROF`) and fgprof (`runtime.GoroutineProfile`) conflict when running concurrently on the same process, producing empty or corrupted profiles. Also measure actual capture duration for accurate remaining-time calculation, and make `BASE_DIR` overridable so `benchmark-compare.sh` routes profiles to per-label directories correctly.

Test plan
- `go build ./sei-tendermint/node/` succeeds (no fgprof in non-benchmark build)
- `go build -tags benchmark ./sei-tendermint/node/` succeeds (fgprof registered)
- `go build -tags benchmark ./app/` succeeds (block/mutex profiling enabled)
- Run `DURATION=90 benchmark/benchmark.sh` and confirm auto-stop, profile capture, and TPS extraction
- Captured profiles appear under `/tmp/sei-bench/pprof/` with non-zero sizes
- `go tool pprof -top` works on each captured profile type

🤖 Generated with Claude Code