Vectorize continuous color computation in pl.nodes() for large speedup#48
Vectorize continuous color computation in pl.nodes() for large speedup#48
Conversation
The previous implementation applied the colormap per-element in a Python list comprehension (O(n) interpreter overhead), then passed a list of RGBA tuples as `color=` to `ax.scatter()`. For a tree with ~2000 nodes this loop alone took ~70 ms. Replace with a vectorized approach using `pd.Series.reindex` to align values with the plotting order, `np.ma.masked_invalid` to handle missing nodes, and a single bulk colormap call. This is semantically identical: missing nodes still receive `na_color`, present nodes receive the same RGBA values, and the colorbar legend is unchanged. Benchmark on a balanced binary tree with 2047 nodes (1024 leaves): color computation: 70.5 ms → 0.22 ms (317× faster) color computation (internal nodes only, n=1023): 34.5 ms → 0.64 ms (54× faster) The fix applies via `_get_colors`, which is shared with `pl.branches`, so branch coloring benefits as well. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b13c4e1ff7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Vectorized: reindex to align with indicies (NaN for missing), then apply colormap in bulk | ||
| values = data.reindex(indicies) | ||
| color_map.set_bad(na_color) | ||
| colors = color_map(norm(np.ma.masked_invalid(values.values.astype(float)))) |
There was a problem hiding this comment.
Avoid treating ±inf as missing in numeric color mapping
Using np.ma.masked_invalid(...) here masks both NaN and ±inf, so any infinite numeric values are now rendered with na_color instead of being mapped through the continuous colormap. This changes result semantics for workflows where overflow/div-by-zero values appear in node/edge metrics (they become visually indistinguishable from truly missing values), whereas the previous per-element path only treated absent indices as missing.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b13c4e1ff7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Vectorized: reindex to align with indicies (NaN for missing), then apply colormap in bulk | ||
| values = data.reindex(indicies) | ||
| color_map.set_bad(na_color) | ||
| colors = color_map(norm(np.ma.masked_invalid(values.values.astype(float)))) |
There was a problem hiding this comment.
Preserve infinite numeric values during color vectorization
Using np.ma.masked_invalid(...) here treats +/-inf as missing values, not just absent/NaN entries. In the previous path, infinite numeric values were still passed through Normalize and mapped via the colormap (typically to over/under extremes), but this change forces them to na_color, which can silently miscolor real overflow/outlier data in pl.nodes()/pl.branches() when annotations contain infinities.
Useful? React with 👍 / 👎.
_get_colors now returns an N×4 numpy array for continuous data instead of a list of per-element tuples/strings. Update the test assertions accordingly: - isinstance check: list → np.ndarray - na_color check: string equality → np.testing.assert_allclose against mcolors.to_rgba Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #48 +/- ##
=======================================
Coverage 92.69% 92.69%
=======================================
Files 34 34
Lines 2450 2452 +2
=======================================
+ Hits 2271 2273 +2
Misses 179 179
🚀 New features to boost your workflow:
|
Summary
_get_colors(used by bothpl.nodes()andpl.branches()) with a fully vectorized NumPy approach for continuous (numeric) color data.pd.Series.reindexto align values with the plot order in one call,np.ma.masked_invalidto handle missing nodes, and a single bulk colormap evaluation instead of N individual calls.na_color, all RGBA output values match the old implementation exactly (verified in the benchmark script, max diff = 0.0).Root cause
_get_colorscomputed continuous colors with a Python list comprehension:Each iteration called through the Python interpreter, did a pandas index lookup, applied the normalizer, and applied the colormap — all per node. For ~2000 nodes this loop alone took ~70 ms.
The fix collapses this to three vectorized calls:
Performance
Benchmark on a balanced binary tree with 2,047 nodes (1,024 leaves, 1,023 internal nodes), measured over 100 iterations:
The benchmark script is at
scripts/benchmark_nodes_color.py.Test plan
test_plot_tree.pyandtest_plot_utils.pytests pass (26 passed)0.0🤖 Generated with Claude Code