Vectorize continuous color computation in pl.nodes() for large speedup by colganwi · Pull Request #48 · YosefLab/pycea

colganwi · 2026-02-25T19:40:30Z

Summary

Replaces the per-element Python list comprehension in _get_colors (used by both pl.nodes() and pl.branches()) with a fully vectorized NumPy approach for continuous (numeric) color data.
Uses pd.Series.reindex to align values with the plot order in one call, np.ma.masked_invalid to handle missing nodes, and a single bulk colormap evaluation instead of N individual calls.
Behavior is identical: missing nodes still receive na_color, all RGBA output values match the old implementation exactly (verified in the benchmark script, max diff = 0.0).

Root cause

_get_colors computed continuous colors with a Python list comprehension:

# Old – O(n) Python interpreter overhead
colors = [color_map(norm(data[i])) if i in data.index else na_color for i in indicies]

Each iteration called through the Python interpreter, did a pandas index lookup, applied the normalizer, and applied the colormap — all per node. For ~2000 nodes this loop alone took ~70 ms.

The fix collapses this to three vectorized calls:

# New – bulk NumPy operations
values = data.reindex(indicies)          # align in one shot; NaN for missing
color_map.set_bad(na_color)              # na_color for masked entries
colors = color_map(norm(np.ma.masked_invalid(values.values.astype(float))))
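To make the equivalence concrete, here is a minimal, self-contained sketch that runs both approaches on a toy Series and compares the results. The function names `colors_loop` and `colors_vectorized`, the toy data, and the choice of `viridis` are illustrative assumptions, not code from pycea. Copying the colormap before calling `set_bad` is also an addition in this sketch, made so that a shared colormap object is not mutated.

```python
import numpy as np
import pandas as pd
from matplotlib import colormaps
from matplotlib.colors import Normalize, to_rgba


def colors_loop(data, indices, cmap, norm, na_color):
    """Per-element path, mirroring the old list comprehension."""
    return [cmap(norm(data[i])) if i in data.index else na_color for i in indices]


def colors_vectorized(data, indices, cmap, norm, na_color):
    """Bulk path: reindex, mask invalid entries, evaluate the colormap once."""
    cmap = cmap.copy()              # assumption: do not mutate the caller's colormap
    cmap.set_bad(na_color)          # masked entries are drawn with na_color
    values = data.reindex(indices)  # NaN for labels that are missing from data
    masked = np.ma.masked_invalid(values.to_numpy(dtype=float))
    return cmap(norm(masked))       # RGBA array of shape (len(indices), 4)


# Toy data (hypothetical): node "x" has no value.
data = pd.Series([0.0, 0.5, 1.0], index=["a", "b", "c"])
indices = ["a", "x", "c", "b"]
cmap = colormaps["viridis"]
norm = Normalize(vmin=0.0, vmax=1.0)

# The old path returns a mixed list (RGBA tuples and the na_color string),
# so convert every entry to RGBA before comparing.
old = np.array([to_rgba(c) for c in colors_loop(data, indices, cmap, norm, "lightgrey")])
new = colors_vectorized(data, indices, cmap, norm, "lightgrey")
max_diff = float(np.abs(old - new).max())
print("max RGBA difference:", max_diff)
```

Note that the return type differs between the two paths: the loop yields a Python list, while the bulk path yields a NumPy array.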

Performance

Benchmark on a balanced binary tree with 2,047 nodes (1,024 leaves, 1,023 internal nodes), measured over 100 iterations:

| Scenario | Before | After | Speedup |
| --- | --- | --- | --- |
| Color computation, all nodes (n=2,047) | 70.5 ms | 0.22 ms | 317× |
| Color computation, internal nodes only (n=1,023) | 34.5 ms | 0.64 ms | 54× |

The benchmark script is at scripts/benchmark_nodes_color.py.
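For readers who want to reproduce the order of magnitude without the repository, the following is a rough timing sketch. It is not the script referenced above: it uses a flat synthetic Series with hypothetical node labels instead of a tree, a small repeat count, and `timeit` as the timer, so absolute numbers will differ from the table.

```python
import timeit

import numpy as np
import pandas as pd
from matplotlib import colormaps
from matplotlib.colors import Normalize

n = 2047  # same node count as the benchmark in this PR
rng = np.random.default_rng(0)
labels = [f"node{i}" for i in range(n)]  # hypothetical node labels
data = pd.Series(rng.random(n), index=labels)
norm = Normalize(vmin=0.0, vmax=1.0)
cmap = colormaps["viridis"].copy()
cmap.set_bad("lightgrey")


def loop():
    # Old approach: one membership test, lookup, norm call and colormap call per node
    return [cmap(norm(data[i])) if i in data.index else "lightgrey" for i in labels]


def vectorized():
    # New approach: one reindex, one norm call, one colormap call
    values = data.reindex(labels)
    return cmap(norm(np.ma.masked_invalid(values.to_numpy(dtype=float))))


repeats = 5
t_loop = timeit.timeit(loop, number=repeats) / repeats
t_vec = timeit.timeit(vectorized, number=repeats) / repeats
print(f"loop: {t_loop * 1e3:.2f} ms, vectorized: {t_vec * 1e3:.2f} ms")
```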

Test plan

All existing test_plot_tree.py and test_plot_utils.py tests pass (26 passed)
Correctness verified in benchmark: max RGBA difference between old and new output = 0.0

🤖 Generated with Claude Code

The previous implementation applied the colormap per-element in a Python list comprehension (O(n) interpreter overhead), then passed a list of RGBA tuples as `color=` to `ax.scatter()`. For a tree with ~2000 nodes this loop alone took ~70 ms.

Replace with a vectorized approach using `pd.Series.reindex` to align values with the plotting order, `np.ma.masked_invalid` to handle missing nodes, and a single bulk colormap call.

This is semantically identical: missing nodes still receive `na_color`, present nodes receive the same RGBA values, and the colorbar legend is unchanged.

Benchmark on a balanced binary tree with 2047 nodes (1024 leaves):
- color computation: 70.5 ms → 0.22 ms (317× faster)
- color computation (internal nodes only, n=1023): 34.5 ms → 0.64 ms (54× faster)

The fix applies via `_get_colors`, which is shared with `pl.branches`, so branch coloring benefits as well.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b13c4e1ff7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-02-25T19:43:34Z

src/pycea/pl/_utils.py

+        # Vectorized: reindex to align with indicies (NaN for missing), then apply colormap in bulk
+        values = data.reindex(indicies)
+        color_map.set_bad(na_color)
+        colors = color_map(norm(np.ma.masked_invalid(values.values.astype(float))))


Avoid treating ±inf as missing in numeric color mapping

Using np.ma.masked_invalid(...) here masks both NaN and ±inf, so any infinite numeric values are now rendered with na_color instead of being mapped through the continuous colormap. This changes result semantics for workflows where overflow/div-by-zero values appear in node/edge metrics (they become visually indistinguishable from truly missing values), whereas the previous per-element path only treated absent indices as missing.
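The masking difference described here can be checked with NumPy alone. The sketch below compares `np.ma.masked_invalid` with a NaN-only mask built from `np.ma.masked_where`; the NaN-only variant is shown as one possible alternative and is an assumption, not the code that was merged.

```python
import numpy as np

values = np.array([0.2, np.nan, np.inf, -np.inf])

# masked_invalid masks NaN and both infinities.
mask_invalid = np.ma.getmaskarray(np.ma.masked_invalid(values))

# Alternative (assumption, not the merged code): mask NaN only, so that
# infinities stay unmasked and are still passed to the normalizer and colormap.
nan_only = np.ma.masked_where(np.isnan(values), values)
mask_nan_only = np.ma.getmaskarray(nan_only)

print(mask_invalid.tolist())   # [False, True, True, True]
print(mask_nan_only.tolist())  # [False, True, False, False]
```

With the NaN-only mask, missing nodes and genuine NaN annotations would still both map to `na_color`, since `reindex` fills missing labels with NaN.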


chatgpt-codex-connector · 2026-02-25T19:44:59Z

src/pycea/pl/_utils.py

+        # Vectorized: reindex to align with indicies (NaN for missing), then apply colormap in bulk
+        values = data.reindex(indicies)
+        color_map.set_bad(na_color)
+        colors = color_map(norm(np.ma.masked_invalid(values.values.astype(float))))


Preserve infinite numeric values during color vectorization

Using np.ma.masked_invalid(...) here treats +/-inf as missing values, not just absent/NaN entries. In the previous path, infinite numeric values were still passed through Normalize and mapped via the colormap (typically to over/under extremes), but this change forces them to na_color, which can silently miscolor real overflow/outlier data in pl.nodes()/pl.branches() when annotations contain infinities.


_get_colors now returns an N×4 numpy array for continuous data instead of a list of per-element tuples/strings. Update the test assertions accordingly:
- isinstance check: list → np.ndarray
- na_color check: string equality → np.testing.assert_allclose against mcolors.to_rgba

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
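The updated assertions might look roughly like the sketch below. Because the signature of `_get_colors` is not shown in this PR page, the sketch does not call it; the `colors` array is a stand-in built directly from a colormap, and `na_color = "lightgrey"` is an assumed value.

```python
import numpy as np
import matplotlib.colors as mcolors
from matplotlib import colormaps

na_color = "lightgrey"  # assumed value for illustration
cmap = colormaps["viridis"].copy()
cmap.set_bad(na_color)

# Stand-in for the return value of _get_colors on continuous data:
# the middle entry is missing and therefore masked.
colors = cmap(np.ma.masked_invalid(np.array([0.0, np.nan, 1.0])))

# Type check: an ndarray instead of a list
assert isinstance(colors, np.ndarray)
assert colors.shape == (3, 4)

# na_color check: compare RGBA values instead of comparing strings
np.testing.assert_allclose(colors[1], mcolors.to_rgba(na_color))
```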

codecov · 2026-02-25T20:04:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.69%. Comparing base (2139a8e) to head (ec2bc80).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #48   +/-   ##
=======================================
  Coverage   92.69%   92.69%           
=======================================
  Files          34       34           
  Lines        2450     2452    +2     
=======================================
+ Hits         2271     2273    +2     
  Misses        179      179

Files with missing lines	Coverage Δ
src/pycea/pl/_utils.py	`89.69% <100.00%> (+0.08%)`	⬆️

colganwi merged commit 883a6ff into main Feb 25, 2026
8 checks passed

colganwi deleted the fix/nodes-continuous-color-perf branch February 25, 2026 20:04

colganwi merged 2 commits into main from fix/nodes-continuous-color-perf
