Fix NotFittedError in umap_embedding when fewer than 2 rows have nonzero variance#19
Conversation
…tant rows exist Co-authored-by: lappalainenj <34949352+lappalainenj@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes an edge case in umap_embedding where UMAP was never fit (due to too few non-constant rows), but downstream logic still assumed a fitted reducer—previously resulting in NotFittedError.
Changes:
- Add an early-return guard in
umap_embeddingto return an all-NaN embedding andreducer=Nonewhen fewer than 2 non-constant rows remain. - Update
umap_embeddingdocstring to document thereducer=None/ all-NaN embedding behavior. - Add unit tests covering all-constant input and single non-constant-row input.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
flyvis/analysis/clustering.py |
Adds early return for insufficient non-constant rows; updates docstring to describe the new return behavior. |
tests/test_clustering.py |
Adds regression tests for the insufficient-data edge cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
flyvis/analysis/clustering.py
Outdated
| - embedding: The UMAP embedding (n_samples, n_components). May be NaN | ||
| if insufficient data. | ||
| - mask: Boolean mask (length n_samples) showing which rows had nonzero | ||
| variance and were connected. | ||
| - reducer: The fitted UMAP object or None if insufficient data. |
There was a problem hiding this comment.
The docstring now says reducer may be None, but the function’s return type annotation still indicates a non-optional UMAP. Please update the type hint to reflect the new behavior (e.g., Optional[UMAP]) so static type checking matches runtime behavior. Also consider clarifying the mask semantics in the reducer is None early-return case (it currently reflects non-constant rows, not connectivity).
There was a problem hiding this comment.
Done in 512596e. Updated the return type to Optional[UMAP] and expanded the mask docstring entry to clarify the semantics in both the early-return case (only non-constant rows are True) and the normal case (non-constant rows that were also connected in the UMAP graph).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…tics in docstring Co-authored-by: lappalainenj <34949352+lappalainenj@users.noreply.github.com>
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
umap_embeddinginflyvis/analysis/clustering.pyto handle the case where fewer than 2 rows have non-zero variance (return early with NaN embedding andNonereducer)Optional[UMAP]masksemantics in docstring for the early-return casetests/test_clustering.pyfor the edge cases (all 3 pass)Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.