-
Notifications
You must be signed in to change notification settings - Fork 781
Distance analysis functions #5231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
IAlibay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suriyasureshok it looks like just reopened the same PR but changing your answer from yes to no on the AI usage question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds convenience APIs to MDAnalysis.analysis.distances for summarizing pairwise distances between two AtomGroups, plus accompanying tests and minor repo hygiene updates.
Changes:
- Added
min_distance(A, B, box=None)to compute the minimum pairwise distance usingMDAnalysis.lib.distances.distance_array. - Added
distance_statistics(A, B, box=None)to return summary stats (min/max/mean/std and count) over all pairwise distances. - Added tests for the new APIs; updated
AUTHORSand.gitignore.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
package/MDAnalysis/analysis/distances.py |
Adds min_distance and distance_statistics to the analysis distance API and exports them. |
testsuite/MDAnalysisTests/analysis/test_distances.py |
Introduces new unit tests for min_distance and distance_statistics. |
package/AUTHORS |
Adds a new author entry. |
.gitignore |
Ignores venv/ virtual environment directory. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Total number of pairwise distances calculated | ||
|
|
||
| .. versionadded:: 2.8.0 | ||
| """ |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both min_distance() and distance_statistics() will raise a generic NumPy reduction error if either AtomGroup is empty (because distance_array yields an empty array and np.min/np.max/... cannot reduce it). Consider adding an explicit check (e.g., A.n_atoms == 0 or B.n_atoms == 0) and raising a clear ValueError explaining that distances are undefined for empty groups.
| """ | |
| """ | |
| # Distances are undefined if either AtomGroup is empty; avoid NumPy | |
| # reduction errors on empty distance arrays and raise a clear exception. | |
| if getattr(A, "n_atoms", None) == 0 or getattr(B, "n_atoms", None) == 0: | |
| raise ValueError( | |
| "distance_statistics() requires non-empty AtomGroups; " | |
| f"got A.n_atoms={getattr(A, 'n_atoms', 'N/A')}, " | |
| f"B.n_atoms={getattr(B, 'n_atoms', 'N/A')}." | |
| ) |
| "max": float(np.max(distances)), | ||
| "mean": float(np.mean(distances)), | ||
| "std": float(np.std(distances)), | ||
| "n_distances": distances.size, |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distance_statistics() documents 'n_distances' as an int, but currently returns distances.size which is a NumPy integer type (np.intp). Cast it to int to keep the return types consistent and JSON/serialization-friendly.
| "n_distances": distances.size, | |
| "n_distances": int(distances.size), |
| - 'n_distances' : int | ||
| Total number of pairwise distances calculated | ||
|
|
||
| .. versionadded:: 2.8.0 |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstrings mark these functions as .. versionadded:: 2.8.0, but the repository version is currently 2.11.0-dev0 (next release 2.11.0). The versionadded tag should reflect the release where this API will first appear (likely 2.11.0).
| .. versionadded:: 2.8.0 | |
| .. versionadded:: 2.11.0 |
| def test_distance_statistics_simple_case(self, ag, ag2, expected): | ||
| """Test MDAnalysis.analysis.distances.distance_statistics() for | ||
| a simple input case. Checks returned distance statistics | ||
| against expected values.""" | ||
| actual = MDAnalysis.analysis.distances.distance_statistics(ag, ag2) | ||
| for key in expected: | ||
| assert_allclose(actual[key], expected[key]) | ||
|
|
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new tests for distance_statistics() never assert the 'n_distances' field that the function returns (and the PR description mentions validating the number of distances). Add an assertion that actual['n_distances'] == ag.n_atoms * ag2.n_atoms (and consider also checking it’s an int).
| def test_min_distance_box(self, ag, ag2, box): | ||
| """Test that MDAnalysis.analysis.distances.min_distance() | ||
| correctly accounts for periodic boundary conditions.""" | ||
| actual = MDAnalysis.analysis.distances.min_distance(ag, ag2, box=box) | ||
| assert isinstance(actual, float) | ||
| assert actual >= 0.0 |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_min_distance_box doesn't actually verify that periodic boundary conditions are applied correctly; it only checks the result is a non-negative float, which would also pass if PBC handling were broken or ignored. Consider computing an explicit expected minimum-image distance (similar to expected_box in TestDist) and asserting numerical correctness.
| def test_distance_statistics_box(self, ag, ag2, box): | ||
| """Test that MDAnalysis.analysis.distances.distance_statistics() | ||
| works correctly when a box is provided.""" | ||
| actual = MDAnalysis.analysis.distances.distance_statistics(ag, ag2, box=box) | ||
| assert isinstance(actual, dict) | ||
| assert all(key in actual for key in ['min', 'max', 'mean', 'std']) |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_distance_statistics_box only checks key presence and return type; it doesn't validate that the statistics change appropriately under PBC (or match an explicit minimum-image computation). This falls short of the PR description’s claim of testing basic PBC behavior for the new APIs; consider adding a numeric assertion for at least one statistic under box=....
| .. versionadded:: 2.8.0 | ||
| """ | ||
| distances = distance_array(A.positions, B.positions, box=box) | ||
| return np.min(distances) |
Copilot
AI
Feb 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
min_distance() returns np.min(distances), which is a NumPy scalar (e.g., np.float32/np.float64) rather than a built-in float. This can break the documented return type and the added test that checks isinstance(actual, float); cast the result to float before returning for a stable public API.
| return np.min(distances) | |
| return float(np.min(distances)) |
|
Hi @IAlibay, After realizing this, I updated my response to accurately reflect my usage. I believe the two functions added to distance.py provide useful, commonly needed distance analysis functionality while reusing existing MDAnalysis infrastructure. I’d appreciate your review and any feedback you may have. |
IAlibay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suriyasureshok can you tell us why these methods are useful to YOU. i.e. why would you use an extra wrapper around the distance_array return rather than just operating on the numpy array with numpy utilities?
|
@IAlibay These methods just wrap very common operations people already do with distance_array, like taking the minimum distance or a few basic statistics. Writing those reductions by hand every time leads to repeated boilerplate and makes the analysis code harder to read. Having small helpers keeps the intent clear and ensures everyone uses the same, PBC-aware logic, while distance_array is still there when the full matrix is actually needed. |
|
@suriyasureshok can you point me to existing code of yours where you have had to use these methods and it's been a ton of exessive boilerplate or are you assuming what users might want? |
|
@IAlibay I don’t have existing code I can point to. This PR wasn’t based on a specific prior implementation, but on what felt like a small convenience around distance_array for common reduction cases. If that’s not a strong enough reason to add a new public API, I understand and I’m fine with closing the PR. I appreciate the feedback. |
|
@IAlibay I don’t have existing code I can point to. This PR wasn’t based on a specific prior implementation, but on what felt like a small convenience around distance_array for common reduction cases. If that’s not a strong enough reason to add a new public API, I understand and I’m fine with closing the PR. I appreciate the feedback. |
|
@suriyasureshok - my take here is that I can't really see the usefulness of expanding our API to handle numpy one line methods. I will give it a couple of days for any @MDAnalysis/coredevs to weigh in if you disagree. |
|
@IAlibay Thanks for the clarification. That makes sense. I understand the concern about expanding the public API for functionality that can already be expressed concisely with NumPy. I’m happy to wait for any additional input from other core devs, and if the consensus is that this doesn’t fit the project’s direction, I’m fine with closing the PR. I appreciate the feedback and will take it into account for future contributions. |
orbeckst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look like enough additional functionality to justify the additional code.
It would be better to write documentation (eg in the User Guide) to show how users can implement this functionality themselves.
I don’t recommend going forward with this PR. sorry!
I respect it. I will close this PR and will come up with better features in the upcoming PRs. Thank you for your time. |
|
Thank you for your contribution @suriyasureshok . With new features it's not easy as we have to be careful what is added. A better approach for feature is to raise an issue and first discuss there. |
Yeah, I understand. I'll rectify this in my next PR. Thank you for your time. |
Distance analysis
Changes made in this Pull Request:
min_distance(A, B, box=None)as a convenience function to compute the minimum pairwise distance between two AtomGroups using existing MDAnalysis distance utilities.distance_statistics(A, B, box=None)to compute basic summary statistics (min, max, mean, std, number of distances) over all pairwise distances between two AtomGroups.MDAnalysis.lib.distances.distance_arrayto ensure consistency with existing distance calculations and periodic boundary condition handling.LLM / AI generated code disclosure
LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no
PR Checklist
package/CHANGELOGfile updated?package/AUTHORS? (If it is not, add it!)Developers Certificate of Origin
I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.
📚 Documentation preview 📚: https://mdanalysis--5231.org.readthedocs.build/en/5231/