Skip to content

Conversation

@suriyasureshok
Copy link

@suriyasureshok suriyasureshok commented Feb 12, 2026

Distance analysis

Changes made in this Pull Request:

  • Added min_distance(A, B, box=None) as a convenience function to compute the minimum pairwise distance between two AtomGroups using existing MDAnalysis distance utilities.
  • Added distance_statistics(A, B, box=None) to compute basic summary statistics (min, max, mean, std, number of distances) over all pairwise distances between two AtomGroups.
  • Reused MDAnalysis.lib.distances.distance_array to ensure consistency with existing distance calculations and periodic boundary condition handling.
  • Added unit tests validating numerical correctness, return types, identical groups, single-atom groups, and basic PBC behavior.

LLM / AI generated code disclosure

LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)
  • LLM/AI disclosure was updated.

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5231.org.readthedocs.build/en/5231/

Copilot AI review requested due to automatic review settings February 12, 2026 13:00
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suriyasureshok it looks like just reopened the same PR but changing your answer from yes to no on the AI usage question?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds convenience APIs to MDAnalysis.analysis.distances for summarizing pairwise distances between two AtomGroups, plus accompanying tests and minor repo hygiene updates.

Changes:

  • Added min_distance(A, B, box=None) to compute the minimum pairwise distance using MDAnalysis.lib.distances.distance_array.
  • Added distance_statistics(A, B, box=None) to return summary stats (min/max/mean/std and count) over all pairwise distances.
  • Added tests for the new APIs; updated AUTHORS and .gitignore.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.

File Description
package/MDAnalysis/analysis/distances.py Adds min_distance and distance_statistics to the analysis distance API and exports them.
testsuite/MDAnalysisTests/analysis/test_distances.py Introduces new unit tests for min_distance and distance_statistics.
package/AUTHORS Adds a new author entry.
.gitignore Ignores venv/ virtual environment directory.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Total number of pairwise distances calculated

.. versionadded:: 2.8.0
"""
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both min_distance() and distance_statistics() will raise a generic NumPy reduction error if either AtomGroup is empty (because distance_array yields an empty array and np.min/np.max/... cannot reduce it). Consider adding an explicit check (e.g., A.n_atoms == 0 or B.n_atoms == 0) and raising a clear ValueError explaining that distances are undefined for empty groups.

Suggested change
"""
"""
# Distances are undefined if either AtomGroup is empty; avoid NumPy
# reduction errors on empty distance arrays and raise a clear exception.
if getattr(A, "n_atoms", None) == 0 or getattr(B, "n_atoms", None) == 0:
raise ValueError(
"distance_statistics() requires non-empty AtomGroups; "
f"got A.n_atoms={getattr(A, 'n_atoms', 'N/A')}, "
f"B.n_atoms={getattr(B, 'n_atoms', 'N/A')}."
)

Copilot uses AI. Check for mistakes.
"max": float(np.max(distances)),
"mean": float(np.mean(distances)),
"std": float(np.std(distances)),
"n_distances": distances.size,
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distance_statistics() documents 'n_distances' as an int, but currently returns distances.size which is a NumPy integer type (np.intp). Cast it to int to keep the return types consistent and JSON/serialization-friendly.

Suggested change
"n_distances": distances.size,
"n_distances": int(distances.size),

Copilot uses AI. Check for mistakes.
- 'n_distances' : int
Total number of pairwise distances calculated

.. versionadded:: 2.8.0
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstrings mark these functions as .. versionadded:: 2.8.0, but the repository version is currently 2.11.0-dev0 (next release 2.11.0). The versionadded tag should reflect the release where this API will first appear (likely 2.11.0).

Suggested change
.. versionadded:: 2.8.0
.. versionadded:: 2.11.0

Copilot uses AI. Check for mistakes.
Comment on lines +341 to +348
def test_distance_statistics_simple_case(self, ag, ag2, expected):
"""Test MDAnalysis.analysis.distances.distance_statistics() for
a simple input case. Checks returned distance statistics
against expected values."""
actual = MDAnalysis.analysis.distances.distance_statistics(ag, ag2)
for key in expected:
assert_allclose(actual[key], expected[key])

Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests for distance_statistics() never assert the 'n_distances' field that the function returns (and the PR description mentions validating the number of distances). Add an assertion that actual['n_distances'] == ag.n_atoms * ag2.n_atoms (and consider also checking it’s an int).

Copilot uses AI. Check for mistakes.
Comment on lines +290 to +295
def test_min_distance_box(self, ag, ag2, box):
"""Test that MDAnalysis.analysis.distances.min_distance()
correctly accounts for periodic boundary conditions."""
actual = MDAnalysis.analysis.distances.min_distance(ag, ag2, box=box)
assert isinstance(actual, float)
assert actual >= 0.0
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_min_distance_box doesn't actually verify that periodic boundary conditions are applied correctly; it only checks the result is a non-negative float, which would also pass if PBC handling were broken or ignored. Consider computing an explicit expected minimum-image distance (similar to expected_box in TestDist) and asserting numerical correctness.

Copilot uses AI. Check for mistakes.
Comment on lines +355 to +360
def test_distance_statistics_box(self, ag, ag2, box):
"""Test that MDAnalysis.analysis.distances.distance_statistics()
works correctly when a box is provided."""
actual = MDAnalysis.analysis.distances.distance_statistics(ag, ag2, box=box)
assert isinstance(actual, dict)
assert all(key in actual for key in ['min', 'max', 'mean', 'std'])
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_distance_statistics_box only checks key presence and return type; it doesn't validate that the statistics change appropriately under PBC (or match an explicit minimum-image computation). This falls short of the PR description’s claim of testing basic PBC behavior for the new APIs; consider adding a numeric assertion for at least one statistic under box=....

Copilot uses AI. Check for mistakes.
.. versionadded:: 2.8.0
"""
distances = distance_array(A.positions, B.positions, box=box)
return np.min(distances)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min_distance() returns np.min(distances), which is a NumPy scalar (e.g., np.float32/np.float64) rather than a built-in float. This can break the documented return type and the added test that checks isinstance(actual, float); cast the result to float before returning for a stable public API.

Suggested change
return np.min(distances)
return float(np.min(distances))

Copilot uses AI. Check for mistakes.
@suriyasureshok
Copy link
Author

Hi @IAlibay,
I initially misunderstood the question. I thought it referred to using AI features within the IDE, rather than using an AI system to generate the code itself.

After realizing this, I updated my response to accurately reflect my usage.

I believe the two functions added to distance.py provide useful, commonly needed distance analysis functionality while reusing existing MDAnalysis infrastructure. I’d appreciate your review and any feedback you may have.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suriyasureshok can you tell us why these methods are useful to YOU. i.e. why would you use an extra wrapper around the distance_array return rather than just operating on the numpy array with numpy utilities?

@suriyasureshok
Copy link
Author

@IAlibay These methods just wrap very common operations people already do with distance_array, like taking the minimum distance or a few basic statistics. Writing those reductions by hand every time leads to repeated boilerplate and makes the analysis code harder to read.

Having small helpers keeps the intent clear and ensures everyone uses the same, PBC-aware logic, while distance_array is still there when the full matrix is actually needed.

@IAlibay
Copy link
Member

IAlibay commented Feb 12, 2026

@suriyasureshok can you point me to existing code of yours where you have had to use these methods and it's been a ton of exessive boilerplate or are you assuming what users might want?

@suriyasureshok
Copy link
Author

@IAlibay I don’t have existing code I can point to. This PR wasn’t based on a specific prior implementation, but on what felt like a small convenience around distance_array for common reduction cases.

If that’s not a strong enough reason to add a new public API, I understand and I’m fine with closing the PR. I appreciate the feedback.

@suriyasureshok
Copy link
Author

@IAlibay I don’t have existing code I can point to. This PR wasn’t based on a specific prior implementation, but on what felt like a small convenience around distance_array for common reduction cases.

If that’s not a strong enough reason to add a new public API, I understand and I’m fine with closing the PR. I appreciate the feedback.

@IAlibay
Copy link
Member

IAlibay commented Feb 12, 2026

@suriyasureshok - my take here is that I can't really see the usefulness of expanding our API to handle numpy one line methods.

I will give it a couple of days for any @MDAnalysis/coredevs to weigh in if you disagree.

@suriyasureshok
Copy link
Author

@IAlibay Thanks for the clarification. That makes sense. I understand the concern about expanding the public API for functionality that can already be expressed concisely with NumPy.

I’m happy to wait for any additional input from other core devs, and if the consensus is that this doesn’t fit the project’s direction, I’m fine with closing the PR. I appreciate the feedback and will take it into account for future contributions.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look like enough additional functionality to justify the additional code.

It would be better to write documentation (eg in the User Guide) to show how users can implement this functionality themselves.

I don’t recommend going forward with this PR. sorry!

@suriyasureshok
Copy link
Author

This does not look like enough additional functionality to justify the additional code.

It would be better to write documentation (eg in the User Guide) to show how users can implement this functionality themselves.

I don’t recommend going forward with this PR. sorry!

I respect it. I will close this PR and will come up with better features in the upcoming PRs. Thank you for your time.

@orbeckst
Copy link
Member

Thank you for your contribution @suriyasureshok . With new features it's not easy as we have to be careful what is added. A better approach for feature is to raise an issue and first discuss there.

@suriyasureshok
Copy link
Author

Thank you for your contribution @suriyasureshok . With new features it's not easy as we have to be careful what is added. A better approach for feature is to raise an issue and first discuss there.

Yeah, I understand. I'll rectify this in my next PR. Thank you for your time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants