perf: reuse cohort allele counts in pairwise_average_fst by kshirajahere · Pull Request #1102 · malariagen/malariagen-data-python

kshirajahere · 2026-03-11T17:56:29Z

Summary

This improves pairwise_average_fst() by reusing per-cohort allele counts instead of recomputing them for every cohort pair.

Before this change, pairwise_average_fst() delegated each pair to average_fst(), so the same cohort query could trigger repeated snp_allele_counts() work across many pairs. With N cohorts there are N(N-1)/2 pairs and two allele-count computations per pair, so the total grows to N(N-1).
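To make the scaling concrete, here is a small sketch of the arithmetic. The function names are hypothetical and only count calls; they assume no caching happens elsewhere in the call path.

```python
from itertools import combinations


def naive_allele_count_calls(n_cohorts: int) -> int:
    """Calls made when every pair recomputes counts for both of its cohorts."""
    n_pairs = sum(1 for _ in combinations(range(n_cohorts), 2))
    return 2 * n_pairs


def cached_allele_count_calls(n_cohorts: int) -> int:
    """Calls made when counts are computed once per cohort and then reused."""
    return n_cohorts
```

For 10 cohorts this gives 90 allele-count computations in the per-pair approach versus 10 when counts are reused.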

After this change, pairwise_average_fst() computes allele counts once per cohort for the given parameter set, then reuses those arrays for the pairwise Hudson Fst calculations. average_fst() behaviour is unchanged.
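The precompute-then-reuse pattern can be sketched roughly as follows. This is not the code in this PR: hudson_fst() here is a simplified ratio-of-sums Hudson-style estimator without the blockwise step, pairwise_fst() and the count_alleles callable are hypothetical names, and the sketch assumes every cohort has at least two allele calls at each variant.

```python
from itertools import combinations

import numpy as np


def hudson_fst(ac1, ac2):
    """Simplified Hudson-style Fst from two (n_variants, n_alleles) count arrays."""
    ac1 = np.asarray(ac1, dtype=float)
    ac2 = np.asarray(ac2, dtype=float)
    an1 = ac1.sum(axis=1)
    an2 = ac2.sum(axis=1)
    # Mean pairwise difference within each cohort.
    within1 = 1 - (ac1 * (ac1 - 1)).sum(axis=1) / (an1 * (an1 - 1))
    within2 = 1 - (ac2 * (ac2 - 1)).sum(axis=1) / (an2 * (an2 - 1))
    # Mean pairwise difference between the two cohorts.
    between = 1 - (ac1 * ac2).sum(axis=1) / (an1 * an2)
    numerator = between - (within1 + within2) / 2
    # Ratio of sums over variants (no blocks or jackknife in this sketch).
    return numerator.sum() / between.sum()


def pairwise_fst(cohort_queries, count_alleles):
    """Compute allele counts once per cohort, then reuse them for every pair.

    cohort_queries maps a cohort label to its query; count_alleles is a
    hypothetical callable standing in for the expensive allele-count step.
    """
    counts = {name: count_alleles(query) for name, query in cohort_queries.items()}
    return {
        (a, b): hudson_fst(counts[a], counts[b])
        for a, b in combinations(counts, 2)
    }
```

The expensive step runs in a single pass over the cohorts, and the pair loop only touches arrays already in memory.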

Fixes #692.

Why this matters

pairwise_average_fst() is a core analysis path for exploratory population comparisons. Improving its runtime helps interactive use in notebooks/Colab and makes higher-level workflows more responsive.

Changes

factor the shared blockwise Hudson Fst logic into _average_fst_from_allele_counts()
keep average_fst() as the public two-cohort entry point, now calling the shared helper
precompute cohort allele counts once inside pairwise_average_fst() and reuse them across all cohort pairs
add a regression test proving pairwise_average_fst() still matches average_fst() for each pair
add a regression test proving each cohort query only triggers one snp_allele_counts() call within pairwise_average_fst()
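As an illustration of the call-count check described in the last item, here is a minimal sketch using unittest.mock. ToyApi and calls_per_query() are hypothetical stand-ins, not the classes or tests in tests/anoph/test_fst.py; only the spying technique is the point.

```python
from collections import Counter
from itertools import combinations
from unittest import mock


class ToyApi:
    """Hypothetical stand-in for the real API class (not malariagen_data code)."""

    def snp_allele_counts(self, sample_query):
        # Placeholder for an expensive allele-count computation.
        return [len(sample_query), 1]

    def pairwise(self, cohort_queries):
        # Compute counts once per cohort, then reuse them for every pair.
        counts = {
            name: self.snp_allele_counts(sample_query=query)
            for name, query in cohort_queries.items()
        }
        return {(a, b): (counts[a], counts[b]) for a, b in combinations(counts, 2)}


def calls_per_query(api, cohort_queries):
    """Return how many times each query reached snp_allele_counts()."""
    # wraps= keeps the real behaviour while recording every call.
    with mock.patch.object(
        api, "snp_allele_counts", wraps=api.snp_allele_counts
    ) as spy:
        api.pairwise(cohort_queries)
    return Counter(call.kwargs["sample_query"] for call in spy.call_args_list)
```

A test built this way would assert that every value in the returned counter equals 1, which fails if a later change reintroduces per-pair recomputation.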

Tests run

ruff check malariagen_data/anoph/fst.py tests/anoph/test_fst.py
ruff format --check malariagen_data/anoph/fst.py tests/anoph/test_fst.py
pytest tests/anoph/test_fst.py -q

Notes

I did not run the full repository test suite.
pytest tests/anoph/test_fst.py -q passed, with pre-existing warnings from the simulated adir1 fixtures about missing surveillance flags metadata.

perf: reuse cohort allele counts in pairwise_average_fst

87c305a

kshirajahere mentioned this pull request Mar 11, 2026

Pairwise FST function running time #692

Open

kshirajahere wants to merge 1 commit into malariagen:master from kshirajahere:GH692-optimize-pairwise-average-fst
