perf: reuse cohort allele counts in pairwise_average_fst #1102

Open
kshirajahere wants to merge 1 commit into malariagen:master from kshirajahere:GH692-optimize-pairwise-average-fst

Summary

This improves pairwise_average_fst() by reusing per-cohort allele counts instead of recomputing them for every cohort pair.

Before this change, pairwise_average_fst() delegated each pair to average_fst(), so the same cohort query triggered repeated snp_allele_counts() work across many pairs. With N cohorts there are N(N-1)/2 pairs, and each pair computes allele counts for both of its cohorts, so the total grows to N(N-1) allele-count computations.
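The call-count arithmetic above can be checked with a small sketch. It only models the naive per-pair pattern (two allele-count queries per pair) and does not call the real library; the function name is hypothetical.

```python
from itertools import combinations


def naive_allele_count_calls(cohorts):
    """Count allele-count queries when every pair is computed independently.

    Hypothetical model of the old behaviour: each pair queries allele
    counts for both of its cohorts, with nothing shared between pairs.
    """
    calls = 0
    for _a, _b in combinations(cohorts, 2):
        calls += 2  # one per-cohort allele-count query for each side of the pair
    return calls


for n in (2, 5, 10, 20):
    cohorts = [f"cohort_{i}" for i in range(n)]
    print(f"N={n}: naive={naive_allele_count_calls(cohorts)}, N(N-1)={n * (n - 1)}, once-per-cohort={n}")
```

For 10 cohorts this is 90 queries under the per-pair pattern versus 10 when each cohort is counted once.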

After this change, pairwise_average_fst() computes allele counts once per cohort for the given parameter set (N computations for N cohorts), then reuses those arrays for the pairwise Hudson Fst calculations. average_fst() behaviour is unchanged.
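A minimal sketch of the compute-once, reuse-per-pair pattern, assuming a per-cohort allele-count callable and a pairwise statistic callable. `pairwise_with_cache` and `fake_allele_counts` are hypothetical stand-ins, not the PR's code; the counter plays the role of the "one snp_allele_counts() call per cohort" check.

```python
from itertools import combinations


def pairwise_with_cache(cohorts, allele_counts, pair_stat):
    """Compute a pairwise statistic, querying allele counts once per cohort.

    allele_counts: callable cohort -> allele-count data (hypothetical stand-in
        for an expensive per-cohort query).
    pair_stat: callable (ac1, ac2) -> value.
    """
    # Precompute once per cohort; every pair below reuses these results.
    cached = {cohort: allele_counts(cohort) for cohort in cohorts}
    return {
        (a, b): pair_stat(cached[a], cached[b])
        for a, b in combinations(cohorts, 2)
    }


calls = []


def fake_allele_counts(cohort):
    # Records each query so the number of calls can be inspected.
    calls.append(cohort)
    return [len(cohort)]  # placeholder data, not real allele counts


result = pairwise_with_cache(
    ["A", "BB", "CCC", "DDDD"], fake_allele_counts, lambda x, y: x[0] + y[0]
)
print(len(result), len(calls))  # prints "6 4": 6 pairs, 4 allele-count calls
```

The trade-off is memory: all N allele-count arrays are held at once instead of two at a time.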

Fixes #692.

Why this matters

pairwise_average_fst() is a core analysis path for exploratory population comparisons. Improving its runtime helps interactive use in notebooks/Colab and makes higher-level workflows more responsive.

Changes

  • factor the shared blockwise Hudson Fst logic into _average_fst_from_allele_counts()
  • keep average_fst() as the public two-cohort entry point, now calling the shared helper
  • precompute cohort allele counts once inside pairwise_average_fst() and reuse them across all cohort pairs
  • add a regression test proving pairwise_average_fst() still matches average_fst() for each pair
  • add a regression test proving each cohort query only triggers one snp_allele_counts() call within pairwise_average_fst()
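For illustration, here is a NumPy-only sketch of what a shared "Fst from allele counts" helper and the pairwise reuse loop could look like. It assumes allele counts are `(n_sites, n_alleles)` arrays, uses the Hudson estimator as a ratio of summed per-site numerators and denominators, and estimates the standard error with a leave-one-block-out jackknife. The names `hudson_num_den`, `fst_from_allele_counts` and `pairwise_fst` are hypothetical; this is not the signature or body of `_average_fst_from_allele_counts()` in this PR.

```python
from itertools import combinations

import numpy as np


def hudson_num_den(ac1, ac2):
    """Per-site Hudson Fst numerator and denominator from allele counts.

    Assumes at least 2 called alleles per site in each cohort.
    """
    ac1 = np.asarray(ac1, dtype=float)
    ac2 = np.asarray(ac2, dtype=float)
    n1, n2 = ac1.sum(axis=1), ac2.sum(axis=1)
    # Mean pairwise difference within each cohort.
    within1 = 1 - (ac1 * (ac1 - 1)).sum(axis=1) / (n1 * (n1 - 1))
    within2 = 1 - (ac2 * (ac2 - 1)).sum(axis=1) / (n2 * (n2 - 1))
    # Mean pairwise difference between the two cohorts.
    between = 1 - (ac1 * ac2).sum(axis=1) / (n1 * n2)
    return between - (within1 + within2) / 2, between


def fst_from_allele_counts(ac1, ac2, block_length):
    """Ratio-of-sums Hudson Fst with a block-jackknife standard error."""
    num, den = hudson_num_den(ac1, ac2)
    n_blocks = len(num) // block_length
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    # Sum within complete blocks; trailing sites that do not fill a block
    # are dropped from both the estimate and the jackknife.
    usable = n_blocks * block_length
    num_b = num[:usable].reshape(n_blocks, block_length).sum(axis=1)
    den_b = den[:usable].reshape(n_blocks, block_length).sum(axis=1)
    fst = num_b.sum() / den_b.sum()
    # Leave each block out in turn and recompute the ratio of sums.
    loo = (num_b.sum() - num_b) / (den_b.sum() - den_b)
    se = np.sqrt((n_blocks - 1) / n_blocks * ((loo - loo.mean()) ** 2).sum())
    return fst, se


def pairwise_fst(counts_by_cohort, block_length):
    """Pairwise Fst reusing one precomputed allele-count array per cohort."""
    return {
        (a, b): fst_from_allele_counts(counts_by_cohort[a], counts_by_cohort[b], block_length)
        for a, b in combinations(counts_by_cohort, 2)
    }


# Simulated biallelic counts: 20 called alleles per site in each cohort.
rng = np.random.default_rng(0)
counts = {}
for name in ("A", "B", "C"):
    alt = rng.integers(0, 21, size=1000)
    counts[name] = np.column_stack([20 - alt, alt])

pairs = pairwise_fst(counts, block_length=100)
for (a, b), (fst, se) in pairs.items():
    print(a, b, round(fst, 4), round(se, 4))
```

Mirroring the first regression test, each entry of `pairs` can be compared against a direct `fst_from_allele_counts()` call for the same two cohorts; because both paths share one helper, they should agree exactly.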

Tests run

  • ruff check malariagen_data/anoph/fst.py tests/anoph/test_fst.py
  • ruff format --check malariagen_data/anoph/fst.py tests/anoph/test_fst.py
  • pytest tests/anoph/test_fst.py -q

Notes

  • I did not run the full repository test suite.
  • pytest tests/anoph/test_fst.py -q passed, with pre-existing warnings from the simulated adir1 fixtures about missing surveillance flags metadata.

Development

Successfully merging this pull request may close these issues: Pairwise FST function running time
