perf: reuse cohort allele counts in pairwise_average_fst #1102
Open
kshirajahere wants to merge 1 commit into malariagen:master
Summary
This improves pairwise_average_fst() by reusing per-cohort allele counts instead of recomputing them for every cohort pair.
Before this change, pairwise_average_fst() delegated each pair to average_fst(), so the same cohort query could trigger repeated snp_allele_counts() work across many pairs. With N cohorts there are N(N-1)/2 pairs and two allele-count computations per pair, which grows to N(N-1) allele-count computations.
After this change, pairwise_average_fst() computes allele counts once per cohort for the given parameter set (N computations in total), then reuses those arrays for the pairwise Hudson Fst calculations. average_fst() behaviour is unchanged.
Fixes #692.
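The reuse pattern described above can be sketched in plain NumPy. This is a minimal illustration, not the code in this PR: `pairwise_fst_reusing_counts`, `load_allele_counts`, `toy_loader` and the toy arrays are hypothetical names and data, and the Fst function is a simple ratio-of-averages Hudson estimator written out by hand, without the block-jackknife standard error that a full implementation would normally report.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_difference(ac):
    """Per-variant probability that two alleles drawn from one cohort differ.

    `ac` has shape (n_variants, n_alleles). Assumes at least two called
    alleles per variant, otherwise the division below is undefined.
    """
    an = ac.sum(axis=1)
    n_pairs = an * (an - 1) / 2
    n_same = (ac * (ac - 1) / 2).sum(axis=1)
    return (n_pairs - n_same) / n_pairs


def mean_pairwise_difference_between(ac1, ac2):
    """Per-variant probability that one allele from each cohort differ."""
    n_pairs = ac1.sum(axis=1) * ac2.sum(axis=1)
    n_same = (ac1 * ac2).sum(axis=1)
    return (n_pairs - n_same) / n_pairs


def hudson_fst(ac1, ac2):
    """Ratio-of-averages Hudson Fst from two allele-count arrays."""
    within = (mean_pairwise_difference(ac1) + mean_pairwise_difference(ac2)) / 2
    between = mean_pairwise_difference_between(ac1, ac2)
    return float(np.sum(between - within) / np.sum(between))


def pairwise_fst_reusing_counts(cohorts, load_allele_counts):
    """Load allele counts once per cohort, then reuse them for every pair."""
    # N loads here, instead of two loads inside each of the N(N-1)/2 pairs.
    counts = {name: load_allele_counts(name) for name in cohorts}
    return {
        (a, b): hudson_fst(counts[a], counts[b])
        for a, b in combinations(cohorts, 2)
    }


# Toy data (hypothetical): three cohorts, three biallelic variants each.
TOY_COUNTS = {
    "A": np.array([[4, 0], [2, 2], [3, 1]]),
    "B": np.array([[0, 4], [2, 2], [1, 3]]),
    "C": np.array([[3, 1], [1, 3], [2, 2]]),
}
load_calls = []


def toy_loader(name):
    """Stand-in for an expensive allele-count query; records each call."""
    load_calls.append(name)
    return TOY_COUNTS[name]


fst_by_pair = pairwise_fst_reusing_counts(list(TOY_COUNTS), toy_loader)
print(len(load_calls), "loads for", len(fst_by_pair), "pairs")
```

The loader is passed in as a callable so the sketch stays independent of any real data source; in the actual change the counts come from snp_allele_counts() with the caller's parameter set.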
Why this matters
pairwise_average_fst() is a core analysis path for exploratory population comparisons. Improving its runtime helps interactive use in notebooks/Colab and makes higher-level workflows more responsive.
Changes
- Add a shared helper, _average_fst_from_allele_counts()
- Keep average_fst() as the public two-cohort entry point, now calling the shared helper
- Compute allele counts once per cohort in pairwise_average_fst() and reuse them across all cohort pairs
- Test that pairwise_average_fst() still matches average_fst() for each pair
- Test that each cohort triggers only one snp_allele_counts() call within pairwise_average_fst()
Tests run
ruff check malariagen_data/anoph/fst.py tests/anoph/test_fst.py
ruff format --check malariagen_data/anoph/fst.py tests/anoph/test_fst.py
pytest tests/anoph/test_fst.py -q
Notes
pytest tests/anoph/test_fst.py -q passed, with pre-existing warnings from the simulated adir1 fixtures about missing surveillance flags metadata.
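
For reviewers who want to see the shape of a call-count check like the one listed under Changes, here is a minimal sketch using unittest.mock. It assumes a hypothetical `ToyFstApi` class whose methods only borrow the names from this PR; the real methods take region and sample-query parameters, and the placeholder statistic below is not Fst. The actual tests in tests/anoph/test_fst.py may be structured differently.

```python
from itertools import combinations
from unittest.mock import patch


class ToyFstApi:
    """Hypothetical stand-in for the real API class, for illustration only."""

    _COUNTS = {"a": (3, 1), "b": (1, 3), "c": (2, 2), "d": (4, 0)}

    def snp_allele_counts(self, cohort):
        # Pretend this is the expensive per-cohort computation.
        return self._COUNTS[cohort]

    def pairwise_average_fst(self, cohorts):
        # Compute counts once per cohort, then reuse them for each pair.
        counts = {c: self.snp_allele_counts(c) for c in cohorts}
        return {
            (x, y): self._placeholder_stat(counts[x], counts[y])
            for x, y in combinations(cohorts, 2)
        }

    @staticmethod
    def _placeholder_stat(ac1, ac2):
        # Not Fst: absolute difference in first-allele frequency.
        return abs(ac1[0] / sum(ac1) - ac2[0] / sum(ac2))


def cohorts_requested(cohorts):
    """Return the cohort argument of every snp_allele_counts() call made."""
    api = ToyFstApi()
    # wraps= keeps the real behaviour while recording each call.
    with patch.object(
        api, "snp_allele_counts", wraps=api.snp_allele_counts
    ) as spy:
        api.pairwise_average_fst(cohorts)
    return [call.args[0] for call in spy.call_args_list]


requested = cohorts_requested(["a", "b", "c", "d"])
# One call per cohort, rather than two calls for each of the six pairs.
assert requested == ["a", "b", "c", "d"]
```

Spying with wraps= rather than replacing the method means the same run can also compare results against the two-cohort path, so one test setup can cover both correctness and call count.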