fix: preserve sample_query_options during cohort normalization#1101
Open
kshirajahere wants to merge 1 commit intomalariagen:masterfrom
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Before this change,
sample_query_optionscould still be lost inside shared cohort normalisation even after the recent caller-level forwarding fixes. Any multi-cohort path that rebuilt cohort queries and then rechecked cohort sizes would fail for valid parameterised pandas queries such ascountry in @countries_list.After this change, cohort normalisation preserves the full query context, and the shared query helpers now treat
engine="python"as a default rather than forcing it as a duplicate keyword.Closes #1100.
Why this matters
This is a framework-level correctness fix, not just a surface patch:
pairwise_average_fst()was still broken forlocal_dict-backed filters because_setup_cohort_queries()re-applied the combined cohort query withoutsample_query_options.sample_metadata()and_filter_sample_dataset()must also accept an explicitengineoption without raisingTypeError.The net effect is that the public
sample_query_optionscontract now behaves consistently across direct metadata selection and higher-level cohort analyses.Exact changes
_prep_sample_query_options()inAnophelesBaseto preserve user-supplied query context while defaulting the engine topythonengine="python"as a duplicate keywordsample_query_optionsthrough_setup_cohort_queries()when validating derived cohort queriessample_metadata()queries using bothengineandlocal_dictpairwise_average_fst()withlocal_dict-backedsample_querylocal_dict-backedsample_queryTests run
ruff check malariagen_data/anoph/base.py malariagen_data/anoph/sample_metadata.py tests/anoph/test_fst.py tests/anoph/test_h12.py tests/anoph/test_sample_metadata.pypytest tests/anoph/test_h12.py tests/anoph/test_fst.py -qpytest tests/anoph/test_sample_metadata.py -k 'sample_metadata_with_query or sample_metadata_with_indices or query_options' -q