fix: preserve sample_query_options during cohort normalization #1101

Open
kshirajahere wants to merge 1 commit into malariagen:master from kshirajahere:GH1100-preserve-sample-query-options-cohort-normalisation

kshirajahere commented Mar 11, 2026

Summary

Before this change, sample_query_options could still be lost inside shared cohort normalisation, even after the recent caller-level forwarding fixes. Any multi-cohort path that rebuilt cohort queries and then rechecked cohort sizes would fail for valid parameterised pandas queries such as country in @countries_list, because the variables supplied through local_dict were no longer available when the rebuilt query was evaluated.
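A minimal pandas-only sketch of this failure mode, assuming a toy metadata table; filter_samples is a hypothetical stand-in for a shared query helper, not code from this PR. A name written as @countries_list is looked up in the caller's scope unless local_dict supplies it, so a helper that receives the query without its options cannot resolve it.

```python
import pandas as pd

# Toy sample metadata (illustrative data only).
df_samples = pd.DataFrame(
    {"sample_id": ["s1", "s2", "s3"], "country": ["Ghana", "Mali", "Kenya"]}
)

# A parameterised query plus the options that make it resolvable.
sample_query = "country in @countries_list"
sample_query_options = {"local_dict": {"countries_list": ["Ghana", "Mali"]}}


def filter_samples(df, query, **query_options):
    # Hypothetical shared helper: evaluates the query with whatever options it receives.
    return df.query(query, engine="python", **query_options)


# Options forwarded: @countries_list resolves via local_dict.
kept = filter_samples(df_samples, sample_query, **sample_query_options)
print(kept["sample_id"].tolist())  # ['s1', 's2']

# Options dropped: @countries_list is not defined in the helper's scope.
try:
    filter_samples(df_samples, sample_query)
    dropped_error = None
except NameError as exc:  # pandas raises UndefinedVariableError, a NameError subclass
    dropped_error = exc
```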

After this change, cohort normalisation preserves the full query context, and the shared query helpers now treat engine="python" as a default rather than passing it as an extra keyword argument that collides with a user-supplied engine.
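The difference between forcing engine="python" as a keyword and treating it as a default can be sketched in plain pandas. prep_query_options is a hypothetical stand-in for _prep_sample_query_options(); its behaviour is assumed from the description above, not copied from AnophelesBase.

```python
import pandas as pd


def prep_query_options(sample_query_options=None):
    # Hypothetical helper: copy the user's options (so the caller's dict is not
    # mutated) and default the engine to "python" only if none was given.
    options = dict(sample_query_options or {})
    options.setdefault("engine", "python")
    return options


df = pd.DataFrame({"country": ["Ghana", "Mali", "Kenya"], "year": [2012, 2014, 2016]})
user_options = {"engine": "python", "local_dict": {"min_year": 2013}}

# Old pattern: the explicit keyword collides with the user's "engine" key,
# so Python raises TypeError before DataFrame.query is even entered.
try:
    df.query("year >= @min_year", engine="python", **user_options)
    collision = None
except TypeError as exc:  # e.g. "got multiple values for keyword argument 'engine'"
    collision = exc

# New pattern: one merged mapping, so each keyword is passed exactly once.
result = df.query("year >= @min_year", **prep_query_options(user_options))
print(result["country"].tolist())  # ['Mali', 'Kenya']
```

Using setdefault keeps a user-supplied engine (for example "numexpr") untouched and only fills in "python" when the caller did not choose one.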

Closes #1100.

Why this matters

This is a framework-level correctness fix, not just a surface patch:

  • pairwise_average_fst() was still broken for local_dict-backed filters because _setup_cohort_queries() re-applied the combined cohort query without sample_query_options.
  • Multi-cohort H12 plotting had the same underlying problem for variable-backed sample filters.
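  • Once query options are forwarded correctly, sample_metadata() and _filter_sample_dataset() must also accept an explicit engine option without raising TypeError, which Python raises when engine arrives both as an explicit keyword and inside the unpacked options.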
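A rough sketch of what forwarding the options through cohort normalisation means, assuming cohorts are given as a mapping of cohort name to pandas query and are combined with sample_query by a logical and. setup_cohort_queries, the column names, and min_cohort_size here are hypothetical and do not reproduce _setup_cohort_queries().

```python
import pandas as pd


def setup_cohort_queries(
    df, cohorts, sample_query=None, sample_query_options=None, min_cohort_size=1
):
    # Hypothetical cohort-normalisation step, not the library's code.
    # Each cohort query is combined with the user's sample_query, and the
    # combined query is re-evaluated to check cohort size using the SAME options.
    options = dict(sample_query_options or {})
    options.setdefault("engine", "python")
    normalised = {}
    for name, cohort_query in cohorts.items():
        if sample_query is not None:
            combined = f"({sample_query}) and ({cohort_query})"
        else:
            combined = cohort_query
        n_samples = len(df.query(combined, **options))
        if n_samples >= min_cohort_size:
            normalised[name] = combined
    return normalised


df_samples = pd.DataFrame(
    {
        "country": ["Ghana", "Ghana", "Mali", "Kenya"],
        "taxon": ["gambiae", "coluzzii", "gambiae", "arabiensis"],
    }
)
cohorts = {
    "gamb": "taxon == 'gambiae'",
    "colu": "taxon == 'coluzzii'",
    "arab": "taxon == 'arabiensis'",
}
result = setup_cohort_queries(
    df_samples,
    cohorts,
    sample_query="country in @countries_list",
    sample_query_options={"local_dict": {"countries_list": ["Ghana", "Mali"]}},
)
print(sorted(result))  # ['colu', 'gamb']
```

The "arab" cohort is dropped because its only sample is from Kenya; without sample_query_options, the size check itself would fail to resolve @countries_list.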

The net effect is that the public sample_query_options contract now behaves consistently across direct metadata selection and higher-level cohort analyses.

Exact changes

  • add _prep_sample_query_options() in AnophelesBase to preserve user-supplied query context while defaulting the engine to python
  • use that helper in shared query-evaluation paths instead of passing engine="python" as a duplicate keyword
  • pass sample_query_options through _setup_cohort_queries() when validating derived cohort queries
  • add regressions for:
    • direct sample_metadata() queries using both engine and local_dict
    • pairwise_average_fst() with local_dict-backed sample_query
    • multi-cohort H12 plotting with local_dict-backed sample_query

Tests run

  • ruff check malariagen_data/anoph/base.py malariagen_data/anoph/sample_metadata.py tests/anoph/test_fst.py tests/anoph/test_h12.py tests/anoph/test_sample_metadata.py
  • pytest tests/anoph/test_h12.py tests/anoph/test_fst.py -q
  • pytest tests/anoph/test_sample_metadata.py -k 'sample_metadata_with_query or sample_metadata_with_indices or query_options' -q

Development

Successfully merging this pull request may close these issues.

Preserve sample_query_options during cohort query normalisation