docs: document year=-1/month=-1 sentinel values for lab cross samples #1104
Open
khushthecoder wants to merge 1 commit into malariagen:master from
Closes malariagen#1092 Add a notes section to the sample_metadata() @doc decorator explaining that some samples are lab crosses (mosquitoes bred in the laboratory) that use year=-1 and month=-1 as sentinel values for 'no collection date'. Include an example showing how to filter them out using sample_query. This is a documentation-only change — no runtime behavior is modified.
docs: document `year=-1`/`month=-1` sentinel values for lab cross samples

Closes #1092
Summary
Add a `notes` section to the `sample_metadata()` docstring explaining the `year=-1` and `month=-1` sentinel convention for lab cross samples (mosquitoes bred in the laboratory with no real collection date), along with a code example showing how to filter them out.

This is a documentation-only change; no runtime behavior is modified.
Background
Lab cross samples in the dataset use `year=-1` and `month=-1` as sentinel values meaning "no real collection date exists." This convention is internally consistent (e.g., `quarter` is also set to `-1` when `month == -1`, and `plot_samples_bar()` filters `year > 0`), but it is not documented in the public API. A user encountering the data for the first time cannot tell from the Python API alone that `-1` is a sentinel value.

This can lead to:

- `ValueError` when attempting date arithmetic (e.g., `pd.to_datetime(df["year"].astype(str) + "-01-01")`)
- `NaN` values in cohort metadata columns for lab crosses (expected, but confusing without context)

What Changed
`malariagen_data/anoph/sample_metadata.py`

Added a `notes` section to the `@doc` decorator on `sample_metadata()`:

What Did NOT Change
- `sample_query` or `sample_sets`)
- `_parse_general_metadata()`, `general_metadata()`, or any other method

Design Rationale
Per maintainer feedback, adding an `exclude_lab_crosses` parameter would be redundant since `sample_query="year >= 0"` already accomplishes this. A runtime warning would be noisy for experienced researchers. The right fix is documentation: making the sentinel convention discoverable through the API docs.
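
As a rough illustration of the `year >= 0` filter described above, the sketch below applies the same query expression to a small hand-made pandas DataFrame. The DataFrame, its sample IDs, and the `collection_date` column are assumptions made up for this example; they are not taken from the real dataset or from the docstring added in this PR.

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by sample_metadata().
# Sample IDs and dates are invented for illustration only.
df = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "X1", "X2"],
        "year": [2012, 2014, -1, -1],
        "month": [7, 11, -1, -1],
    }
)

# Same expression that would be passed as sample_query="year >= 0":
# keeps only samples with a real collection date.
collected = df.query("year >= 0")

# The complementary selection isolates the lab cross rows (sentinel -1).
lab_crosses = df.query("year < 0")

# With the sentinel rows removed, date construction is safe.
# pd.to_datetime accepts a frame with year/month/day columns.
collected = collected.assign(
    collection_date=pd.to_datetime(
        collected[["year", "month"]].assign(day=1)
    )
)

print(collected)
print(f"{len(lab_crosses)} lab cross samples excluded")
```

Filtering before any date arithmetic keeps the sentinel rows out of the conversion entirely, which is the behaviour the documentation note is meant to make discoverable.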