Version 2.0#252

Merged
JoshLoecker merged 1574 commits into main from v2
Mar 6, 2026
@JoshLoecker JoshLoecker commented Mar 6, 2026

This is a large pull request containing a wide variety of features, improvements, bug fixes, and other changes. They are too numerous to list in full, but the main components are:

  • Per-data-type input and output controls
    • It is now possible to specify exactly where input files are read from and where output files are saved
    • This adds some complexity and verbosity to each function, but significantly relaxes COMO's requirements for file-level organization
  • Refactored the iMAT integration to fix negative values being placed in the middle-expression bin
  • Fixed FPKM calculations to be performed on a per-gene basis instead of per-sample
  • Use Stouffer's method to integrate z-scores
  • Better handling of incorrect data returned by remote APIs (e.g., BioDBNet and MyGene.info)
  • Added integration with Salmon gene quantification and its TPM calculations
  • Added and updated type hints on significant portions of the public (and most of the private) APIs, making it easier to see what each function expects as input and produces as output
  • Better error and warning messages, plus better error handling
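
For readers unfamiliar with the Stouffer's method item above, the following is a minimal sketch of the general technique, not COMO's actual implementation. The function name `stouffer_z` and its optional `weights` parameter are hypothetical and chosen for illustration only. The combined score is the (optionally weighted) sum of the individual z-scores divided by the square root of the sum of squared weights.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def stouffer_z(z_scores: Sequence[float], weights: Sequence[float] | None = None) -> float:
    """Combine z-scores with Stouffer's method (hypothetical helper, not COMO's API).

    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)); with equal weights this
    reduces to sum(z_i) / sqrt(k) for k input z-scores.
    """
    if len(z_scores) == 0:
        raise ValueError("at least one z-score is required")
    if weights is None:
        # Unweighted case: every source contributes equally.
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")

    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    if denominator == 0:
        raise ValueError("weights must not all be zero")
    return numerator / denominator


# Four z-scores for one gene: (1 + 2 + 3 + 2) / sqrt(4) = 4.0
combined = stouffer_z([1.0, 2.0, 3.0, 2.0])
```

Dividing by the square root of the number of inputs (rather than taking a plain mean) keeps the combined score on a standard-normal scale when the inputs are independent standard-normal z-scores.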

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
JoshLoecker and others added 27 commits October 2, 2025 12:20
Co-authored-by: SarahNakamura <snakamura@unomaha.edu>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…dicates an incorrect taxon id

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
# Conflicts:
#	main/como/rnaseq_gen.py
#	main/como/rnaseq_preprocess.py
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
* fix(fpkm): update imports for zFPKM calculation improvements

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(fpkm): use Salmon quantification instead of STAR quantification

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: fill with integers for faster processing

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: remove unnecessary async function usage

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: remove nonexistent genes from conversion

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: use more explicit (albeit longer) code to create gene_info dataframe object

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: import required modules

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: optional argument for fragment data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: improve handling for single cell data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: generalize data type input

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: simplify FPKM/RPKM calculations; properly compute per-gene FPKM scores

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: move zfpkm calculation to external package

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use np.bool for boolean array

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: allow setting negative zFPKM results to 0

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: simplification to use external zfpkm package

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: allow providing the fragment size filepath (from rnaseq preprocessing)

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(ruff): reduce max line length

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(ruff): mark unsorted imports as fixable

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(uv): lock pyproject file

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: rename count to quant in testing files

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: test new quant information

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use quant files instead of strand files

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: updated COMO_input files for naiveB to use updated FastqToGeneCounts information

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: added Salmon quantification data for naive B

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use `_read_file` function to read data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(tests): remove 1 from expected gene names to fix header

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(tests): use `endswith` instead of `is in`

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(tests): Use missing file appropriately

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(uv): Use dependency groups

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* revert: use synchronous programming for more deterministic usage

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

---------

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
* fix(fpkm): update imports for zFPKM calculation improvements

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(fpkm): use Salmon quantification instead of STAR quantification

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: fill with integers for faster processing

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: remove unnecessary async function usage

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: remove nonexistent genes from conversion

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: use more explicit (albeit longer) code to create gene_info dataframe object

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: import required modules

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: optional argument for fragment data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: improve handling for single cell data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: generalize data type input

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: simplify FPKM/RPKM calculations; properly compute per-gene FPKM scores

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: move zfpkm calculation to external package

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use np.bool for boolean array

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: allow setting negative zFPKM results to 0

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: simplification to use external zfpkm package

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: allow providing the fragment size filepath (from rnaseq preprocessing)

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(ruff): reduce max line length

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(ruff): mark unsorted imports as fixable

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(uv): lock pyproject file

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: rename count to quant in testing files

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: add single cell normalization using scanpy defaults

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: test new quant information

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: test new quant information

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use quant files instead of strand files

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use quant files instead of strand files

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: updated COMO_input files for naiveB to use updated FastqToGeneCounts information

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: added Salmon quantification data for naive B

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use `_read_file` function to read data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(tests): remove 1 from expected gene names to fix header

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(tests): use `endswith` instead of `is in`

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(tests): Use missing file appropriately

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(uv): Use dependency groups

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* revert: use synchronous programming for more deterministic usage

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(type): fix pyrefly type errors

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(type): fix ruff & pyrefly issues

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: rename `_log_and_raise_error` to `log_and_raise_error`

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: rename `_read_file` to `read_file`

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

---------

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
* fix(fpkm): update imports for zFPKM calculation improvements

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix(fpkm): use Salmon quantification instead of STAR quantification

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: fill with integers for faster processing

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: remove unnecessary async function usage

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* fix: remove nonexistent genes from conversion

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: use more explicit (albeit longer) code to create gene_info dataframe object

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: import required modules

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: optional argument for fragment data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: improve handling for single cell data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: generalize data type input

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: simplify FPKM/RPKM calculations; properly compute per-gene FPKM scores

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: move zfpkm calculation to external package

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use np.bool for boolean array

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: ruff formatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: allow setting negative zFPKM results to 0

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: simplification to use external zfpkm package

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat: allow providing the fragment size filepath (from rnaseq preprocessing)

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(ruff): reduce max line length

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(ruff): mark unsorted imports as fixable

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore(uv): lock pyproject file

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: remove zfpkm related code as it is now an external package

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* revert: remove local pyproject requirement

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: remove zfpkm testing files

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: force sync lock

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* Purge `fast_bioservices` (#246)

* feat: use bioservice's `MyGeneInfo`

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: convert from fast_bioservices to COMO's built-in pipeline

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* refactor: remove fast-bioservices from pyproject

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: use bioservices over fast_bioservices

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* feat(test): added unit tests for identifier conversion pipeline

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

---------

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

---------

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
* revert: use `raise <error>` instead of `log_and_raise` function

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

* chore: remove nonexistent import

---------

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…elated tests

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

# Conflicts:
#	.github/workflows/continuous_integration.yml
#	main/COMO.ipynb
#	main/como/create_context_specific_model.py
#	main/como/rnaseq.py
#	main/como/rnaseq_gen.py
#	main/como/rnaseq_preprocess.py
#	pyproject.toml
#	uv.lock
@JoshLoecker JoshLoecker marked this pull request as ready for review March 6, 2026 19:52
@JoshLoecker JoshLoecker merged commit ff9a09f into main Mar 6, 2026
3 checks passed
@JoshLoecker JoshLoecker deleted the v2 branch March 6, 2026 19:54