New module: lsa/cosine (calculate cosine similarity)#10002

Open
miguelrosell wants to merge 16 commits intonf-core:masterfrom
miguelrosell:nf-core-cosimflow

@miguelrosell commented Feb 12, 2026

Hi @ewels and @pinin4fjords!

Following up on our chat in the proposals channel, here’s the PR for the cosimflow module. It’s a tool for calculating pairwise cosine similarity from expression matrices (outputting both the matrix and a heatmap).
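For readers unfamiliar with the metric: the cosine similarity of two sample vectors u and v is their dot product divided by the product of their Euclidean norms, u·v / (‖u‖ ‖v‖). Below is a minimal sketch in plain Python, not the module's R code; the sample names and expression values are invented for illustration, and zero vectors are not handled.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosine(samples):
    """Return a nested dict: similarity[a][b] for every pair of samples."""
    names = list(samples)
    return {
        a: {b: cosine_similarity(samples[a], samples[b]) for b in names}
        for a in names
    }


# Hypothetical expression values: one list of per-gene values per sample.
expression = {
    "sample_A": [1.0, 0.0, 2.0],
    "sample_B": [2.0, 0.0, 4.0],  # same profile as sample_A, scaled by 2
    "sample_C": [0.0, 5.0, 0.0],  # shares no expressed gene with sample_A
}

similarity = pairwise_cosine(expression)
for name, row in similarity.items():
    print(name, [round(value, 3) for value in row.values()])
```

Because the metric depends only on direction, sample_A and sample_B come out as (approximately) identical despite the difference in scale, while sample_A and sample_C are orthogonal.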

I think it could be a handy addition for the rnaseq pipeline, maybe as a downstream QC step to see how samples cluster together beyond the usual PCA.

PR checklist:

Closes: xxx

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions.
  • If necessary, include test data in your PR (Created a small dummy CSV on-the-fly in the test script).
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label (process_medium).
  • Use BioConda and BioContainers.

Tests performed locally:

  • nf-core modules test cosimflow --profile conda

Quick note: I ended up putting the R script directly inside the main.nf script block. It was the cleanest way I found to handle the dollar signs and variable escaping for R/Nextflow without everything blowing up.

@pinin4fjords left a comment

Thanks for the PR!

There are a few things to do to make this fit in with repository standards, and I think there's some complexity to be stripped.

In particular, since there's no actual software called 'cosimflow', I think this needs to be named something like lsa/cosine, since it's principally a wrapper around that function. Or possibly it could be a script under custom/.

I've simplified the script a bit to apply some of my suggestions for you:

pr10002_cosimflow_template.R.zip

)

# HERE WE PASS THE NEXTFLOW ARGUMENTS DIRECTLY
args_list <- c("--input", "$expression_matrix", "--out_prefix", "$prefix")

The script constructs c("--input", "$expression_matrix", "--out_prefix", "$prefix") then parses them back into opt$input and opt$out_prefix - getting the same values it started with. Only task.ext.args actually needs parsing. Feed $args directly to optparse for configurable options (--method, --min_gene_mean), and use Nextflow template variables for input/prefix. See attached template.
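The pattern described above can be sketched as follows. This is a language-neutral illustration written in Python with argparse, not the attached R/optparse template: the variable names, file name, and option defaults are assumptions made for the example. Values that the workflow already knows are used directly, and only the free-form extra-arguments string is parsed.

```python
import argparse
import shlex


def parse_extra_args(args_string):
    """Parse only the user-configurable options from a free-form string."""
    parser = argparse.ArgumentParser()
    # Option names come from the review comment; the defaults are hypothetical.
    parser.add_argument("--method", default="cosine")
    parser.add_argument("--min_gene_mean", type=float, default=0.0)
    return parser.parse_args(shlex.split(args_string))


# Stand-ins for values the workflow engine would substitute into the script.
# There is no reason to wrap these in a fake argument vector and parse them back.
input_path = "counts.csv"  # hypothetical input file name
out_prefix = "test"        # hypothetical output prefix

# Only the extra arguments (the role of task.ext.args) go through the parser.
opts = parse_extra_args("--min_gene_mean 1.5")

print(input_path, out_prefix, opts.method, opts.min_gene_mean)
```

The design point is that parsing is reserved for values that genuinely arrive as a string of options; everything else stays a plain variable.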

assert process.success

// 4. Check that the files are generated
assert process.out.matrix

If the outputs are deterministic, they should be snapshotted; if not, you should at least snapshot the file names. See, for example, how the deseq2/differential module does it:

                { assert snapshot(
                        process.out.results,
                        ...
                        file(process.out.dispersion_plot_png[0][1]).name,
                        ...
                    ).match() },

@miguelrosell miguelrosell changed the title New module: cosimflow for cosine similarity New module: lsa/cosine (calculate cosine similarity) Feb 13, 2026
@miguelrosell
Member Author

@pinin4fjords Thank you for all of the corrections, comments and suggestions. I really appreciate your patience and the detailed help!!

I hope I have applied everything you requested:

  • Renamed it to lsa/cosine.
  • Switched the input to CSV-only (removed the readxl dependency), as suggested.
  • Updated meta.yml (license/maintainers) and fixed the variable scope in main.nf so the R template works correctly.
  • The tests and stubs now pass locally, with snapshots included.
  • Fixed the non-English issue :)

If there's anything that I have missed or that needs a second look, I'm all ears!

Again, thank you!

@pinin4fjords left a comment

Much better!

@miguelrosell
Member Author

Hi @pinin4fjords!

I applied both changes:

  • I took out readxl and dplyr from environment.yml
  • I switched to dynamic versions in the stub.

As always, I'm happy to change anything I might have missed.
Thank you!! :)))

@miguelrosell
Member Author

Hi @pinin4fjords! I just updated the branch with master to resolve the out-of-date status.

Looks like the workflows are currently awaiting approval to run. Is the next step approval once the checks pass, or am I missing something?

Have a good day and thank youu!!

@pinin4fjords
Member

Looks like the workflows are currently awaiting approval to run. Is the next step approving if the checks pass or am I missing something?

If you go to the #github-invitations channel on the nf-core Slack, you can get yourself added to the org so that workflows will run for you!

@miguelrosell
Member Author

Hello, @pinin4fjords! I managed to fix all the schema requirements for the meta.yml and aligned the process name as well. I think all linting and tests pass with 0 errors now!
Ready to merge whenever you are.

Have a good Friday!
