Deacon index set operations by pmoris · Pull Request #10027 · nf-core/modules

pmoris · 2026-02-13T16:44:33Z

Goal

Update deacon index to allow set operations

See https://github.com/bede/deacon?tab=readme-ov-file#set-operations

Some considerations:

More flexible/complex deacon/index module vs separate deacon/index_diff, deacon/index_union, etc. vs separate deacon/index_set_operations module? (EDIT: I've added an example of the separate modules approach now, see later comments.)

I had originally written a separate index_diff module for my own use in a custom pipeline. If the preferred approach is to split modules based on use case, I can split the current implementation out into multiple modules. On the other hand, if the preference of nf-core is to have every module provide a particular tool (rather than a specific subcommand), we can keep the current approach.

Roughly, it would look like this for the diff command if we were to split it out:
```
  input:
  tuple val(meta_fasta), path(fasta)
  tuple val(meta_index), path(index)

  output:
  tuple val(meta_index), path("*.idx"), emit: index
  path "versions.yml",                  emit: versions

  when:
  task.ext.when == null || task.ext.when

  script:
  def args = task.ext.args ?: ''
  def prefix = task.ext.prefix ?: "${meta_index.id}"
  // Subtract the contents of $fasta from $index.
  // No --threads here: only `deacon index build` has multi-threading options.
  """
  deacon \\
      index \\
      diff \\
      $args \\
      $index \\
      $fasta > ${prefix}.diff.idx
  """
```
For build, the main input is supplied via fasta (now renamed to reference). The set operations instead expect two or more idx files (intersect/union) or an idx file plus an idx or fasta file (diff). There are two ways to pass these: (1) re-use the main reference input of the build command for the first index to operate on and supply the rest via set_genomes, or (2) leave reference empty for the set operations and supply all inputs via set_genomes. With option 2 the inputs must be supplied in exact order, which is error-prone for diff (I'd argue that separate inputs make it clearer what is subtracted from what), and it becomes more difficult to provide an idx + fasta pair. I therefore opted for option 1.
Construction of output command:
Currently I use a ternary operator that constructs a different set of CLI arguments depending on the selected subcommand (`${subcommand == 'build' ? "${reference}" : "${reference} ${set_genomes}"} > ${prefix}.idx`), but it could be simplified into
```
  deacon \\
      index \\
      build \\
      --threads ${task.cpus} \\
      $args \\
      $reference \\
      $set_genomes > ${prefix}.idx
```
because set_genomes would be empty for the build subcommand anyway.
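The simplification described above can be sketched in plain Groovy, outside a Nextflow process. This is only an illustration under assumptions: the helper name `indexCommand` and its parameters are hypothetical, the file names are made up, and the rule that only `build` receives `--threads` follows the commit notes in this PR rather than a check against the deacon CLI.

```groovy
// Hypothetical helper (not part of the PR): assemble a `deacon index` command line.
// An empty setGenomes list simply contributes nothing, so no ternary on the
// trailing arguments is needed.
String indexCommand(String subcommand, String reference, List<String> setGenomes,
                    int cpus, String prefix, String args = '') {
    // Assumption taken from the PR notes: only `build` accepts --threads
    def threads = subcommand == 'build' ? "--threads ${cpus}" : ''
    def parts = ['deacon', 'index', subcommand, threads, args, reference] + setGenomes
    // Drop empty pieces so the joined command has no stray double spaces
    def command = parts.findAll { it }.join(' ')
    return command + ' > ' + prefix + '.idx'
}

println indexCommand('build', 'genome.fa', [], 4, 'genome')
println indexCommand('diff', 'human.idx', ['bacteria.fa'], 4, 'human_minus_bacteria')
```

Filtering out empty pieces is what makes the unconditional form safe: for build the extra inputs vanish, for the set operations they are appended in the order given.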
There are a few extra index subcommands, like dump (creates .fa based on idx) and info, but since these result in different output types than the others, I've omitted them for now.
I still need to add new tests for the set operation commands. For this we need idx files. Are there any in nf-test's dataset yet or can we create them on the fly in chained tests somehow?

PR checklist

Closes #XXX

See https://github.com/bede/deacon?tab=readme-ov-file#set-operations

- Adds two optional inputs:
  - the first selects an index subcommand (build, union, intersect or diff; defaults to build)
  - the second is a tuple of meta, genome, where genome is a path to the .idx or fasta file(s) to use for the selected set operation (should be omitted for build).
- Modifies the main reference input to accept both fasta files (for build) and .idx files (for set operations).
- Various validation checks ensure that the correct input options are used for each subcommand (e.g., diff accepts one fasta or idx file, union/intersect can accept multiple idx files, build accepts none).
- The selected subcommand is propagated in the output meta map.
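The validation rules listed above could look roughly like the following plain Groovy sketch. It is not the code from the PR: the function name `validateIndexInputs`, the error messages, and the use of the `.idx` extension to recognise index files are all assumptions made for illustration.

```groovy
// Hypothetical validation helper (names and messages are illustrative only).
void validateIndexInputs(String subcommand, List<String> setGenomes) {
    def allowed = ['build', 'union', 'intersect', 'diff']
    if (!(subcommand in allowed)) {
        throw new IllegalArgumentException('Unknown subcommand: ' + subcommand)
    }
    // Assumption: index files are recognised by their .idx extension
    def isIndex = { String f -> f.endsWith('.idx') }
    switch (subcommand) {
        case 'build':
            // build works on the reference alone
            if (setGenomes) {
                throw new IllegalArgumentException('build accepts no additional genomes')
            }
            break
        case 'diff':
            // diff subtracts exactly one fasta or idx file from the reference index
            if (setGenomes.size() != 1) {
                throw new IllegalArgumentException('diff expects exactly one fasta or idx file')
            }
            break
        default:
            // union / intersect: one or more additional idx files
            if (!setGenomes || !setGenomes.every(isIndex)) {
                throw new IllegalArgumentException(subcommand + ' expects one or more idx files')
            }
    }
}

validateIndexInputs('build', [])
validateIndexInputs('diff', ['bacteria.fa'])
validateIndexInputs('union', ['b.idx', 'c.idx'])
```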

pmoris · 2026-02-16T09:43:55Z

I've added the alternative option of splitting the various deacon index subcommands into separate modules. No tests yet until we decide which option to use.

If we go with this option, we'd still need to decide how to construct the inputs.

In the end it comes down to how the upstream genomes are most likely to be constructed. If these are individual [meta, index] tuples in a channel, you can:

naively combine them via `index_ch.collect()`, in which case the input to these modules would have to be a val (list of tuples).
map them into a single element of the form `[meta, path(index_files)]`, so that the module only has a single meta.id to deal with.
map them into a tuple consisting of two lists, `[meta_list, path(index_files)]`, which retains the individual meta info for each genome but requires the input channel to go through a mapping operation.

I'm not sure which assumptions these modules should make in this regard.
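To make the three shapes concrete, here is a small plain Groovy sketch in which ordinary lists stand in for channel items. The ids, file names and the `combined` meta are invented for the example. In a real pipeline the reshaping would be done with Nextflow channel operators instead, and Groovy's `List.collect` used below is not the channel operator of the same name.

```groovy
// Plain lists standing in for individual [meta, index] channel items (made-up values)
def indexes = [
    [[id: 'genomeA'], 'genomeA.idx'],
    [[id: 'genomeB'], 'genomeB.idx'],
]

// Option 1: pass the list of [meta, index] tuples through unchanged
// (the module input would then be a `val`)
def asTuples = indexes

// Option 2: a single shared meta plus the list of index files
def singleMeta = [[id: 'combined'], indexes.collect { meta, idx -> idx }]

// Option 3: two parallel lists, one of metas and one of index files
def (metas, files) = indexes.transpose()
def metaList = [metas, files]

println asTuples
println singleMeta
println metaList
```

Option 2 loses the per-genome meta maps, while option 3 keeps them at the cost of the module having to pick one id (or derive a new one) for its output prefix.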

Additionally, for the union and intersect modules, you could argue that it makes more sense to split their inputs into a main index + a set of indexes, just like diff, but here order of operations does not matter so I combined everything into a single input.

pmoris · 2026-02-19T08:08:34Z

See relevant discussion on Slack: https://nfcore.slack.com/archives/CE6SDBX2A/p1771355056637259

Conclusion: based on the nf-core guidelines (https://nf-co.re/docs/guidelines/components/modules#module-granularity), we'll use separate modules for the different set operations. Names should be changed to deacon/indexbuildunion.

For the existing index build module, we can either keep it as is (backwards compatibility + sane default) or rename it to indexbuild and deprecate the old one (https://nf-co.re/docs/guidelines/components/module_deprecation).

For now, I'll proceed with adding tests to the set modules and leave the existing index module as is.

pmoris added 7 commits February 13, 2026 17:36

Match stub prefix to main prefix

4058244

Update tests and snapshots to accept new inputs

c5119f7

Fix linting for meta.yml

23d61e6

Input and output names weren't updated yet after last changes to main.nf, and wrong indentation was used for meta_set.

Remove --threads option for set operations

d20b1e5

Only deacon index build has multi-threading options. Now the --threads argument is omitted when selecting any other subcommand.

fixup! Update deacon index to allow set operations

53bea2f

Add separate modules for deacon index set operations

0015065

This is an alternative to combining all the set operation options into a single deacon/index module.

pmoris force-pushed the deacon-index-diff branch from 3e9b4e7 to 0015065 Compare February 16, 2026 09:31

pmoris requested a review from vagkaratzas February 16, 2026 09:49

pmoris changed the title ~~Deacon index diff~~ Deacon index set operations Feb 16, 2026

pmoris wants to merge 7 commits into nf-core:master from pmoris:deacon-index-diff
