Skip to content

Comments

Avoid redundant index steps for shared refs#9983

Open
pmoris wants to merge 3 commits intonf-core:masterfrom
pmoris:deacon-shared-index
Open

Avoid redundant index steps for shared refs#9983
pmoris wants to merge 3 commits intonf-core:masterfrom
pmoris:deacon-shared-index

Conversation

@pmoris
Copy link
Contributor

@pmoris pmoris commented Feb 12, 2026

This deacon subworkflow supports per-sample references, but the previous implementation ran the indexing step for each sample, regardless if samples shared the same reference.

This update allows the index step to only run once for each unique reference, rather than once per-sample. I.e. shared references among samples are only indexed once.

The output of the previous implementation is conserved, but this could still be changed for the index output (by adding fasta-based meta.id values rather than the sample metadata). See the TODO comment at the bottom.

TODO: I haven't updated the test snapshots yet. If we retain the original emit output for the index channel, it won't be needed. If we update the channel to contain the fasta meta.id, it will.

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@pmoris
Copy link
Contributor Author

pmoris commented Feb 12, 2026

Waiting on #9982 before finishing this.

@pmoris pmoris force-pushed the deacon-shared-index branch from faa24d8 to 7f4dd4b Compare February 13, 2026 10:37
@pmoris
Copy link
Contributor Author

pmoris commented Feb 13, 2026

Rebased the branch on top of the new topic version changes in master.

@vagkaratzas : do you prefer the output of the index channel to remain unchanged or do you agree that its meta map should include the fasta filename as its meta.id?

@vagkaratzas
Copy link
Contributor

Rebased the branch on top of the new topic version changes in master.

@vagkaratzas : do you prefer the output of the index channel to remain unchanged or do you agree that its meta map should include the fasta filename as its meta.id?

I think the fasta name makes more sense, since it will also mean that it's unique per fasta.

@vagkaratzas
Copy link
Contributor

Rebased the branch on top of the new topic version changes in master.
@vagkaratzas : do you prefer the output of the index channel to remain unchanged or do you agree that its meta map should include the fasta filename as its meta.id?

I think the fasta name makes more sense, since it will also mean that it's unique per fasta.

BUT!!! Without the initial meta.id it wont be able to be joined with other stuff downstream in the pipeline if a user wants to do that. So we also need to keep the old meta.id as another attribute inside the meta.

@pmoris
Copy link
Contributor Author

pmoris commented Feb 13, 2026

Rebased the branch on top of the new topic version changes in master.
@vagkaratzas : do you prefer the output of the index channel to remain unchanged or do you agree that its meta map should include the fasta filename as its meta.id?

I think the fasta name makes more sense, since it will also mean that it's unique per fasta.

BUT!!! Without the initial meta.id it wont be able to be joined with other stuff downstream in the pipeline if a user wants to do that. So we also need to keep the old meta.id as another attribute inside the meta.

In that case, I propose to keep meta.id as is, and add a meta.index_id. Note that the snapshots will need to be updated for this approach too.

@pmoris pmoris force-pushed the deacon-shared-index branch from 06f87fb to 71431b3 Compare February 13, 2026 14:13
This deacon subworkflow supports per-sample
references, but the previous implementation
ran the indexing step for each sample,
regardless if samples shared the same reference.

This update allows the index step to only run
once for each unique reference, rather than once per-sample.
I.e. shared references among samples are only indexed once.

The output of the previous implementation is
conserved, but this could still be changed for
the index output (by adding fasta-based meta.id
values rather than the sample metadata).
This commit ensures that the sample-level metadata
associated with the sample reads is retained in the index
output channel. The baseName of the index - which itself
should be equal to the baseName of the used reference
fasta, or the chosen `ext.prefix` for the deacon/index module -
is added as an additional index_id key in the meta map.

Note that the index channel has the same length as the
number of input samples (i.e. it does not only contain
the unique indexes).
@pmoris pmoris force-pushed the deacon-shared-index branch from 71431b3 to a04fcc7 Compare February 16, 2026 10:52
@pmoris
Copy link
Contributor Author

pmoris commented Feb 16, 2026

Linting checks work locally with the dev version of nf-core. All tests pass locally as well. Just rebased on the latest version of master to see if that fixes the failing GH actions.

Snapshots needed to be updated because the
index output channel was updated with an additional
index_id key in its meta map.
@pmoris pmoris force-pushed the deacon-shared-index branch from a04fcc7 to da400bc Compare February 16, 2026 12:19
@pmoris
Copy link
Contributor Author

pmoris commented Feb 16, 2026

I forgot to update the snapshots of FASTQ_DECONTAMINATE_DEACON_HOSTILE, which relies on this subworkflow.

@pmoris pmoris requested a review from vagkaratzas February 16, 2026 12:59
Copy link
Contributor

@vagkaratzas vagkaratzas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! I think this will work!.. but can you add an nf-test that gives the same fasta file twice as input to index and make sure it runs only once?

- "@Baksic-Ivan"
- "@Omer0191"
contributors:
= "@pmoris"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
= "@pmoris"
- "@pmoris"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants