Collaborator

@jlamypoirier jlamypoirier commented Feb 11, 2026

✨ Description

Combine the batch and sequence dimensions into a single token dimension in most of the model. This simplifies various sequence-based operations and, in particular, removes the need for the sequence_first format.
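As a rough illustration of the idea (not the code in this PR), the sketch below uses NumPy to flatten a `(batch, sequence, hidden)` tensor into `(tokens, hidden)`. The function name `merge_batch_and_sequence`, the shapes, and the toy linear layer are all assumptions made for the example. Once the two leading dimensions are merged, a per-token operation no longer depends on whether the layout was batch-first or sequence-first.

```python
import numpy as np


def merge_batch_and_sequence(hidden_states: np.ndarray) -> np.ndarray:
    """Flatten (batch, sequence, hidden) into (tokens, hidden).

    Hypothetical helper for illustration; the PR's actual implementation may differ.
    """
    batch, sequence, hidden = hidden_states.shape
    return hidden_states.reshape(batch * sequence, hidden)


# Toy shapes, chosen only for the example.
batch, sequence, hidden, out_features = 2, 3, 4, 5
x = np.arange(batch * sequence * hidden, dtype=np.float32).reshape(batch, sequence, hidden)

tokens = merge_batch_and_sequence(x)

# A per-token operation (here a plain linear projection) acts on the last
# dimension only, so it gives the same values on the merged layout.
weight = np.ones((hidden, out_features), dtype=np.float32)
out_tokens = tokens @ weight
out_batched = x @ weight

print(tokens.shape)
print(np.allclose(out_tokens.reshape(batch, sequence, out_features), out_batched))
```

Operations that do need sequence boundaries, such as attention, would still have to recover them from separate metadata; that part is outside this sketch.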

Other changes

  • Fix various issues with distillation.
  • Fix the runner kwargs not propagating to namespaces.
  • Fix AuxiliaryLoss for eval mode.
  • Make the model head use the _debug util for returning logits.

Known issue: MTP uses a different name for the logits, which causes incompatibility issues.

@jlamypoirier jlamypoirier changed the title from "Token dim" to "Merge the batch and sequence dimensions" Feb 11, 2026
@jlamypoirier jlamypoirier marked this pull request as ready for review February 11, 2026 21:12
