Update transformers to v5.x, unsloth, and add MoE LoRA conversion #576
Open
Conversation
Update core dependencies for the transformers v5 ecosystem:
- transformers: >=4.55.2,<=4.57.3 → >=5.1.0
- unsloth: 2025.12.9 → 2026.2.1
- unsloth-zoo: 2025.12.7 → 2026.2.1 (+ updated VCS pin)
- trl: 0.20.0 → >=0.28.0
- peft: >=0.14.0 → >=0.18.0 (required by transformers v5)

Fix transformers v5 breaking changes:
- Replace the removed dummy_pt_objects import with a direct transformers import
- Update the masking_utils patch return type (now returns 5 values)
- Remove deprecated TrainerArgs fields (overwrite_output_dir, jit_mode_eval, mp_parameters, logging_dir, fp16_backend, push_to_hub_token/model_id/organization)

Add a MoE LoRA adapter conversion utility for vLLM compatibility:
- Unsloth + transformers v5 saves MoE LoRA as fused 2D tensors
- vLLM expects per-expert format
- Auto-detect and convert after checkpoint save

Closes #575

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
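The fused-to-per-expert conversion described above can be sketched as follows. This is illustrative only: the real utility lives in src/art/utils/convert_moe_lora.py, and the tensor layout assumed here (experts stacked along dim 0 of the fused LoRA matrix, addressed by a hypothetical `.experts.` key pattern) may differ from what unsloth actually writes.

```python
# Hedged sketch: split fused MoE LoRA tensors into the per-expert keys vLLM
# expects. Layout and key naming are assumptions for illustration.
import numpy as np


def split_fused_expert_lora(
    state: dict[str, np.ndarray], num_experts: int
) -> dict[str, np.ndarray]:
    """Turn each fused `...experts...` tensor into one tensor per expert."""
    out: dict[str, np.ndarray] = {}
    for key, tensor in state.items():
        if ".experts." in key and tensor.shape[0] % num_experts == 0:
            # Slice the stacked matrix into equal per-expert chunks along dim 0
            for i, chunk in enumerate(np.split(tensor, num_experts, axis=0)):
                out[key.replace(".experts.", f".experts.{i}.")] = chunk
        else:
            out[key] = tensor  # non-MoE tensors pass through untouched
    return out
```

Non-MoE checkpoints contain no matching keys, so the conversion is naturally a no-op for them, matching the auto-detect behavior described in the commit.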
Unsloth 2026.2.1 requires trl>0.18.2,!=0.19.0,<=0.24.0. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Unsloth 2026.2.1's pyproject.toml has overly strict constraints (transformers<=4.57.6, trl<=0.24.0) but the February-2026 release notes confirm v5.1.0 + trl 0.27.1 work well. Use uv override-dependencies to allow the upgrade. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
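uv supports overriding transitive constraints via `override-dependencies`. A minimal sketch of the pyproject.toml change described above (the exact specifiers in the PR may differ):

```toml
[tool.uv]
override-dependencies = [
    # Bypass unsloth's transformers<=4.57.6 pin; v5.1.0 is confirmed working
    # per the February-2026 unsloth release notes.
    "transformers>=5.1.0",
]
```

Unlike a plain version bump, an override replaces the constraint everywhere in the resolution, so unsloth's own pin no longer blocks the upgrade.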
Transformers v5 removed `warnings_issued` from PreTrainedModel, but Unsloth's GRPOTrainer still accesses it during initialization. Add it as an empty dict on the PEFT model before creating the trainer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
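The shim described above can be sketched as a small helper (`ensure_warnings_issued` is a hypothetical name; the PR applies the fix inline before constructing the trainer):

```python
# Hedged sketch of the compat shim: GRPOTrainer reads `model.warnings_issued`
# during __init__, so make sure the attribute exists on the PEFT model.
def ensure_warnings_issued(model) -> None:
    """Add an empty `warnings_issued` dict if model patching stripped it."""
    if not hasattr(model, "warnings_issued"):
        model.warnings_issued = {}
```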
…ers v5 Transformers v5 changed apply_chat_template to return BatchEncoding by default when tokenize=True. Add return_dict=False to all calls that expect list[int] return type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The bradhilton/unsloth-zoo fork is at version 2025.8.4 which is missing modules needed by unsloth 2026.2.1 (e.g. unsloth_zoo.device_type). Switch to the official PyPI release which matches unsloth 2026.2.1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These changes were not needed for the transformers v5 upgrade: - backend.vcs.txt: not used for installation (pyproject.toml handles deps) - model.py TrainerArgs: TypedDict fields don't cause runtime errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove fields that transformers v5 dropped from TrainingArguments: overwrite_output_dir, logging_dir, jit_mode_eval, half_precision_backend, tpu_num_cores, past_index, fp16_backend, push_to_hub_model_id, push_to_hub_organization, push_to_hub_token, mp_parameters, torchdynamo, ray_scope. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
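Illustrative only: one way to express the cleanup is a filter that drops the removed fields from a trainer-args dict before it reaches TrainingArguments. The field set is copied from the commit above; `strip_removed_fields` is a hypothetical helper, not code from the PR.

```python
# Fields dropped from transformers.TrainingArguments in v5 (per the commit).
REMOVED_IN_V5 = {
    "overwrite_output_dir", "logging_dir", "jit_mode_eval",
    "half_precision_backend", "tpu_num_cores", "past_index", "fp16_backend",
    "push_to_hub_model_id", "push_to_hub_organization", "push_to_hub_token",
    "mp_parameters", "torchdynamo", "ray_scope",
}


def strip_removed_fields(trainer_args: dict) -> dict:
    """Return a copy of trainer_args without fields removed in v5."""
    return {k: v for k, v in trainer_args.items() if k not in REMOVED_IN_V5}
```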
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
trl was originally pinned to 0.20.0. No reason to loosen it — 0.20.0 already satisfies unsloth's trl<=0.24.0 constraint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep both the cast(list[int], ...) wrapper from main and the return_dict=False parameter needed for transformers v5. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of adding return_dict=False to every call site, patch PreTrainedTokenizerBase.apply_chat_template once in patches.py to default return_dict=False. This restores transformers v4 behavior (returning list[int]) globally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
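The single-patch approach can be sketched with a generic default-kwarg wrapper (`default_return_dict_false` is a hypothetical name; the real patch lives in patches.py and targets `PreTrainedTokenizerBase.apply_chat_template`):

```python
# Hedged sketch: wrap a method so `return_dict` defaults to False when the
# caller doesn't pass it, restoring the transformers v4 list[int] behavior.
import functools


def default_return_dict_false(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("return_dict", False)  # explicit callers still win
        return fn(*args, **kwargs)
    return wrapper

# Applied once at import time, e.g.:
# PreTrainedTokenizerBase.apply_chat_template = default_return_dict_false(
#     PreTrainedTokenizerBase.apply_chat_template
# )
```

Because `setdefault` only fills in missing keys, any call site that explicitly requests `return_dict=True` keeps the v5 BatchEncoding behavior.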
The attribute wasn't removed in transformers v5 — Unsloth's model patching can leave the PEFT model without it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Upgrade transformers to v5 and unsloth to 2026.2.1, fixing the resulting breaking changes (`apply_chat_template` return types, removed `warnings_issued`, deprecated TrainerArgs fields).
Changes
Dependencies (pyproject.toml, requirements/backend.vcs.txt)
- transformers>=5.1.0 (was >=4.55.2,<=4.57.3)
- unsloth==2026.2.1 (was 2025.12.9)
- unsloth-zoo==2026.2.1 via PyPI (was a VCS pin to the bradhilton fork)
- peft>=0.18.0 (was >=0.14.0)
- override-dependencies for transformers and trl to bypass unsloth's overly strict PyPI constraints (confirmed working per the unsloth Feb 2026 release notes)
Code fixes
- src/art/unsloth/service.py: fix import path (GenerationMixin and PreTrainedModel now come directly from transformers instead of transformers.utils.dummy_pt_objects); add the warnings_issued compat shim; integrate convert_checkpoint_if_needed calls
- src/art/preprocessing/tokenize.py: add return_dict=False to apply_chat_template calls (the v5 default changed from list[int] to BatchEncoding)
- src/art/transformers/patches.py: update the return type for v5's _preprocess_mask_arguments (now returns 5 values instead of 4)
- src/art/dev/model.py: remove deprecated TrainerArgs fields
- src/art/tinker/server.py: add return_dict=False to apply_chat_template
New files
- src/art/utils/convert_moe_lora.py: converts fused MoE LoRA adapters (produced by unsloth + transformers v5) to per-expert format for vLLM compatibility; runs automatically after checkpoint save and is a no-op for non-MoE models
Test results
Tested on H200 GPU cluster with:
Test: 3-step yes-no-maybe RL training with Qwen2.5-7B-Instruct (LocalBackend)
Full pipeline verified: model loading → inference (vLLM) → rollouts → tokenization → training (unsloth) → checkpoint save → LoRA swap → resume inference.
Test plan
Closes #575
🤖 Generated with Claude Code