Remove DPO (Direct Preference Optimization) feature #3064
Conversation
Does this removal indicate the integration with Tunix DPO?

Hi Ranran,
A9isha left a comment:

LGTM, just one small comment - thank you Charles!
```diff
 Args:
   _loss_fn: The loss function to differentiate. Its signature is expected
-    to be: `(model, config, data, dropout_rng, params, *extra_args, is_train=True)`.
+    to be: `(model, config, data, dropout_rng, params, is_train=True)`.
```
Why are we removing the `*extra_args` here?
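For context on the question above, here is a minimal sketch of what dropping `*extra_args` means for callers. The `loss_fn_old`/`loss_fn_new` names and the argument values are illustrative assumptions, not MaxText code:

```python
import jax
import jax.numpy as jnp

# Old shape (illustrative): DPO was the only caller that threaded extra
# positional args (e.g. reference-model params) through *extra_args.
def loss_fn_old(model, config, data, dropout_rng, params, *extra_args, is_train=True):
    (reference_params,) = extra_args  # only DPO supplied this
    return jnp.sum((params - reference_params) * data)

# New shape: with DPO gone, nothing supplies extras, so the varargs go away.
def loss_fn_new(model, config, data, dropout_rng, params, is_train=True):
    return jnp.sum(params * data)

# Differentiation w.r.t. params (positional index 4) is unchanged either way;
# keyword args such as is_train pass through jax.grad untouched.
grads = jax.grad(loss_fn_new, argnums=4)(
    None, None, jnp.ones(3), None, jnp.zeros(3), is_train=True)
```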
Description
Summary
Changes
Deleted Files
- `src/maxtext/trainers/post_train/dpo/dpo_utils.py` - core DPO loss implementation (see the loss sketch below)
- `src/MaxText/configs/dpo.yml` - DPO configuration file
- `tests/end_to_end/tpu/test_dpo.sh` - DPO end-to-end test
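For reference, the removed `dpo_utils.py` implemented the standard DPO objective. A minimal sketch of that loss, assuming summed per-response log-probs as inputs (function and argument names here are illustrative, not the removed code):

```python
import jax.numpy as jnp
from jax.nn import log_sigmoid

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp,
             beta=0.1, label_smoothing=0.0):
    """Per-pair DPO loss from summed response log-probs, shape [batch]."""
    # Implicit rewards: scaled log-ratio of policy vs. reference model.
    chosen_rewards = beta * (policy_chosen_lp - ref_chosen_lp)
    rejected_rewards = beta * (policy_rejected_lp - ref_rejected_lp)
    logits = chosen_rewards - rejected_rewards
    # label_smoothing > 0 yields the "conservative DPO" variant.
    loss = (-(1.0 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
    return loss.mean(), (chosen_rewards, rejected_rewards)
```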
Modified Files

Training Pipeline:
- `src/MaxText/train.py` - removed DPO loss integration, reference param handling, and DPO metrics
- `src/maxtext/utils/train_utils.py` - removed DPO state restoration logic
- `src/MaxText/gradient_accumulation.py` - removed `extra_dpo_args` parameter (sketch below)
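Removing `extra_dpo_args` leaves a plain accumulation loop. A hedged sketch of gradient accumulation without extra loss args, assuming microbatches stacked on a leading axis and a `loss_fn(params, batch)` signature (names assumed, not the MaxText implementation):

```python
import jax
import jax.numpy as jnp

def accumulated_step(loss_fn, params, microbatches):
    """Average loss and grads over a leading microbatch axis."""
    grad_fn = jax.value_and_grad(loss_fn)

    def body(carry, batch):
        loss_acc, grad_acc = carry
        loss, grads = grad_fn(params, batch)
        grad_acc = jax.tree_util.tree_map(jnp.add, grad_acc, grads)
        return (loss_acc + loss, grad_acc), None

    init = (jnp.zeros(()), jax.tree_util.tree_map(jnp.zeros_like, params))
    (loss_sum, grad_sum), _ = jax.lax.scan(body, init, microbatches)
    n = microbatches.shape[0]
    return loss_sum / n, jax.tree_util.tree_map(lambda g: g / n, grad_sum)
```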
Data Pipelines:

- `src/MaxText/input_pipeline/_grain_data_processing.py` - removed `dpo_preprocessing_pipeline` and DPO branches
- `src/MaxText/input_pipeline/_tfds_data_processing.py` - removed `use_dpo` parameter
- `src/MaxText/input_pipeline/_hf_data_processing.py` - removed `use_dpo` parameter
Configuration:

- `src/MaxText/configs/base.yml` - removed `use_dpo`, `dpo_label_smoothing`, `dpo_beta` parameters
- `src/MaxText/configs/types.py` - removed DPO fields from `FineTuning` class (illustrated below)
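For illustration only, the three removed knobs as config fields; the defaults shown are assumptions, and the real `FineTuning` class in `configs/types.py` is structured differently:

```python
from dataclasses import dataclass

@dataclass
class FineTuningDPOFields:
    # The three fields this PR deletes (defaults assumed, not base.yml values).
    use_dpo: bool = False
    dpo_beta: float = 0.1
    dpo_label_smoothing: float = 0.0
```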
Utilities:

- `src/maxtext/common/metric_logger.py` - removed DPO reward accuracy metrics (sketch below)
- `src/maxtext/utils/maxtext_utils.py` - removed DPO FLOPs calculation
- `src/MaxText/__init__.py` - removed `dpo_utils` export
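A sketch of the kind of reward-accuracy metric `metric_logger.py` dropped (assumed definition): the fraction of pairs where the implicit reward ranks the chosen response above the rejected one.

```python
import jax.numpy as jnp

def dpo_reward_accuracy(chosen_rewards, rejected_rewards):
    # Fraction of preference pairs the implicit reward gets right.
    return jnp.mean((chosen_rewards > rejected_rewards).astype(jnp.float32))
```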
Tests:

- `tests/unit/configs_test.py` - removed dpo.yml from config validation tests
- `tests/unit/sft_data_processing_test.py` - removed `use_dpo` argument
Other:

- `src/MaxText/experimental/rl/grpo_trainer.py` - removed errant `use_dpo` check

Tests
Verified synthetic data training runs successfully with:
Log: view (no DPO-related params at all)
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] … `gemini-review` label.