
Conversation

@ecnal-cienet (Collaborator)

Description

Summary

  • Remove all DPO (Direct Preference Optimization) related features from the codebase
  • This includes the DPO loss implementation, configuration parameters, data pipeline support, metrics, and tests
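For reference, the removed loss is the standard DPO objective. Below is a minimal sketch of that computation, assuming per-example log-probabilities are already summed over tokens; the function and argument names are hypothetical, illustrative only, and not the exact code that lived in dpo_utils.py.

```python
import jax.numpy as jnp
from jax.nn import log_sigmoid

def dpo_loss_sketch(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    beta=0.1, label_smoothing=0.0):
  """Illustrative DPO loss over a batch of (chosen, rejected) pairs.

  `beta` and `label_smoothing` correspond to the removed `dpo_beta` and
  `dpo_label_smoothing` config knobs; the reference log-probs come from a
  frozen copy of the model (the "reference params" removed from train.py).
  """
  # Implicit rewards: scaled log-prob ratios between policy and reference.
  chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
  rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
  logits = chosen_rewards - rejected_rewards
  # Label-smoothed logistic loss from the DPO paper (Rafailov et al., 2023).
  losses = (-log_sigmoid(logits) * (1.0 - label_smoothing)
            - log_sigmoid(-logits) * label_smoothing)
  # Reward accuracy (chosen beats rejected) was logged as a DPO metric.
  reward_accuracy = jnp.mean((chosen_rewards > rejected_rewards).astype(jnp.float32))
  return jnp.mean(losses), reward_accuracy
```

MaxText exposed this via the `use_dpo`, `dpo_beta`, and `dpo_label_smoothing` parameters in base.yml; all three go away with this PR.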

Changes

Deleted Files

  • src/maxtext/trainers/post_train/dpo/dpo_utils.py - core DPO loss implementation
  • src/MaxText/configs/dpo.yml - DPO configuration file
  • tests/end_to_end/tpu/test_dpo.sh - DPO end-to-end test

Modified Files

Training Pipeline:

  • src/MaxText/train.py - removed DPO loss integration, reference param handling, and DPO metrics
  • src/maxtext/utils/train_utils.py - removed DPO state restoration logic
  • src/MaxText/gradient_accumulation.py - removed extra_dpo_args parameter

Data Pipelines:

  • src/MaxText/input_pipeline/_grain_data_processing.py - removed dpo_preprocessing_pipeline and DPO branches
  • src/MaxText/input_pipeline/_tfds_data_processing.py - removed use_dpo parameter
  • src/MaxText/input_pipeline/_hf_data_processing.py - removed use_dpo parameter

Configuration:

  • src/MaxText/configs/base.yml - removed use_dpo, dpo_label_smoothing, dpo_beta parameters
  • src/MaxText/configs/types.py - removed DPO fields from FineTuning class

Utilities:

  • src/maxtext/common/metric_logger.py - removed DPO reward accuracy metrics
  • src/maxtext/utils/maxtext_utils.py - removed DPO FLOPs calculation
  • src/MaxText/__init__.py - removed dpo_utils export

Tests:

  • tests/unit/configs_test.py - removed dpo.yml from config validation tests
  • tests/unit/sft_data_processing_test.py - removed use_dpo argument

Other:

  • src/MaxText/experimental/rl/grpo_trainer.py - removed errant use_dpo check
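To make the `extra_dpo_args` removal concrete: the docstring diff discussed in the review below shows the expected loss signature dropping its `*extra_args` pass-through. Here is a rough sketch of that wiring, under my assumption (not stated in the PR) that DPO used these extra positional args to carry reference-model params; the wrapper name and structure are hypothetical, not the actual MaxText code.

```python
import jax

def value_and_grad_sketch(loss_fn, model, config, data, dropout_rng, params, *extra_args):
  """Hypothetical gradient wrapper around a MaxText-style loss function.

  Before this PR the expected loss signature was
  (model, config, data, dropout_rng, params, *extra_args, is_train=True);
  afterwards the trailing *extra_args (the DPO-only pass-through) is gone.
  """
  def wrapped(p):
    # Pre-PR call: forwards *extra_args (e.g. reference params for DPO).
    # Post-PR call would simply be:
    #   loss_fn(model, config, data, dropout_rng, p, is_train=True)
    return loss_fn(model, config, data, dropout_rng, p, *extra_args, is_train=True)

  loss, grads = jax.value_and_grad(wrapped)(params)
  return loss, grads
```

With no remaining caller passing extra args, fixing the signature removes a DPO-only code path from gradient accumulation.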

Tests

Verified synthetic data training runs successfully with:

python3 src/MaxText/train.py src/MaxText/configs/base.yml run_name=nnx-train-test base_output_directory=gs://wanglance-maxtext/dpo-removal-test/after/gemma2-2b model_name=gemma2-2b dataset_type=synthetic steps=5

Log: view (no DPO-related params at all)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@ecnal-cienet force-pushed the feat/Remove-DPO-features branch from f13c153 to 67c601e on February 2, 2026 at 17:41
@RissyRan (Collaborator) commented Feb 2, 2026

Does this removal indicate integration with Tunix DPO?

@ecnal-cienet (Collaborator, Author)

Hi Ranran,
No, this removal is not related to Tunix DPO integration. Alex identified that the DPO feature is no longer actively used in MaxText, so Xibin suggested removing it to simplify the codebase.

@RissyRan (Collaborator) commented Feb 3, 2026

Thanks!

Hi @shralex, I am wondering whether we should keep this DPO feature in MaxText or integrate with the Tunix version afterwards. @gagika and I have been conducting repro work for Olmo3, and DPO is one step in its post-training.

https://screenshot.googleplex.com/7WY6SAFFXgT3Wvz

@ecnal-cienet force-pushed the feat/Remove-DPO-features branch from 67c601e to 0e6b29d on February 11, 2026 at 00:27
@ecnal-cienet force-pushed the feat/Remove-DPO-features branch from 0e6b29d to 5e37636 on February 11, 2026 at 19:58
@ecnal-cienet force-pushed the feat/Remove-DPO-features branch from 5e37636 to fd5ad68 on February 11, 2026 at 20:03
@A9isha (Collaborator) left a comment

LGTM, just one small comment - thank you Charles!

  Args:
    _loss_fn: The loss function to differentiate. Its signature is expected
-     to be: `(model, config, data, dropout_rng, params, *extra_args, is_train=True)`.
+     to be: `(model, config, data, dropout_rng, params, is_train=True)`.

why are we removing the *extra_args here?
