Remove DPO (Direct Preference Optimization) feature #3064
Conversation
Does this removal indicate the integration with Tunix DPO?

Hi Ranran,
A9isha left a comment:

LGTM, just one small comment - thank you Charles!
```diff
 Args:
   _loss_fn: The loss function to differentiate. Its signature is expected
-    to be: `(model, config, data, dropout_rng, params, *extra_args, is_train=True)`.
+    to be: `(model, config, data, dropout_rng, params, is_train=True)`.
```
Why are we removing the `*extra_args` here?
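For context on the question above, here is a minimal sketch of what dropping `*extra_args` means for callers. The `loss_fn_old`/`loss_fn_new` names and the argument values are illustrative assumptions, not MaxText code:

```python
import jax
import jax.numpy as jnp

# Old shape (illustrative): DPO was the only caller that threaded extra
# positional args (e.g. reference-model params) through *extra_args.
def loss_fn_old(model, config, data, dropout_rng, params, *extra_args, is_train=True):
    (reference_params,) = extra_args  # only DPO supplied this
    return jnp.sum((params - reference_params) * data)

# New shape: with DPO gone, nothing supplies extras, so the varargs go away.
def loss_fn_new(model, config, data, dropout_rng, params, is_train=True):
    return jnp.sum(params * data)

# Differentiation w.r.t. params (positional index 4) is unchanged either way;
# keyword args such as is_train pass through jax.grad untouched.
grads = jax.grad(loss_fn_new, argnums=4)(
    None, None, jnp.ones(3), None, jnp.zeros(3), is_train=True)
```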
Description
Summary
Changes
Deleted Files
- `src/maxtext/trainers/post_train/dpo/dpo_utils.py` - core DPO loss implementation (see the loss sketch below)
- `src/MaxText/configs/dpo.yml` - DPO configuration file
- `tests/end_to_end/tpu/test_dpo.sh` - DPO end-to-end test
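For reference, the removed `dpo_utils.py` implemented the standard DPO objective. A minimal sketch of that loss, assuming summed per-response log-probs as inputs (function and argument names here are illustrative, not the removed code):

```python
import jax.numpy as jnp
from jax.nn import log_sigmoid

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp,
             beta=0.1, label_smoothing=0.0):
    """Per-pair DPO loss from summed response log-probs, shape [batch]."""
    # Implicit rewards: scaled log-ratio of policy vs. reference model.
    chosen_rewards = beta * (policy_chosen_lp - ref_chosen_lp)
    rejected_rewards = beta * (policy_rejected_lp - ref_rejected_lp)
    logits = chosen_rewards - rejected_rewards
    # label_smoothing > 0 yields the "conservative DPO" variant.
    loss = (-(1.0 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
    return loss.mean(), (chosen_rewards, rejected_rewards)
```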
Modified Files

Training Pipeline:
- `src/MaxText/train.py` - removed DPO loss integration, reference param handling, and DPO metrics
- `src/maxtext/utils/train_utils.py` - removed DPO state restoration logic
- `src/MaxText/gradient_accumulation.py` - removed `extra_dpo_args` parameter (sketch below)
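Removing `extra_dpo_args` leaves a plain accumulation loop. A hedged sketch of gradient accumulation without extra loss args, assuming microbatches stacked on a leading axis and a `loss_fn(params, batch)` signature (names assumed, not the MaxText implementation):

```python
import jax
import jax.numpy as jnp

def accumulated_step(loss_fn, params, microbatches):
    """Average loss and grads over a leading microbatch axis."""
    grad_fn = jax.value_and_grad(loss_fn)

    def body(carry, batch):
        loss_acc, grad_acc = carry
        loss, grads = grad_fn(params, batch)
        grad_acc = jax.tree_util.tree_map(jnp.add, grad_acc, grads)
        return (loss_acc + loss, grad_acc), None

    init = (jnp.zeros(()), jax.tree_util.tree_map(jnp.zeros_like, params))
    (loss_sum, grad_sum), _ = jax.lax.scan(body, init, microbatches)
    n = microbatches.shape[0]
    return loss_sum / n, jax.tree_util.tree_map(lambda g: g / n, grad_sum)
```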
Data Pipelines:

- `src/MaxText/input_pipeline/_grain_data_processing.py` - removed `dpo_preprocessing_pipeline` and DPO branches
- `src/MaxText/input_pipeline/_tfds_data_processing.py` - removed `use_dpo` parameter
- `src/MaxText/input_pipeline/_hf_data_processing.py` - removed `use_dpo` parameter
Configuration:

- `src/MaxText/configs/base.yml` - removed `use_dpo`, `dpo_label_smoothing`, `dpo_beta` parameters
- `src/MaxText/configs/types.py` - removed DPO fields from `FineTuning` class (illustrated below)
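For illustration only, the three removed knobs as config fields; the defaults shown are assumptions, and the real `FineTuning` class in `configs/types.py` is structured differently:

```python
from dataclasses import dataclass

@dataclass
class FineTuningDPOFields:
    # The three fields this PR deletes (defaults assumed, not base.yml values).
    use_dpo: bool = False
    dpo_beta: float = 0.1
    dpo_label_smoothing: float = 0.0
```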
Utilities:

- `src/maxtext/common/metric_logger.py` - removed DPO reward accuracy metrics (sketch below)
- `src/maxtext/utils/maxtext_utils.py` - removed DPO FLOPs calculation
- `src/MaxText/__init__.py` - removed `dpo_utils` export
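A sketch of the kind of reward-accuracy metric `metric_logger.py` dropped (assumed definition): the fraction of pairs where the implicit reward ranks the chosen response above the rejected one.

```python
import jax.numpy as jnp

def dpo_reward_accuracy(chosen_rewards, rejected_rewards):
    # Fraction of preference pairs the implicit reward gets right.
    return jnp.mean((chosen_rewards > rejected_rewards).astype(jnp.float32))
```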
Tests:

- `tests/unit/configs_test.py` - removed dpo.yml from config validation tests
- `tests/unit/sft_data_processing_test.py` - removed `use_dpo` argument
Other:

- `src/MaxText/experimental/rl/grpo_trainer.py` - removed errant `use_dpo` check

Tests
Verified synthetic data training runs successfully with:
Log: view (no DPO-related params at all)
Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] … `gemini-review` label.