Add qwen2 implementation #3113
Force-pushed: 2c556cf to 7f84e0a
🤖 Hi @ChingTsai, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
📋 Review Summary
This Pull Request introduces the implementation of Qwen2 models, including new decoder layers, weight mappings, and configuration updates. The changes integrate Qwen2 into the existing MaxText framework, extending its model compatibility.
🔍 General Feedback
- The generalization of Qwen3 mappings and hook functions to a unified Qwen approach in `hf_shape.py` and `param_mapping.py` is a good practice, improving code reusability and maintainability.
- New configuration files for Qwen2.5 models are well-structured and consistent with existing model configurations.
- Ensure consistent handling of attention biases across the model definition and weight mapping to prevent potential runtime issues.
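The attention-bias point above can be checked mechanically. Qwen2 checkpoints on Hugging Face carry bias terms on the `q_proj`, `k_proj`, and `v_proj` attention projections but not on `o_proj`, so every bias the model defines needs a mapping entry, and no mapping entry should point at a bias the model lacks. The following is a minimal sketch, not code from this PR: `find_bias_mismatches` and the model-side parameter names are hypothetical and chosen only for illustration.

```python
def find_bias_mismatches(model_params, mapping):
    """Compare bias parameters defined by the model with those in the mapping.

    model_params: iterable of model parameter names (hypothetical naming).
    mapping: dict from model parameter name to checkpoint key.
    Returns (unmapped, dangling): biases the model defines but the mapping
    omits, and biases the mapping lists but the model does not define.
    """
    model_biases = {name for name in model_params if name.endswith("bias")}
    mapped_biases = {name for name in mapping if name.endswith("bias")}
    unmapped = sorted(model_biases - mapped_biases)
    dangling = sorted(mapped_biases - model_biases)
    return unmapped, dangling


# Hypothetical model-side names; the value-projection bias is deliberately
# left out of the mapping to show what a mismatch looks like.
model_params = [
    "attention/query/kernel", "attention/query/bias",
    "attention/key/kernel", "attention/key/bias",
    "attention/value/kernel", "attention/value/bias",
    "attention/out/kernel",
]
mapping = {
    "attention/query/kernel": "self_attn.q_proj.weight",
    "attention/query/bias": "self_attn.q_proj.bias",
    "attention/key/kernel": "self_attn.k_proj.weight",
    "attention/key/bias": "self_attn.k_proj.bias",
    "attention/value/kernel": "self_attn.v_proj.weight",
    "attention/out/kernel": "self_attn.o_proj.weight",
}

unmapped, dangling = find_bias_mismatches(model_params, mapping)
print(unmapped)  # ['attention/value/bias']
print(dangling)  # []
```

Running a check like this before loading weights turns a silent shape or key error at conversion time into an explicit list of missing entries.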
Thanks for bringing up new models! We usually verify the implementation against the HF version using this script. Please let us know if you run into any issues.
cc @parambole, who is working on Qwen3, to help review this PR.
Force-pushed: 7f84e0a to dcc4282
Force-pushed: dcc4282 to 88f6034
Hi @RissyRan, I noticed that the 7b scanned checkpoint has a higher max KL divergence of 0.016245 (see logs). I've updated the threshold (0.015 -> 0.017) to allow this to pass, but please let me know if this level of divergence is a concern.
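For readers unfamiliar with the metric in the comment above, here is a rough sketch of how a max KL divergence between two sets of logits can be computed and compared to a threshold. It is not the verification script referenced in this thread: the function names are hypothetical, and the direction of the divergence (reference model as the first argument) is an assumption.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) for one token position, in nats."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def max_kl(ref_positions, test_positions):
    """Largest per-position KL divergence across a sequence."""
    return max(
        kl_divergence(ref, test)
        for ref, test in zip(ref_positions, test_positions)
    )


def within_threshold(max_kl_value, threshold):
    """True when the worst-case divergence does not exceed the threshold."""
    return max_kl_value <= threshold


# The value reported above fails the old threshold and passes the new one.
print(within_threshold(0.016245, 0.015))  # False
print(within_threshold(0.016245, 0.017))  # True
```

Because the check takes the maximum over positions, a single outlier token can push an otherwise close match over the limit, which is why the threshold change is worth a reviewer's attention.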
Description
Changes
b/471703114
Tests
Logit Verification
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.