[bugfix] fix grpo sleep_level 2 causes gibberish outputs #7017

Conversation
Summary of Changes (Gemini Code Assist): This pull request resolves a critical bug where using sleep_level=2 could cause gibberish outputs after the vLLM engine wakes up.
Code Review
This pull request fixes a bug where sleep_level=2 in GRPO training could lead to gibberish outputs due to model weights not being fully restored after waking up. The fix ensures that weights are re-synchronized whenever sleep_level=2 is used. The changes correctly force a weight reload in grpo_trainer.py and rollout_mixin.py. I have one suggestion to improve the readability of a complex conditional statement.
if train_type == 'full' or (not self.base_sync_done or args.sleep_level == 2) or not self.rollout_enable_lora:
    self._move_full_model_to_vllm()
else:
    self._move_adapter_to_vllm()
The updated condition correctly forces a full model sync when args.sleep_level == 2, which fixes the bug. However, the logic has become more complex and harder to read.
To improve maintainability, I suggest extracting the condition into a named boolean variable. This makes the intent clearer without changing the logic.
Suggested change:

move_full_model = (
    train_type == 'full'
    or not self.rollout_enable_lora
    or not self.base_sync_done
    or args.sleep_level == 2
)
if move_full_model:
    self._move_full_model_to_vllm()
else:
    self._move_adapter_to_vllm()
Fix vLLM sleep_level=2 causing gibberish outputs
vLLM sleep_level=2 discards the model weights. When the engine wakes up later, the weights may not be fully restored in some cases, leading to gibberish outputs.
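A minimal sketch of the failure mode, assuming vLLM's sleep-mode API (the model name is just a placeholder):

from vllm import LLM

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

llm.sleep(level=2)   # level 2 frees the weight buffers, not just the KV cache
# ... training updates the policy weights elsewhere ...
llm.wake_up()        # buffers are reallocated, but their contents may be stale
# without an explicit weight sync at this point, generation can be gibberish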
There are two ways to recover from this:
1. Use the reload_weights method (see [Bug]: sleep level 2 causes gibberish outputs, vllm-project/vllm#29341). However, this approach relies on the load_format specified during engine initialization for weight loading, which is not ideal and introduces additional overhead.
2. Perform a weight synchronization after waking up. This is the approach adopted in this PR; a sketch follows below.
Specifically, when vllm_enable_lora is enabled, the base model weights are discarded during sleep, so full model weights are always synchronized in this case.
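A simplified sketch of the adopted fix, reusing the sync helpers from the diff above (the wake_and_sync wrapper is hypothetical, and the real condition also checks train_type, rollout_enable_lora, and base_sync_done):

def wake_and_sync(self) -> None:
    self.engine.wake_up()
    if self.args.sleep_level == 2:
        # sleep level 2 discarded the base weights (even with LoRA enabled),
        # so a full sync is required after waking up
        self._move_full_model_to_vllm()
    else:
        # otherwise an adapter-only sync can remain sufficient
        self._move_adapter_to_vllm()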