[bugfix] fix grpo sleep_level 2 causes gibberish outputs #7017

Conversation
Summary of Changes (Gemini Code Assist): This pull request resolves a critical bug where using sleep_level=2 could cause gibberish outputs after the vLLM engine wakes up.
Code Review
This pull request fixes a bug where sleep_level=2 in GRPO training could lead to gibberish outputs due to model weights not being fully restored after waking up. The fix ensures that weights are re-synchronized whenever sleep_level=2 is used. The changes correctly force a weight reload in grpo_trainer.py and rollout_mixin.py. I have one suggestion to improve the readability of a complex conditional statement.
if train_type == 'full' or (not self.base_sync_done or args.sleep_level == 2) or not self.rollout_enable_lora:
    self._move_full_model_to_vllm()
else:
    self._move_adapter_to_vllm()
The updated condition correctly forces a full model sync when args.sleep_level == 2, which fixes the bug. However, the logic has become more complex and harder to read.
To improve maintainability, I suggest extracting the condition into a named boolean variable. This makes the intent clearer without changing the logic.
Suggested change:

move_full_model = (
    train_type == 'full'
    or not self.rollout_enable_lora
    or not self.base_sync_done
    or args.sleep_level == 2
)
if move_full_model:
    self._move_full_model_to_vllm()
else:
    self._move_adapter_to_vllm()
Fix vLLM sleep_level=2 causing gibberish outputs
vLLM sleep_level=2 discards the model weights. When the engine wakes up later, the weights may not be fully restored in some cases, leading to gibberish outputs.
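A minimal sketch of the failure mode, assuming vLLM's sleep-mode API (the model name is just a placeholder):

from vllm import LLM

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

llm.sleep(level=2)   # level 2 frees the weight buffers, not just the KV cache
# ... training updates the policy weights elsewhere ...
llm.wake_up()        # buffers are reallocated, but their contents may be stale
# without an explicit weight sync at this point, generation can be gibberish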
There are two ways to recover from this:
1. Use the reload_weights method (see [Bug]: sleep level 2 causes gibberish outputs, vllm-project/vllm#29341). However, this approach relies on the load_format specified during engine initialization for weight loading, which is not ideal and introduces additional overhead.
2. Perform a weight synchronization after waking up. This is the approach adopted in this PR; a sketch follows below.
Specifically, when vllm_enable_lora is enabled, the base model weights are discarded during sleep, so full model weights are always synchronized in this case.
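A simplified sketch of the adopted fix, reusing the sync helpers from the diff above (the wake_and_sync wrapper is hypothetical, and the real condition also checks train_type, rollout_enable_lora, and base_sync_done):

def wake_and_sync(self) -> None:
    self.engine.wake_up()
    if self.args.sleep_level == 2:
        # sleep level 2 discarded the base weights (even with LoRA enabled),
        # so a full sync is required after waking up
        self._move_full_model_to_vllm()
    else:
        # otherwise an adapter-only sync can remain sufficient
        self._move_adapter_to_vllm()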