
Conversation

hjh0119 (Collaborator) commented on Dec 12, 2025

Fix vllm sleep_level=2 causing gibberish outputs

vllm sleep_level=2 discards the model weights. When the engine later wakes up, the weights may not be fully restored in some cases, leading to gibberish outputs.

There are two ways to recover from this:

  1. Use the reload_weights method (see [Bug]: sleep level 2 causes gibberish outputs vllm-project/vllm#29341). However, this approach relies on the load_format specified during engine initialization for weight loading, which is not ideal and introduces additional overhead.

  2. Perform a weight synchronization after waking up; this is the approach adopted in this PR.

Specifically, when vllm_enable_lora is enabled, the base model weights are discarded during sleep, so the full model weights are always synchronized in this case.
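
For context, here is a minimal sketch of approach 2, assuming vllm's offline sleep-mode API (enable_sleep_mode, sleep, wake_up, worker_extension_cls, collective_rpc). The WeightSyncWorkerExtension class, its sync_weights method, and the import path are hypothetical names used only for illustration; the PR's actual synchronization goes through _move_full_model_to_vllm.

```python
from transformers import AutoModelForCausalLM
from vllm import LLM

# Hypothetical worker extension: vllm model classes expose load_weights(),
# which consumes an iterable of (name, tensor) pairs.
class WeightSyncWorkerExtension:

    def sync_weights(self, weights):
        self.model_runner.model.load_weights(weights)

train_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-0.5B-Instruct')

llm = LLM(
    model='Qwen/Qwen2.5-0.5B-Instruct',  # placeholder model
    enable_sleep_mode=True,
    worker_extension_cls='my_pkg.ext.WeightSyncWorkerExtension',  # assumed import path
)

llm.sleep(level=2)  # level 2 frees the weights as well as the KV cache
# ... run a training step that updates train_model ...
llm.wake_up()       # at level 2, the restored weights may be incomplete

# Always push the trainer's full weights back into vllm after waking,
# which is the recovery strategy this PR adopts.
weights = [(name, param.detach()) for name, param in train_model.named_parameters()]
llm.collective_rpc('sync_weights', args=(weights,))
```

Note that passing tensors through collective_rpc serializes them to the worker processes, so real trainers typically stream weights in buckets or share them via CUDA IPC; the point of the sketch is only that a full sync after wake_up() guarantees consistent weights.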

gemini-code-assist (Contributor) commented:
Summary of Changes

Hello @hjh0119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug where using vllm with sleep_level=2 could lead to corrupted or "gibberish" outputs due to incomplete restoration of model weights upon waking. The fix implements a robust weight synchronization mechanism, ensuring that the model's state is fully restored after waking up, thereby preventing data integrity issues and improving the reliability of vllm operations, particularly when LoRA is in use.

Highlights

  • Model Weight Synchronization: Ensures that model weights are explicitly reloaded and synchronized whenever the vllm engine wakes up from sleep_level=2, preventing potential data corruption or "gibberish outputs."
  • LoRA Compatibility: Specifically addresses scenarios where LoRA is enabled, guaranteeing that the full base model weights are synchronized under sleep_level=2 to maintain consistency.

gemini-code-assist bot left a comment:

Code Review

This pull request fixes a bug where sleep_level=2 in GRPO training could lead to gibberish outputs due to model weights not being fully restored after waking up. The fix ensures that weights are re-synchronized whenever sleep_level=2 is used. The changes correctly force a weight reload in grpo_trainer.py and rollout_mixin.py. I have one suggestion to improve the readability of a complex conditional statement.

Comment on lines +370 to 373
```python
if train_type == 'full' or (not self.base_sync_done or args.sleep_level == 2) or not self.rollout_enable_lora:
    self._move_full_model_to_vllm()
else:
    self._move_adapter_to_vllm()
```
Severity: medium

The updated condition correctly forces a full model sync when args.sleep_level == 2, which fixes the bug. However, the logic has become more complex and harder to read.

To improve maintainability, I suggest refactoring this if statement into a more readable structure by using a boolean variable to represent the condition for moving the full model. This makes the intent clearer without changing the logic.

Suggested change

```diff
-if train_type == 'full' or (not self.base_sync_done or args.sleep_level == 2) or not self.rollout_enable_lora:
-    self._move_full_model_to_vllm()
-else:
-    self._move_adapter_to_vllm()
+move_full_model = (
+    train_type == 'full'
+    or not self.rollout_enable_lora
+    or not self.base_sync_done
+    or args.sleep_level == 2
+)
+if move_full_model:
+    self._move_full_model_to_vllm()
+else:
+    self._move_adapter_to_vllm()
```

hjh0119 merged commit 2dff717 into modelscope:main on Dec 12, 2025
3 checks passed
hjh0119 deleted the fix-sleep-level2 branch on December 12, 2025 at 07:02
