[bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used #30241
Conversation
Code Review
This pull request correctly addresses a bug causing "Current vLLM config is not set." warnings by ensuring that the vLLM configuration is accessed only during the initialization stage. The approach of caching the force_use_trtllm_attention setting in FlashInferMetadataBuilder during initialization and then passing it down during the runtime stage is sound and effectively resolves the issue. The changes are well-contained and logical. I have no further comments as the implementation is solid.
hmellor left a comment
vllm_config already exists in the scope that you're using force_use_trtllm_attention, so it should just be accessed directly
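For illustration, a hedged sketch of what this suggestion might look like (the function and attribute names below are illustrative, not the actual vLLM code):

```python
# Illustrative only: when a vllm_config object is already in scope, read the
# flag from it directly rather than calling get_current_vllm_config() again.

def use_trtllm_attention_example(vllm_config) -> bool:
    # Hypothetical attribute; shown only to illustrate "access it directly".
    return bool(getattr(vllm_config, "force_use_trtllm_attention", False))
```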
Thanks, sorry I missed this
@hmellor could you review again? thanks
Purpose
The vLLM config is set only during the initialization stage, not during the runtime stage. Therefore, we should not call get_current_vllm_config() during the runtime stage. Instead, cache the config we need during the initialization stage and reuse it during the runtime stage.
This was caused by #26315 by @MatthewBonanni.
This fixes #30240
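For context, a minimal sketch of the init-time caching pattern described above (the class and attribute names are illustrative stand-ins and the import path is assumed; this is not the exact vLLM diff):

```python
from vllm.config import get_current_vllm_config  # assumed import path

class ExampleMetadataBuilder:
    """Illustrative stand-in for FlashInferMetadataBuilder."""

    def __init__(self) -> None:
        # Initialization stage: the vLLM config context is set here, so it is
        # safe to query it. Cache the value we will need later.
        vllm_config = get_current_vllm_config()
        # Hypothetical attribute path, used only to illustrate the pattern.
        self._force_use_trtllm = bool(
            getattr(vllm_config, "force_use_trtllm_attention", False)
        )

    def build(self) -> bool:
        # Runtime stage: calling get_current_vllm_config() here would emit the
        # "Current vLLM config is not set." warning, so reuse the cached value.
        return self._force_use_trtllm
```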
Test Plan
On H200:
Test Result
The run on H200 completes without the "Current vLLM config is not set." warnings.