
Conversation


@Luodian Luodian commented Jul 28, 2025

Before you open a pull-request, please check if a similar issue already exists or has been closed before.

When you open a pull-request, please be sure to include the following

  • A descriptive title: [xxx] XXXX
  • A detailed description

If you encounter lint warnings, you can use the following commands to reformat the code.

pip install pre-commit
pre-commit install
pre-commit run --all-files

Thank you for your contributions!

Summary by CodeRabbit

  • New Features

    • Improved configuration flexibility and environment variable support for model and parallelism settings.
    • Enhanced prompt evaluation robustness and error handling for MIA Bench tasks.
  • Bug Fixes

    • Refined handling of generation parameters to prevent conflicts and warnings during text generation.
  • Refactor

    • Migrated MathVerse evaluation to use the official OpenAI SDK, streamlining API usage and error management.
    • Simplified and strengthened result processing and aggregation logic for MathVerse and MathVista tasks.
  • Chores

    • Removed or simplified sampling and generation parameters across multiple task configurations.
    • Increased maximum token output limits in several evaluation tasks for longer generation capabilities.


coderabbitai bot commented Jul 28, 2025

Walkthrough

This update refactors model and evaluation logic, improves configuration management, and enhances robustness across code and YAML configurations. Key changes include parameter renaming and environment variable usage, migration to the official OpenAI SDK, improved error handling, and significant simplification of generation parameter configurations in multiple YAML files.

Changes

  • VLLM Model Refactor & Configurability
    Files: lmms_eval/models/vllm.py, examples/models/vllm_qwen2vl.sh
    Renamed constructor and argument keys from model_version to model, added data_parallel_size, removed threads, improved environment variable handling, adjusted client initialization, and reordered message content. Updated the script argument key to match.
  • LLaVA Model Generation Logic
    Files: lmms_eval/models/llava_onevision.py
    Refined generation parameter handling: sampling parameters are set only if do_sample is true, conflicting keys are removed, and image_aspect_ratio is always removed before generation.
  • OpenAI SDK Migration for MathVerse
    Files: lmms_eval/tasks/mathverse/mathverse_evals.py, lmms_eval/tasks/mathverse/utils.py
    Replaced manual HTTP requests with official OpenAI/Azure SDK clients, simplified initialization, updated response handling, removed file output, and streamlined error management.
  • YAML Generation Parameter Simplification
    Files: lmms_eval/tasks/k12/k12.yaml, lmms_eval/tasks/mathverse/mathverse_testmini.yaml, lmms_eval/tasks/mathvision/mathvision_reason_test.yaml, lmms_eval/tasks/mathvision/mathvision_reason_testmini.yaml, lmms_eval/tasks/mathvision/mathvision_test.yaml, lmms_eval/tasks/mathvision/mathvision_testmini.yaml, lmms_eval/tasks/mathvista/mathvista_test.yaml, lmms_eval/tasks/mathvista/mathvista_testmini_cot.yaml, lmms_eval/tasks/mathvista/mathvista_testmini_format.yaml, lmms_eval/tasks/mathvista/mathvista_testmini_solution.yaml, lmms_eval/tasks/mmmu/mmmu_val_thinking.yaml
    Removed most sampling and beam search parameters (e.g., temperature, top_p, num_beams, do_sample, repetition_penalty); increased max_new_tokens in several files.
  • MathVerse/MathVista Result Output Simplification
    Files: lmms_eval/tasks/mathverse/utils.py, lmms_eval/tasks/mathvista/utils.py
    Removed saving of intermediate/final results and scores to files; now only returns evaluation results.
  • MIA Bench Robustness and Logging
    Files: lmms_eval/tasks/mia_bench/utils.py
    Rewrote score processing and results functions for robust parsing, validation, and error handling, with improved logging and fallback behavior.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant MathVerseEvaluator
    participant OpenAI/Azure SDK

    User->>MathVerseEvaluator: get_chat_response(prompt, ...)
    MathVerseEvaluator->>OpenAI/Azure SDK: chat.completions.create(...)
    OpenAI/Azure SDK-->>MathVerseEvaluator: response (choices[0].message.content)
    MathVerseEvaluator-->>User: parsed response
sequenceDiagram
    participant User
    participant VLLM_Model

    User->>VLLM_Model: __init__(model=..., tensor_parallel_size=..., data_parallel_size=...)
    VLLM_Model->>VLLM_Model: Configure via env vars, check parallel constraints
    VLLM_Model->>VLLM_Model: Initialize client after accelerator setup
    User->>VLLM_Model: generate_until(...)
    VLLM_Model->>VLLM_Model: Build messages (images first, then text)
    VLLM_Model->>User: Generation result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

Oh, what a hop through the code we take,
Parameters renamed for clarity’s sake!
YAMLs are lighter, configs now neat,
SDKs bring speed with a modern beat.
Robustness and logging, a rabbit’s delight—
This patch is a carrot, crisp and bright!
🥕🐇



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62c9a6a and 8e68a18.

📒 Files selected for processing (1)
  • lmms_eval/models/vllm.py (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lmms_eval/models/vllm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot

@Luodian Luodian marked this pull request as ready for review July 28, 2025 13:20

@cursor cursor bot left a comment



{"role": "user", "content": prompt},
]
payload = {"model": self.gpt_model, "messages": messages, "temperature": temperature, "max_tokens": max_tokens, "n": n}
payload = {"model": self.gpt_model, "messages": messages, "temperature": temperature, "max_tokens": max_tokens}

Bug: API Type Mismatch Causes Undefined Attribute

The gpt_model attribute is no longer set in the __init__ method and is only defined at the class level for the Azure API type. Consequently, when using the OpenAI API, self.gpt_model is undefined, leading to an AttributeError when accessed in the get_chat_response method.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
lmms_eval/tasks/mathverse/mathverse_evals.py (1)

85-92: Bug: self.gpt_model is undefined for OpenAI API type.

When using the OpenAI API type, self.gpt_model is not defined, which will cause an AttributeError when accessed in the get_chat_response method at line 101. The gpt_model attribute is only set at the class level for the Azure API type.

Apply this fix to define gpt_model for both API types:

 if API_TYPE == "openai":
     API_URL = os.getenv("OPENAI_API_URL", "https://api.openai.com/v1/chat/completions")
     API_KEY = os.getenv("OPENAI_API_KEY", "YOUR_API_KEY")
     headers = {
         "Authorization": f"Bearer {API_KEY}",
         "Content-Type": "application/json",
     }
     client = OpenAI(api_key=API_KEY, base_url=API_URL.rstrip("chat/completions"))
+    gpt_model = os.getenv("MODEL_VERSION", "gpt-4o-2024-11-20")
🧹 Nitpick comments (7)
lmms_eval/tasks/mathvista/mathvista_testmini_cot.yaml (1)

11-13: Same 16 384-token cap – replicate the capacity check

Replicating the concern from the other YAMLs: ensure the models used for the cot split accept such long completions; otherwise lower the value or add explicit early-stop criteria.

lmms_eval/tasks/mathvista/mathvista_testmini_format.yaml (1)

10-12: High max_new_tokens without explicit do_sample: false

With sampling keys removed, vLLM defaults to do_sample=False, but being explicit avoids future surprises if defaults change.

 generation_kwargs:
+  do_sample: false
   max_new_tokens: 16384
lmms_eval/tasks/mathvision/mathvision_testmini.yaml (1)

10-12: Large generation window – revisit for mini split

The testmini subset is meant for quick sanity checks; a 16 k token ceiling defeats the purpose.
Recommend keeping the shorter limit (e.g. 1 024) here to keep CI runs lightweight.

lmms_eval/tasks/mathverse/utils.py (1)

82-88: File saving removal streamlines evaluation workflow.

Removing the file saving operations (raw results, evaluated results dictionary, and scores JSON files) streamlines the evaluation process and reduces potential I/O bottlenecks. However, consider the impact on debugging and result inspection capabilities.

If result persistence is needed for debugging or analysis, consider adding an optional parameter to control file saving:

def mathverse_aggregate_results_eval(results, args, *, calculate_gain=False, random_scores=None, save_results=False):
    split_flag = results[0]["metadata"]["split"]
    problem_version = results[0]["problem_version"].lower().replace(" ", "_")
    results_dict, scores = mathverse_evaluator.eval_results(results, config)
    
    if save_results:
        # Optional file saving logic here
        pass
    
    if scores["average"]["accuracy"] == 0:
        return None
    return scores["average"]["accuracy"]
lmms_eval/models/llava_onevision.py (2)

543-543: Fix code style issue.

Static analysis correctly identifies a minor style issue.

-            if "image_aspect_ratio" in gen_kwargs.keys():
+            if "image_aspect_ratio" in gen_kwargs:

735-741: Consider consistency with the other generate method.

The parameter handling here differs slightly from the generate_until method. In generate_until, sampling parameters are only removed when do_sample=False, whereas here they're explicitly set when do_sample=True and then potentially removed later.

For consistency, consider using the same pattern as generate_until where sampling parameters are conditionally handled:

-                # Only set temperature and top_p if do_sample is True
-                if gen_kwargs.get("do_sample", False):
-                    if "temperature" not in gen_kwargs:
-                        gen_kwargs["temperature"] = 1.0  # Default temperature for sampling
-                    if "top_p" not in gen_kwargs:
-                        gen_kwargs["top_p"] = 1.0  # Default top_p for sampling
+                # When do_sample=False, remove sampling-related parameters to avoid warnings
+                if not gen_kwargs.get("do_sample", False):
+                    gen_kwargs.pop("temperature", None)
+                    gen_kwargs.pop("top_p", None)
+                    gen_kwargs.pop("top_k", None)
lmms_eval/tasks/mia_bench/utils.py (1)

153-257: Excellent robustification of the score parsing logic.

The rewritten process_rawscore function significantly improves error handling and robustness with:

  • Comprehensive input validation
  • Well-designed regex patterns for flexible score extraction
  • Score clamping to ensure valid ranges
  • Detailed logging for debugging
  • Graceful fallback mechanisms

Two minor improvements based on static analysis:

# Line 226: Rename unused loop variable
-        for i, component in enumerate(component_type):
+        for component in component_type:

# Line 234: Remove unnecessary .keys()
-                score_dict["total_score"] = sum(v for k, v in score_dict.items() if k != "total_score") / len([k for k in score_dict.keys() if k != "total_score"])
+                score_dict["total_score"] = sum(v for k, v in score_dict.items() if k != "total_score") / len([k for k in score_dict if k != "total_score"])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8895505 and 62c9a6a.

📒 Files selected for processing (18)
  • examples/models/vllm_qwen2vl.sh (1 hunks)
  • lmms_eval/models/llava_onevision.py (3 hunks)
  • lmms_eval/models/vllm.py (9 hunks)
  • lmms_eval/tasks/k12/k12.yaml (0 hunks)
  • lmms_eval/tasks/mathverse/mathverse_evals.py (2 hunks)
  • lmms_eval/tasks/mathverse/mathverse_testmini.yaml (1 hunks)
  • lmms_eval/tasks/mathverse/utils.py (1 hunks)
  • lmms_eval/tasks/mathvision/mathvision_reason_test.yaml (0 hunks)
  • lmms_eval/tasks/mathvision/mathvision_reason_testmini.yaml (0 hunks)
  • lmms_eval/tasks/mathvision/mathvision_test.yaml (1 hunks)
  • lmms_eval/tasks/mathvision/mathvision_testmini.yaml (1 hunks)
  • lmms_eval/tasks/mathvista/mathvista_test.yaml (1 hunks)
  • lmms_eval/tasks/mathvista/mathvista_testmini_cot.yaml (1 hunks)
  • lmms_eval/tasks/mathvista/mathvista_testmini_format.yaml (1 hunks)
  • lmms_eval/tasks/mathvista/mathvista_testmini_solution.yaml (1 hunks)
  • lmms_eval/tasks/mathvista/utils.py (0 hunks)
  • lmms_eval/tasks/mia_bench/utils.py (2 hunks)
  • lmms_eval/tasks/mmmu/mmmu_val_thinking.yaml (0 hunks)
💤 Files with no reviewable changes (5)
  • lmms_eval/tasks/mathvision/mathvision_reason_testmini.yaml
  • lmms_eval/tasks/mathvista/utils.py
  • lmms_eval/tasks/mathvision/mathvision_reason_test.yaml
  • lmms_eval/tasks/k12/k12.yaml
  • lmms_eval/tasks/mmmu/mmmu_val_thinking.yaml
🧰 Additional context used
🧬 Code Graph Analysis (3)
lmms_eval/tasks/mathverse/mathverse_evals.py (1)
lmms_eval/tasks/mathvista/mathvista_evals.py (1)
  • get_chat_response (183-218)
lmms_eval/tasks/mathverse/utils.py (1)
lmms_eval/tasks/mathverse/mathverse_evals.py (1)
  • MathVerseEvaluator (76-300)
lmms_eval/models/vllm.py (1)
lmms_eval/models/qwen2_5_vl.py (4)
  • model (139-144)
  • device (159-160)
  • batch_size (155-156)
  • flatten (173-178)
🪛 Ruff (0.12.2)
lmms_eval/tasks/mathverse/mathverse_evals.py

85-85: Using .strip() with multi-character strings is misleading

(B005)

lmms_eval/tasks/mia_bench/utils.py

226-226: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


234-234: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

lmms_eval/models/llava_onevision.py

543-543: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (18)
lmms_eval/tasks/mathvision/mathvision_test.yaml (1)

10-12: Token limit raised to 16 384 → verify model/context capacity & evaluation cost

max_new_tokens jumps from 1 024 to 16 384.
Many open-weight models (e.g. Llama 2/3, Phi-3, Gemma, most vision-LLMs) top out at 8 192 tokens or less. vLLM aborts or silently truncates when a request exceeds the model's context window, and such a long limit can balloon GPU memory use and evaluation time if the stop sequence is never hit.

Please double-check:

  1. The models targeted by this task actually support ≥ 16 384 tokens.
  2. The evaluation harness enforces earlier stopping (e.g. via generate_until) so we don’t pay for pathological generations.
  3. Runtime / cost budgets remain acceptable.

If the larger limit is only a safeguard, consider a safer cap (e.g. 8 192) or dynamically deriving it from model.config.max_position_embeddings.
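If the cap is derived dynamically, a rough sketch (helper name and defaults are assumptions, not code from this PR):

from transformers import AutoConfig

def derive_max_new_tokens(model_name: str, prompt_len: int, hard_cap: int = 16384) -> int:
    # Bound generation by the model's context window instead of a fixed 16 k ceiling.
    cfg = AutoConfig.from_pretrained(model_name)
    context_window = getattr(cfg, "max_position_embeddings", 8192)
    return max(1, min(hard_cap, context_window - prompt_len))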

lmms_eval/tasks/mathvista/mathvista_test.yaml (1)

10-12: Potential OOM / latency spike due to 16 k max_new_tokens

Full-set MathVista evaluation is already compute-heavy. Unless your models demonstrably need >1 k output tokens, the new ceiling will:

  • Increase evaluation latency linearly.
  • Risk CUDA OOM on 24–48 GB GPUs.

Consider profiling with 1 k vs 16 k limits before merging.

lmms_eval/tasks/mathverse/mathverse_testmini.yaml (2)

23-23: LGTM: Configuration simplification improves reproducibility.

The simplification of generation parameters (removing stochastic sampling controls as mentioned in the AI summary) aligns well with the PR objective of improving result reproducibility. Deterministic generation settings will produce more consistent results across runs.


30-30: Appropriate model-specific configuration.

Adding image_aspect_ratio: original to the llava model-specific generation kwargs is appropriate for this visual reasoning task, ensuring images are processed with their original aspect ratios.

lmms_eval/tasks/mathvista/mathvista_testmini_solution.yaml (1)

11-11: Significant increase in token limit is appropriate for solution tasks.

Increasing max_new_tokens from 1024 to 16384 is well-suited for the "solution" shot type, as mathematical solutions often require detailed step-by-step explanations that can be quite lengthy. This change, combined with the removal of sampling parameters (as noted in the AI summary), supports the PR objective of improving reproducibility while ensuring adequate capacity for comprehensive solutions.

examples/models/vllm_qwen2vl.sh (1)

13-13: Parameter rename improves consistency.

The change from model_version to model in the model arguments is more intuitive and aligns with the refactoring mentioned in the AI summary. This standardization improves the API consistency across the codebase.

lmms_eval/tasks/mathverse/utils.py (1)

22-22: Evaluator initialization simplified appropriately.

The simplified initialization of mathverse_evaluator aligns with the migration to the official OpenAI SDK. Based on the relevant code snippets, the MathVerseEvaluator class now handles API client initialization internally using environment variables, which is a cleaner approach.
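For orientation, a hedged sketch of the environment-driven client setup this implies; apart from MODEL_VERSION and the API_TYPE switch, the variable names and defaults are assumptions.

import os
from openai import AzureOpenAI, OpenAI

API_TYPE = os.getenv("API_TYPE", "openai")
if API_TYPE == "azure":
    client = AzureOpenAI(
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2024-02-15-preview"),
    )
else:
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
gpt_model = os.getenv("MODEL_VERSION", "gpt-4o-2024-11-20")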

lmms_eval/models/llava_onevision.py (2)

539-551: Improved generation parameter handling prevents warnings.

The refined approach to handling generation kwargs is well-designed:

  • Unconditionally setting max_new_tokens ensures a reasonable default
  • Only removing sampling parameters when do_sample=False prevents unnecessary warnings
  • This aligns with the PR's goal of improving reproducibility by handling generation parameters more robustly

764-767: Consistent parameter cleanup approach.

The parameter cleanup logic here matches the approach used in the generate_until method, which is good for consistency.
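As a reference point, a minimal sketch of the conditional cleanup pattern described in these two comments (not the file's exact code):

def prepare_gen_kwargs(gen_kwargs: dict) -> dict:
    # Ensure a sane default token budget and strip keys that generate() does not accept.
    gen_kwargs.setdefault("max_new_tokens", 1024)
    gen_kwargs.pop("image_aspect_ratio", None)
    if not gen_kwargs.get("do_sample", False):
        # Greedy decoding: drop sampling knobs so the library does not emit warnings.
        for key in ("temperature", "top_p", "top_k"):
            gen_kwargs.pop(key, None)
    return gen_kwargs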

lmms_eval/tasks/mia_bench/utils.py (3)

47-47: Good use of environment variable for model version configuration.

The change to source GPT_EVAL_MODEL_NAME from the MODEL_VERSION environment variable with a sensible default improves configurability and aligns with the broader PR's standardization of model version references.


149-149: Excellent improvement to prompt clarity for consistent parsing.

The explicit instruction to output scores in an exact format without markdown or asterisks will significantly improve parsing reliability downstream. This change directly addresses potential parsing issues.


260-342: Excellent defensive programming implementation.

The rewritten mia_bench_process_results function demonstrates best practices in error handling:

  • Comprehensive input validation
  • Error handling at every potential failure point
  • Consistent logging for debugging
  • Graceful degradation with fallback zero scores
  • Clear structure that aids maintainability

This robust implementation ensures the evaluation pipeline won't crash due to unexpected inputs or API failures.
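A minimal sketch of this fail-safe shape, with the scoring callable injected; the result key and fallback structure are assumptions:

import logging

logger = logging.getLogger(__name__)

def safe_process_results(doc: dict, results: list, score_fn) -> dict:
    fallback = {"gpt_eval_score": {"total_score": 0.0}}
    try:
        if not results or not isinstance(results[0], str) or not results[0].strip():
            logger.warning("Empty or malformed model output; returning zero scores.")
            return fallback
        score_dict = score_fn(doc.get("components", []), results[0])
        return {"gpt_eval_score": score_dict}
    except Exception as exc:
        logger.error("Scoring failed, falling back to zero scores: %s", exc)
        return fallback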

lmms_eval/tasks/mathverse/mathverse_evals.py (1)

97-132: Good migration to OpenAI SDK with preserved error handling.

The refactored get_chat_response method successfully migrates to the official SDK while preserving important functionality:

  • Retry logic with patience parameter
  • Rate limit handling
  • Automatic prompt size reduction on length errors
  • Proper response extraction from the SDK response object

Note: This implementation will work correctly once the self.gpt_model AttributeError is fixed.
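A hedged sketch of that retry shape with the official SDK; the retry count, sleep time, and prompt-halving heuristic are assumptions:

import time
import openai

def get_chat_response_sketch(client, model, prompt, patience=5, sleep_time=1.0, max_tokens=1024):
    messages = [{"role": "user", "content": prompt}]
    while patience > 0:
        patience -= 1
        try:
            response = client.chat.completions.create(model=model, messages=messages, max_tokens=max_tokens)
            return response.choices[0].message.content.strip()
        except openai.RateLimitError:
            time.sleep(sleep_time)  # back off when rate limited
        except openai.BadRequestError:
            prompt = prompt[: len(prompt) // 2]  # crude prompt-size reduction on length errors
            messages = [{"role": "user", "content": prompt}]
        except Exception:
            time.sleep(sleep_time)
    return ""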

lmms_eval/models/vllm.py (5)

23-24: Good improvements to configurability and parameter naming.

The changes enhance the class design:

  • Environment variables NUM_SECONDS_TO_SLEEP and WORKERS provide runtime configurability
  • Renaming model_version to model is more intuitive and consistent with common ML frameworks
  • Addition of data_parallel_size parameter enables data parallelism support

Also applies to: 149-152
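As an illustration of the pattern (the fallback values here are guesses, not the PR's actual defaults):

import os

NUM_SECONDS_TO_SLEEP = int(os.getenv("NUM_SECONDS_TO_SLEEP", "5"))
WORKERS = int(os.getenv("WORKERS", "8"))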


209-213: Clear documentation of parallelism constraints.

The code properly enforces and documents the current limitation that data parallelism cannot be used with tensor parallelism. The TODO comment indicates this is a known limitation that may be addressed in future versions.


214-222: Well-structured VLLM client initialization.

The client initialization is properly placed after accelerator setup and includes:

  • Correct use of the renamed model parameter
  • Fixed seed for reproducibility (aligns with PR objectives)
  • Appropriate distributed executor backend configuration
  • Proper forwarding of additional kwargs
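An illustrative initialization under the renamed argument; the model name, seed value, and backend choice below are assumptions:

from vllm import LLM

client = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",  # passed via the renamed `model` key (was model_version)
    tensor_parallel_size=1,
    seed=1234,                          # fixed seed for reproducibility
    distributed_executor_backend="mp",
)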

307-310: Simplified generation parameters align with PR objectives.

The removal of default temperature and top_p parameters and keeping only max_tokens is consistent with the broader PR effort to simplify generation configurations across task YAML files.


333-336: Content ordering change: images before text.

The message content now adds images before text, which may affect model behavior. This could be intentional for better performance with vision-language models.

Was this ordering change intentional? Some vision-language models may be sensitive to the order of image and text inputs in the prompt.
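For reference, an image-first message layout in the OpenAI-style chat schema (placeholder values; the exact keys used by the model wrapper may differ):

encoded_images = ["<base64 image bytes>"]  # placeholder
context = "Describe the figure."           # placeholder prompt

content = [{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img}"}} for img in encoded_images]
content.append({"type": "text", "text": context})  # text now follows the images
messages = [{"role": "user", "content": content}]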

Comment on lines +166 to 173
        self.chat_template = chat_template
        self.min_image_pixels = min_image_pixels
        self.data_parallel_size = data_parallel_size
        # Qwen 2/2.5-VL models enforce minimum image dimensions
-       self._enforce_image_resize = self._is_qwen_vl_model(model_version)
+       self._enforce_image_resize = self._is_qwen_vl_model(model)

        # Load chat template during initialization
        self.chat_template = None

⚠️ Potential issue

Bug: Duplicate and conflicting chat_template initialization.

The self.chat_template is assigned the parameter value on line 166 but then immediately set to None on line 173, discarding the initial assignment. This appears to be an error.

Remove the redundant assignment:

         self.model = model
         self.max_frame_num = max_frame_num
-        self.chat_template = chat_template
         self.min_image_pixels = min_image_pixels
         self.data_parallel_size = data_parallel_size
         # Qwen 2/2.5-VL models enforce minimum image dimensions
         self._enforce_image_resize = self._is_qwen_vl_model(model)
 
         # Load chat template during initialization
         self.chat_template = None
🤖 Prompt for AI Agents
In lmms_eval/models/vllm.py between lines 166 and 173, the attribute
self.chat_template is first assigned the parameter value and then immediately
overwritten to None, which discards the initial assignment. Remove the redundant
assignment setting self.chat_template to None on line 173 to preserve the
intended initialization with the parameter value.

"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
}
client = OpenAI(api_key=API_KEY, base_url=API_URL.rstrip("chat/completions"))

⚠️ Potential issue

Incorrect use of .rstrip() with multi-character string.

The .rstrip("chat/completions") call removes any combination of the characters in "chat/completions" from the end, not the exact string. This could lead to unexpected behavior.

Use .removesuffix() for exact string removal (Python 3.9+):

-        client = OpenAI(api_key=API_KEY, base_url=API_URL.rstrip("chat/completions"))
+        client = OpenAI(api_key=API_KEY, base_url=API_URL.removesuffix("/chat/completions"))

Or for older Python versions:

-        client = OpenAI(api_key=API_KEY, base_url=API_URL.rstrip("chat/completions"))
+        base_url = API_URL[:-len("/chat/completions")] if API_URL.endswith("/chat/completions") else API_URL
+        client = OpenAI(api_key=API_KEY, base_url=base_url)
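A quick illustration of the difference on a hypothetical endpoint:

url = "https://example.azure.com/api/chat/completions"
url.rstrip("chat/completions")         # -> 'https://example.azure.' (strips a character set, over-stripping)
url.removesuffix("/chat/completions")  # -> 'https://example.azure.com/api' (exact suffix removal)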
🧰 Tools
🪛 Ruff (0.12.2)

85-85: Using .strip() with multi-character strings is misleading

(B005)

🤖 Prompt for AI Agents
In lmms_eval/tasks/mathverse/mathverse_evals.py at line 85, the use of
.rstrip("chat/completions") is incorrect because it removes any combination of
those characters from the end rather than the exact substring. Replace
.rstrip("chat/completions") with .removesuffix("chat/completions") if using
Python 3.9 or newer, or implement a conditional check to remove the exact suffix
for older Python versions to ensure only the intended substring is removed from
API_URL.


cursor bot commented Jul 29, 2025

Bugbot found 1 bug


@kcz358 kcz358 merged commit 4994927 into main Jul 30, 2025
4 checks passed
@kcz358 kcz358 deleted the stable/0d3 branch July 30, 2025 01:16
@kcz358 kcz358 restored the stable/0d3 branch July 30, 2025 03:51
@coderabbitai coderabbitai bot mentioned this pull request Sep 15, 2025