[New Benchmark] Add Video-TT Benchmark #742
Conversation
Walkthrough

A comprehensive set of files has been added to introduce and configure a new suite of video-based question-answering evaluation tasks under the "video-tt" domain. The changes include YAML configuration files for multiple task variants, utility modules for scoring, prompt construction, and result aggregation, and integration of GPT-based evaluation logic for open-ended responses.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant TaskConfig (YAML)
    participant Utils
    participant GPTUtils
    participant Model
    User->>TaskConfig: Selects and loads video-tt task config
    TaskConfig->>Utils: Calls doc_to_visual/doc_to_text for input prep
    TaskConfig->>Model: Sends formatted input for prediction
    Model-->>TaskConfig: Returns prediction
    TaskConfig->>GPTUtils: process_results (for open-ended tasks)
    GPTUtils->>GPT API: get_eval (question, answer, pred)
    GPT API-->>GPTUtils: Returns evaluation (yes/no, score)
    GPTUtils-->>TaskConfig: Returns processed score
    TaskConfig->>Utils: Aggregates results (aggregate_results/oe)
    Utils-->>User: Returns final metric(s)
```
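To make the diagram concrete, here is a minimal Python sketch of the open-ended evaluation round-trip it describes. This is an illustration, not the PR's code: the endpoint URL, judge model name, payload shape, and back-off constant are assumptions; the real `get_eval` and aggregation logic live in `gpt_utils.py` and `utils.py`.

```python
import time

import requests

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
NUM_SECONDS_TO_SLEEP = 5  # assumed back-off between retries


def get_eval(question, answer, pred, max_tokens, retries=3):
    """Ask a judge model to grade `pred` against the ground-truth `answer`."""
    payload = {
        "model": "gpt-judge",  # hypothetical judge model name
        "messages": [{"role": "user", "content": f"Q: {question}\nGT: {answer}\nPred: {pred}"}],
        "max_tokens": max_tokens,
    }
    for attempt in range(retries):
        try:
            response = requests.post(API_URL, json=payload, timeout=60)
            response.raise_for_status()
            data = response.json()
            return data["choices"][0]["message"]["content"].strip(), data["model"]
        except requests.exceptions.RequestException:
            if attempt < retries - 1:
                time.sleep(NUM_SECONDS_TO_SLEEP)
    return "", ""  # all retries failed


def aggregate_score(results, args=None):
    """Average the per-document scores that process_results emits."""
    return sum(r["score"] for r in results) / len(results) if results else 0
```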
Actionable comments posted: 14
🧹 Nitpick comments (13)
lmms_eval/tasks/video-tt/_default_template.yaml (1)
1-5: Add missing newline + prefer canonical booleans

YAML-lint fails because the file is missing the terminating newline and uses capitalised booleans.
Fixing both keeps CI green and avoids diff-only churn later.

```diff
 dataset_path: lmms-lab/video-tt
 dataset_kwargs:
-  token: True
-  cache_dir: video-tt
-  video: True
+  token: true
+  cache_dir: video-tt
+  video: true
+
```

lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1)
32-32: Trim trailing whitespace

```diff
- xcomposer2_4khd: 
+ xcomposer2_4khd:
```

lmms_eval/tasks/video-tt/videott_all_audio.yaml (1)
32-32: Remove trailing spaces to satisfy YAML-lint

```diff
- llava_vid: 
+ llava_vid:
```

lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1)
32-32: Trailing whitespace

```diff
- llava_vid: 
+ llava_vid:
```

lmms_eval/tasks/video-tt/videott_single_mc.yaml (2)
2-3: Dangling indent inside commented block

YAML treats indented comments as part of the previous mapping level; keep comment indentation consistent to avoid accidental key insertion when uncommented later.

```diff
-# dataset_name: 'test_mc_new'
-  # From_YouTube: True
+# dataset_name: 'test_mc_new'
+# From_YouTube: True
```
33-33: Trim trailing whitespace

```diff
- post_prompt: "" 
+ post_prompt: ""
```

lmms_eval/tasks/video-tt/videott_single_mc_description.yaml (1)
32-32: Fix trailing spaces.

The static analysis tool detected trailing spaces on this line. Please remove them to maintain consistent formatting.

```diff
- # qwen_vl: 
+ # qwen_vl:
```

lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml (1)
32-32: Fix trailing spaces.

The static analysis tool detected trailing spaces on this line. Please remove them to maintain consistent formatting.

```diff
- # qwen_vl: 
+ # qwen_vl:
```

lmms_eval/tasks/video-tt/videott_all.yaml (1)
32-32: Fix trailing spaces.

The static analysis tool detected trailing spaces on this line. Please remove them to maintain consistent formatting.

```diff
- # qwen_vl: 
+ # qwen_vl:
```

lmms_eval/tasks/video-tt/utils.py (2)
108-109: Simplify conditional expressions using .get() method.

The conditional expressions can be simplified using the `.get()` method as suggested by static analysis.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
-    pre_promt = lmms_eval_specific_kwargs["pre_prompt"] if "pre_prompt" in lmms_eval_specific_kwargs else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
+    pre_promt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```
189-189: Remove unnecessary f-string prefix.

The f-string doesn't contain any placeholders, so the `f` prefix is unnecessary.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```

lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (2)
32-32: Trailing whitespace flagged by yamllint

Line 32 contains stray spaces after the comment marker. While innocuous, it fails the pre-commit `yamllint` hook used in this repo.

```diff
- # qwen_vl:··
+ # qwen_vl:
```
17-17: Minor docstring spelling

`registed` ➜ `registered` for professionalism.

```diff
-# Note that the metric name can be either a registed metric function (such as the case for GQA) or a key name returned by process_results
+# Note that the metric name can be either a registered metric function (such as for GQA) or a key returned by process_results
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- lmms_eval/tasks/video-tt/_default_template.yaml (1 hunks)
- lmms_eval/tasks/video-tt/gpt_utils.py (1 hunks)
- lmms_eval/tasks/video-tt/utils.py (1 hunks)
- lmms_eval/tasks/video-tt/videott_all.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_all_audio.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_single_mc.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_single_mc_description.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lmms_eval/tasks/video-tt/utils.py (1)
lmms_eval/tasks/_task_utils/file_utils.py (1)
generate_submission_file(4-8)
🪛 YAMLlint (1.37.1)
lmms_eval/tasks/video-tt/_default_template.yaml
[error] 5-5: no new line character at the end of file
(new-line-at-end-of-file)
lmms_eval/tasks/video-tt/videott_single_mc.yaml
[error] 33-33: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_all.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_single_mc_description.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_all_audio.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
🪛 Ruff (0.11.9)
lmms_eval/tasks/video-tt/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
58-58: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
89-89: Local variable cache_dir is assigned to but never used
Remove assignment to unused variable cache_dir
(F841)
108-108: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
109-109: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
120-120: Do not use bare except
(E722)
123-123: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
124-124: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
189-189: f-string without any placeholders
Remove extraneous f prefix
(F541)
206-206: f-string without any placeholders
Remove extraneous f prefix
(F541)
238-238: Loop control variable k not used within loop body
(B007)
275-275: Loop control variable k not used within loop body
(B007)
lmms_eval/tasks/video-tt/gpt_utils.py
2-2: decord.VideoReader imported but unused
Remove unused import
(F401)
2-2: decord.cpu imported but unused
Remove unused import
(F401)
3-3: numpy imported but unused
Remove unused import: numpy
(F401)
5-5: sys imported but unused
Remove unused import: sys
(F401)
6-6: datetime imported but unused
Remove unused import: datetime
(F401)
7-7: json imported but unused
Remove unused import: json
(F401)
9-9: yaml imported but unused
Remove unused import: yaml
(F401)
12-12: openai imported but unused
Remove unused import: openai
(F401)
13-13: openai.OpenAI imported but unused
Remove unused import: openai.OpenAI
(F401)
15-15: Redefinition of unused ast from line 1
Remove definition: ast
(F811)
111-111: Undefined name e
(F821)
153-153: Local variable question is assigned to but never used
Remove assignment to unused variable question
(F841)
155-155: Local variable pred is assigned to but never used
Remove assignment to unused variable pred
(F841)
156-156: Local variable review is assigned to but never used
Remove assignment to unused variable review
(F841)
157-157: Local variable model_name is assigned to but never used
Remove assignment to unused variable model_name
(F841)
163-163: f-string without any placeholders
Remove extraneous f prefix
(F541)
193-193: Local variable review is assigned to but never used
Remove assignment to unused variable review
(F841)
194-194: Local variable model_name is assigned to but never used
Remove assignment to unused variable model_name
(F841)
261-261: f-string without any placeholders
Remove extraneous f prefix
(F541)
🪛 GitHub Actions: Lint
lmms_eval/tasks/video-tt/utils.py
[error] 1-1: Black formatting check failed. The file was reformatted by the black hook.
[error] 1-1: isort import sorting check failed. The file was modified by the isort hook.
lmms_eval/tasks/video-tt/gpt_utils.py
[error] 1-1: Black formatting check failed. The file was reformatted by the black hook.
[error] 1-1: isort import sorting check failed. The file was modified by the isort hook.
```yaml
task: videott_wrongly_led_oe
test_split: test_wrongly_led_oe
output_type: generate_until
doc_to_visual: !function utils.videott_doc_to_visual
doc_to_text: !function utils.videott_doc_to_text
doc_to_target: "answer"
generation_kwargs:
  max_new_tokens: 50
  temperature: 0
  top_p: 1.0
  num_beams: 1
  do_sample: false
# The return value of process_results will be used by metrics
process_results: !function gpt_utils.gpt_score_proccess
```
Task id & handler typo will break task registration

- The `task:` value (videott_wrongly_led_oe) does not match the file name (videott_wrong_leading_oe.yaml).
- The handler name `gpt_score_proccess` is misspelled (double c).

Either issue will raise at load time.

```diff
-task: videott_wrongly_led_oe
+task: videott_wrong_leading_oe
-process_results: !function gpt_utils.gpt_score_proccess
+process_results: !function gpt_utils.gpt_score_process
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
task: videott_wrong_leading_oe
test_split: test_wrongly_led_oe
output_type: generate_until
doc_to_visual: !function utils.videott_doc_to_visual
doc_to_text: !function utils.videott_doc_to_text
doc_to_target: "answer"
generation_kwargs:
  max_new_tokens: 50
  temperature: 0
  top_p: 1.0
  num_beams: 1
  do_sample: false
# The return value of process_results will be used by metrics
process_results: !function gpt_utils.gpt_score_process
```
🤖 Prompt for AI Agents
In lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml lines 3 to 16, fix the
task id by changing the `task:` value from `videott_wrongly_led_oe` to
`videott_wrong_leading_oe` to match the file name. Also, correct the handler
name in `process_results` from `gpt_score_proccess` to `gpt_score_process` to
fix the typo. These changes will prevent task registration errors at load time.
```yaml
task: videott_correctly_led_oe
test_split: test_correctly_led_oe
```
Task name/file name mismatch could break automatic discovery
The task id inside the YAML (videott_correctly_led_oe) does not match the file’s slug videott_correct_leading_oe.yaml. Loader logic in lmms-eval commonly relies on filename ↔ task-id symmetry; a mismatch risks the task being silently skipped or duplicated under two names.
Either rename the file or update the task: field so they are identical.
🤖 Prompt for AI Agents
In lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml at lines 3-4, the
task id 'videott_correctly_led_oe' does not match the filename slug
'videott_correct_leading_oe', causing potential issues with automatic task
discovery. To fix this, update the 'task:' field in the YAML to exactly match
the filename slug 'videott_correct_leading_oe' or rename the file to match the
current task id, ensuring consistency between the task id and filename.
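Mismatches like this are easy to catch mechanically. Below is a hypothetical helper (not part of the PR) that flags YAML task files whose `task:` field differs from the filename slug; it strips `!function` lines before `yaml.safe_load`, the same workaround `utils.py` uses when reading its own template, and the directory path is an assumption.

```python
from pathlib import Path

import yaml

TASK_DIR = Path("lmms_eval/tasks/video-tt")  # assumed repo-relative location


def check_task_ids(task_dir: Path) -> list:
    """Report YAML task files whose `task:` field differs from the filename slug."""
    mismatches = []
    for path in sorted(task_dir.glob("videott_*.yaml")):
        # Drop `!function` lines: safe_load cannot handle custom tags,
        # mirroring how utils.py reads its own template YAML.
        text = "\n".join(line for line in path.read_text().splitlines() if "!function" not in line)
        config = yaml.safe_load(text) or {}
        task_id = config.get("task")
        if task_id and task_id != path.stem:
            mismatches.append((path.name, task_id))
    return mismatches


if __name__ == "__main__":
    for filename, task_id in check_task_ids(TASK_DIR):
        print(f"{filename}: task id '{task_id}' != slug '{Path(filename).stem}'")
```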
Actionable comments posted: 2
♻️ Duplicate comments (8)
lmms_eval/tasks/video-tt/gpt_utils.py (5)
1-15: Remove unused imports and fix import organization.

This duplicates the previous review comment about cleaning up unused imports. The static analysis correctly identifies multiple unused imports that should be removed.
111-111: Fix undefined variable reference.

This duplicates the previous review comment about the undefined variable `e`. The variable is referenced but not defined in this scope, causing a runtime error.
143-163: Remove commented-out code and unused variables.

This duplicates the previous review comment about removing the large block of commented-out code and unused variable assignments.
180-201: Remove commented-out code and implement GPT evaluation.

This duplicates the previous review comment about removing commented-out code. Additionally, like `gpt_score_proccess`, this function also returns default values without performing actual GPT evaluation.
163-163: Remove unnecessary f-string prefix.

The f-string doesn't contain any placeholders, so the `f` prefix is unnecessary.

lmms_eval/tasks/video-tt/utils.py (3)
1-15: Remove unused imports.

This duplicates the previous review comment about cleaning up unused imports. The static analysis correctly identifies multiple unused imports that should be removed.
113-113: Replace hard-coded path with configurable parameter.

This duplicates the previous review comment about making the hard-coded path configurable via environment variables or parameters.
117-117: Replace bare except with specific exception handling.

This duplicates the previous review comment about using specific exception types instead of bare except.
🧹 Nitpick comments (5)
lmms_eval/tasks/video-tt/gpt_utils.py (1)
135-135: Fix function name typo.

The function name `gpt_score_proccess` contains a typo (an extra 'c') - it should be `gpt_score_process`.

```diff
-def gpt_score_proccess(doc, result):
+def gpt_score_process(doc, result):
```

Note: This change will require updating all references to this function in the YAML configuration files.

lmms_eval/tasks/video-tt/utils.py (4)
lmms_eval/tasks/video-tt/utils.py (4)
83-96: Remove dead code and clarify function purpose.

The `videott_doc_to_visual_tos` function contains a large block of commented-out code that should be removed. The function appears to construct URLs instead of local file paths, which should be documented.

```diff
 def videott_doc_to_visual_tos(doc):
-    cache_dir = os.path.join(base_cache_dir, cache_name)
-    # import pdb;pdb.set_trace()
+    """
+    Constructs a TOS (remote) URL for video access instead of local file paths.
+    """
     video_path = doc["video_id"] + ".mp4"
     video_path = os.path.join("https://tosv.byted.org/obj/tiktok-maas-us/robustness-benchmark/", video_path)
-    # if os.path.exists(video_path):
-    #     video_path = video_path
-    # elif os.path.exists(video_path.replace("mp4", "MP4")):
-    #     video_path = video_path.replace("mp4", "MP4")
-    # elif os.path.exists(video_path.replace("mp4", "mkv")):
-    #     video_path = video_path.replace("mp4", "mkv")
-    # else:
-    #     sys.exit(f"video path:{video_path} does not exist, please check")
     return [video_path]
```
101-107: Simplify conditional logic using dict.get().

The conditional logic can be simplified using the `dict.get()` method as suggested by static analysis.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
-    pre_promt = (
-        lmms_eval_specific_kwargs["pre_prompt"]
-        if "pre_prompt" in lmms_eval_specific_kwargs
-        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
-    )
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
+    pre_promt = lmms_eval_specific_kwargs.get(
+        "pre_prompt",
+        "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
+    )
```
190-190: Remove unnecessary f-string prefix.

The f-string doesn't contain any placeholders, so the `f` prefix is unnecessary.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```
208-208: Remove unnecessary f-string prefix.

The f-string doesn't contain any placeholders, so the `f` prefix is unnecessary.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- lmms_eval/tasks/video-tt/gpt_utils.py (1 hunks)
- lmms_eval/tasks/video-tt/utils.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lmms_eval/tasks/video-tt/utils.py (1)
lmms_eval/tasks/_task_utils/file_utils.py (1)
generate_submission_file(4-8)
🪛 Ruff (0.11.9)
lmms_eval/tasks/video-tt/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
56-56: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
84-84: Local variable cache_dir is assigned to but never used
Remove assignment to unused variable cache_dir
(F841)
101-101: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
103-105: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
117-117: Do not use bare except
(E722)
120-120: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
122-124: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
190-190: f-string without any placeholders
Remove extraneous f prefix
(F541)
208-208: f-string without any placeholders
Remove extraneous f prefix
(F541)
239-239: Loop control variable k not used within loop body
(B007)
274-274: Loop control variable k not used within loop body
(B007)
lmms_eval/tasks/video-tt/gpt_utils.py
2-2: datetime imported but unused
Remove unused import: datetime
(F401)
3-3: json imported but unused
Remove unused import: json
(F401)
5-5: sys imported but unused
Remove unused import: sys
(F401)
8-8: numpy imported but unused
Remove unused import: numpy
(F401)
9-9: openai imported but unused
Remove unused import: openai
(F401)
11-11: yaml imported but unused
Remove unused import: yaml
(F401)
12-12: decord.VideoReader imported but unused
Remove unused import
(F401)
12-12: decord.cpu imported but unused
Remove unused import
(F401)
14-14: openai.OpenAI imported but unused
Remove unused import: openai.OpenAI
(F401)
111-111: Undefined name e
(F821)
153-153: Local variable question is assigned to but never used
Remove assignment to unused variable question
(F841)
155-155: Local variable pred is assigned to but never used
Remove assignment to unused variable pred
(F841)
156-156: Local variable review is assigned to but never used
Remove assignment to unused variable review
(F841)
157-157: Local variable model_name is assigned to but never used
Remove assignment to unused variable model_name
(F841)
163-163: f-string without any placeholders
Remove extraneous f prefix
(F541)
193-193: Local variable review is assigned to but never used
Remove assignment to unused variable review
(F841)
194-194: Local variable model_name is assigned to but never used
Remove assignment to unused variable model_name
(F841)
261-261: f-string without any placeholders
Remove extraneous f prefix
(F541)
Actionable comments posted: 4
♻️ Duplicate comments (4)
lmms_eval/tasks/video-tt/utils.py (2)
1-15: Clean up unused imports.

Multiple imports are unused and should be removed to improve code maintainability.
102-102: Replace bare except with specific exception handling.

Using bare `except` is discouraged as it can mask important errors.

lmms_eval/tasks/video-tt/gpt_utils.py (2)
1-15: Clean up unused imports.

Multiple imports are unused and should be removed to improve code maintainability.
111-111: Fix undefined variable reference.

The variable `e` is referenced but not defined in this scope.
🧹 Nitpick comments (6)
lmms_eval/tasks/video-tt/utils.py (4)
86-90: Simplify conditional logic using dict.get().

The conditional logic can be simplified using the `get()` method as suggested by static analysis.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
-    pre_promt = (
-        lmms_eval_specific_kwargs["pre_prompt"]
-        if "pre_prompt" in lmms_eval_specific_kwargs
-        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
-    )
+    pre_promt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```
105-109: Simplify conditional logic using dict.get().

Similar to the previous function, this can be simplified using the `get()` method.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
-    pre_promt = (
-        lmms_eval_specific_kwargs["pre_prompt"]
-        if "pre_prompt" in lmms_eval_specific_kwargs
-        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
-    )
+    pre_promt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```
177-177: Remove unnecessary f-string prefix.

The f-string has no placeholders and should be a regular string.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```
195-195: Remove unnecessary f-string prefix.

Similar to the previous function, this f-string has no placeholders.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```

lmms_eval/tasks/video-tt/gpt_utils.py (2)
163-163: Remove unnecessary f-string prefix.

The f-string has no placeholders and should be a regular string.

```diff
-    return {f"videott_open_ended_score": data_dict}
+    return {"videott_open_ended_score": data_dict}
```
222-222: Remove unnecessary f-string prefix.

Similar to previous instances, this f-string has no placeholders.

```diff
-    return {f"accuracy": pred == doc["answer"]}
+    return {"accuracy": pred == doc["answer"]}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- lmms_eval/tasks/video-tt/gpt_utils.py (1 hunks)
- lmms_eval/tasks/video-tt/utils.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lmms_eval/tasks/video-tt/utils.py (1)
lmms_eval/tasks/_task_utils/file_utils.py (1)
generate_submission_file(4-8)
🪛 Ruff (0.11.9)
lmms_eval/tasks/video-tt/gpt_utils.py
2-2: datetime imported but unused
Remove unused import: datetime
(F401)
3-3: json imported but unused
Remove unused import: json
(F401)
5-5: sys imported but unused
Remove unused import: sys
(F401)
8-8: numpy imported but unused
Remove unused import: numpy
(F401)
9-9: openai imported but unused
Remove unused import: openai
(F401)
11-11: yaml imported but unused
Remove unused import: yaml
(F401)
12-12: decord.VideoReader imported but unused
Remove unused import
(F401)
12-12: decord.cpu imported but unused
Remove unused import
(F401)
14-14: openai.OpenAI imported but unused
Remove unused import: openai.OpenAI
(F401)
111-111: Undefined name e
(F821)
157-157: Local variable model_name is assigned to but never used
Remove assignment to unused variable model_name
(F841)
163-163: f-string without any placeholders
Remove extraneous f prefix
(F541)
222-222: f-string without any placeholders
Remove extraneous f prefix
(F541)
lmms_eval/tasks/video-tt/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
56-56: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
86-86: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
88-90: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
102-102: Do not use bare except
(E722)
105-105: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
107-109: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
177-177: f-string without any placeholders
Remove extraneous f prefix
(F541)
195-195: f-string without any placeholders
Remove extraneous f prefix
(F541)
226-226: Loop control variable k not used within loop body
(B007)
261-261: Loop control variable k not used within loop body
(B007)
🪛 GitHub Actions: Lint
lmms_eval/tasks/video-tt/gpt_utils.py
[error] 1-1: Black formatting check failed. The file was automatically reformatted by Black.
Actionable comments posted: 0
♻️ Duplicate comments (5)
lmms_eval/tasks/video-tt/gpt_utils.py (5)
1-14: Clean up unused imports to improve code maintainability.

Multiple imports are unused according to static analysis. This matches previous review feedback and should be addressed.
Remove the following unused imports:
- `datetime`, `json`, `sys` (lines 2-5)
- `numpy`, `openai`, `yaml` (lines 8-11)
- `VideoReader`, `cpu` from decord (line 12)
- `OpenAI` from openai (line 14)

```diff
 import ast
-import datetime
-import json
 import os
-import sys
 import time
-import numpy as np
-import openai
 import requests
-import yaml
-from decord import VideoReader, cpu
 from loguru import logger as eval_logger
-from openai import OpenAI
```
68-68: Fix data type inconsistency in example response.

The example shows a float score (4.8) while the instruction specifies INTEGER format, which could confuse the GPT model.

```diff
-    "For example, your response should look like this: {'pred': 'yes', 'score': 4.8}."
+    "For example, your response should look like this: {'pred': 'yes', 'score': 4}."
```
111-111: Fix undefined variable reference that will cause runtime error.

The variable `e` is referenced but not defined in this scope, causing a runtime error.

```diff
         else:
             # If this was the last attempt, log and return empty
-            eval_logger.error(f"All {retries} attempts failed. Last error message: {e}")
+            eval_logger.error(f"All {retries} attempts failed.")
```
149-149: Increase max_tokens for meaningful GPT responses.

The `max_tokens=1` is too low for GPT to generate the expected dictionary response format.

```diff
-    review, model_name = get_eval(question, answer, pred, 1)
+    review, model_name = get_eval(question, answer, pred, 100)
```
153-158: Remove redundant variable assignments in exception handler.

The exception block redundantly assigns variables that are not used afterward.

```diff
     except Exception as e:
         eval_logger.error(f"Error for Question ID: {doc.get('question_id', 'Unknown')}: {e}")
-        question = doc["question"]
-        answer = doc["answer"]
-        pred = result[0]
-        review = "Failed to Get a Proper Review."
-        model_name = "Failed Request"
         scores = ["no", 0]
```
🧹 Nitpick comments (2)
lmms_eval/tasks/video-tt/gpt_utils.py (2)
163-163: Remove unnecessary f-string prefix.

The f-string has no placeholders, making the `f` prefix unnecessary.

```diff
-    return {f"videott_open_ended_score": data_dict}
+    return {"videott_open_ended_score": data_dict}
```
223-223: Remove unnecessary f-string prefix.

The f-string has no placeholders, making the `f` prefix unnecessary.

```diff
-    return {f"accuracy": pred == doc["answer"]}
+    return {"accuracy": pred == doc["answer"]}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- lmms_eval/tasks/video-tt/gpt_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
lmms_eval/tasks/video-tt/gpt_utils.py
2-2: datetime imported but unused
Remove unused import: datetime
(F401)
3-3: json imported but unused
Remove unused import: json
(F401)
5-5: sys imported but unused
Remove unused import: sys
(F401)
8-8: numpy imported but unused
Remove unused import: numpy
(F401)
9-9: openai imported but unused
Remove unused import: openai
(F401)
11-11: yaml imported but unused
Remove unused import: yaml
(F401)
12-12: decord.VideoReader imported but unused
Remove unused import
(F401)
12-12: decord.cpu imported but unused
Remove unused import
(F401)
14-14: openai.OpenAI imported but unused
Remove unused import: openai.OpenAI
(F401)
111-111: Undefined name e
(F821)
157-157: Local variable model_name is assigned to but never used
Remove assignment to unused variable model_name
(F841)
163-163: f-string without any placeholders
Remove extraneous f prefix
(F541)
223-223: f-string without any placeholders
Remove extraneous f prefix
(F541)
Actionable comments posted: 2
♻️ Duplicate comments (4)
lmms_eval/tasks/video-tt/utils.py (3)
1-15: Remove unused imports to clean up the code.

Multiple imports are flagged as unused by static analysis tools and should be removed to improve code maintainability.
62-62: Fix the AUDIO_PATH environment variable assignment.

The environment variable should be assigned only to `AUDIO_PATH`, not to both variables.
102-102: Replace bare except with specific exception handling.

Using bare `except` is discouraged as it can mask important errors. Specify the expected exception types.

lmms_eval/tasks/video-tt/gpt_utils.py (1)
60-60: Fix inconsistent data type in example response.

The example shows a float score (4.8) but the instruction specifies INTEGER, which could confuse the GPT model.
🧹 Nitpick comments (9)
lmms_eval/tasks/video-tt/utils.py (6)
86-90: Simplify conditional assignments using dict.get().

Replace the conditional block with a more concise `.get()` method call.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
-    pre_promt = (
-        lmms_eval_specific_kwargs["pre_prompt"]
-        if "pre_prompt" in lmms_eval_specific_kwargs
-        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
-    )
+    pre_promt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```
105-109: Simplify conditional assignments using dict.get().

Similar to the previous function, use the `.get()` method for cleaner code.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
-    pre_promt = (
-        lmms_eval_specific_kwargs["pre_prompt"]
-        if "pre_prompt" in lmms_eval_specific_kwargs
-        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
-    )
+    pre_promt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```
177-177: Remove unnecessary f-string prefix.

The f-string has no placeholders, so the `f` prefix is unnecessary.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```
195-195: Remove unnecessary f-string prefix.

The f-string has no placeholders, so the `f` prefix is unnecessary.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```
226-228: Simplify loop variable usage.

The loop variable `k` is not used in the loop body. Consider restructuring for clarity.

```diff
-    for k, v in category2score.items():
-        total_correct += v["correct"]
-        total_answered += v["answered"]
+    for v in category2score.values():
+        total_correct += v["correct"]
+        total_answered += v["answered"]
```
261-263: Simplify loop variable usage.

Similar to the above, the loop variable `k` is not used in the loop body.

```diff
-    for k, v in category2score.items():
-        total_correct += v["correct"]
-        total_answered += v["answered"]
+    for v in category2score.values():
+        total_correct += v["correct"]
+        total_answered += v["answered"]
```

lmms_eval/tasks/video-tt/gpt_utils.py (3)
78-106: Improve error handling structure and logging.

The retry logic is well-implemented, but there's a potential issue with the loop structure and return statements.
The function has two return statements at the end (lines 104 and 106) which creates unreachable code. Consider restructuring:
```diff
 for attempt in range(retries):
     try:
         response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
         response.raise_for_status()
         try:
             response_data = response.json()
         except requests.exceptions.JSONDecodeError:
             eval_logger.error(f"JSON decode error on attempt {attempt + 1}. Response text: {response.text}")
             continue
         content = response_data["choices"][0]["message"]["content"].strip()
         if content != "":
             return content, response_data["model"]
     except requests.exceptions.HTTPError as e:
         eval_logger.error(f"HTTP error on attempt {attempt + 1}: {e}")
     except requests.exceptions.RequestException as e:
         eval_logger.error(f"Request exception on attempt {attempt + 1}: {e}")
     except Exception as e:
         eval_logger.error(f"Unexpected error on attempt {attempt + 1}: {e}")
     if attempt < retries - 1:
         time.sleep(NUM_SECONDS_TO_SLEEP)
-    return "", ""
+    # All retries failed
+    eval_logger.error(f"All {retries} attempts failed.")
+    return "", ""
```
211-211: Remove unnecessary f-string prefix.

The f-string has no placeholders, so the `f` prefix is unnecessary.

```diff
-    return {f"accuracy": pred == doc["answer"]}
+    return {"accuracy": pred == doc["answer"]}
```
155-173: Consolidate duplicate aggregation functions.

The `aggregate_score` and `aggregate_accuracy` functions are nearly identical with only the return value differing. Consider consolidating them.

```diff
-def aggregate_score(results, args):
-    yes_count = 0
-    no_count = 0
-    total_score = 0
-
-    # Iterate over the results to count correctness and sum scores
-    for result_dict in results:
-        if result_dict["Correctness"] == "yes":
-            yes_count += 1
-        else:
-            no_count += 1
-        total_score += result_dict["score"]
-
-    # Calculate accuracy and average score
-    accuracy = yes_count / (yes_count + no_count) if (yes_count + no_count) > 0 else 0
-    average_score = total_score / len(results) if results else 0
-    eval_logger.info(f"Accuracy: {accuracy}")
-    eval_logger.info(f"Average Score: {average_score}")
-    return average_score
-
-
-def aggregate_accuracy(results, args):
-    yes_count = 0
-    no_count = 0
-    total_score = 0
-
-    # Iterate over the results to count correctness and sum scores
-    for result_dict in results:
-        if result_dict["Correctness"] == "yes":
-            yes_count += 1
-        else:
-            no_count += 1
-        total_score += result_dict["score"]
-
-    # Calculate accuracy and average score
-    accuracy = yes_count / (yes_count + no_count) if (yes_count + no_count) > 0 else 0
-    average_score = total_score / len(results) if results else 0
-    eval_logger.info(f"Accuracy: {accuracy}")
-    eval_logger.info(f"Average Score: {average_score}")
-    return accuracy * 100
+def _calculate_metrics(results):
+    """Helper function to calculate metrics from results."""
+    yes_count = sum(1 for result in results if result["Correctness"] == "yes")
+    no_count = len(results) - yes_count
+    total_score = sum(result["score"] for result in results)
+
+    accuracy = yes_count / len(results) if results else 0
+    average_score = total_score / len(results) if results else 0
+
+    eval_logger.info(f"Accuracy: {accuracy}")
+    eval_logger.info(f"Average Score: {average_score}")
+
+    return accuracy, average_score
+
+def aggregate_score(results, args):
+    accuracy, average_score = _calculate_metrics(results)
+    return average_score
+
+def aggregate_accuracy(results, args):
+    accuracy, average_score = _calculate_metrics(results)
+    return accuracy * 100
```

Also applies to: 176-194
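As a quick sanity check of the consolidated helper's arithmetic, here is a hypothetical example using sample results in the shape the diff assumes (dicts carrying "Correctness" and "score" keys):

```python
# Hypothetical sample results in the shape the diff assumes.
results = [
    {"Correctness": "yes", "score": 4},
    {"Correctness": "no", "score": 1},
    {"Correctness": "yes", "score": 5},
]

yes_count = sum(1 for r in results if r["Correctness"] == "yes")
accuracy = yes_count / len(results)                              # 2/3
average_score = sum(r["score"] for r in results) / len(results)  # 10/3

assert round(accuracy * 100, 1) == 66.7  # what aggregate_accuracy would return
assert round(average_score, 2) == 3.33   # what aggregate_score would return
```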
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- lmms_eval/tasks/video-tt/gpt_utils.py (1 hunks)
- lmms_eval/tasks/video-tt/utils.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lmms_eval/tasks/video-tt/utils.py (1)
lmms_eval/tasks/_task_utils/file_utils.py (1)
generate_submission_file(4-8)
🪛 Ruff (0.11.9)
lmms_eval/tasks/video-tt/gpt_utils.py
211-211: f-string without any placeholders
Remove extraneous f prefix
(F541)
lmms_eval/tasks/video-tt/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
56-56: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
86-86: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
88-90: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
102-102: Do not use bare except
(E722)
105-105: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
107-109: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
177-177: f-string without any placeholders
Remove extraneous f prefix
(F541)
195-195: f-string without any placeholders
Remove extraneous f prefix
(F541)
226-226: Loop control variable k not used within loop body
(B007)
261-261: Loop control variable k not used within loop body
(B007)
Actionable comments posted: 0
♻️ Duplicate comments (2)
lmms_eval/tasks/video-tt/utils.py (2)
1-15: Remove unused imports to clean up the code.

Many imports are not used in this file and should be removed to improve code clarity and reduce dependencies.
```diff
-import datetime
-import json
 import os
 import re
 import sys
-from collections import defaultdict
 from pathlib import Path
-from typing import Dict, List, Optional, Union
-import cv2
-import numpy as np
 import yaml
 from loguru import logger as eval_logger
-from lmms_eval.tasks._task_utils.file_utils import generate_submission_file
```
102-107: Fix potential undefined variable and improve error handling.

The variable `audio_path` is only defined within the `else` block but is used in the `try` block regardless. Also, the bare `except` clause should be more specific.

```diff
 if not AUDIO_PATH:
     eval_logger.warning("AUDIO_PATH environment variable not set, skipping audio subtitles")
     subtitle = ""
 else:
     audio_path = os.path.join(AUDIO_PATH, f'{doc["video_id"]}.txt')
-    try:
-        with open(audio_path) as f:
-            subtitle = f.read()
-    except:
-        subtitle = ""
+    try:
+        with open(audio_path) as f:
+            subtitle = f.read()
+    except (FileNotFoundError, IOError, OSError):
+        subtitle = ""
```
🧹 Nitpick comments (9)
lmms_eval/tasks/video-tt/utils.py (9)
39-48: Remove commented-out code.

This commented-out code block should be removed to keep the codebase clean, as it's replaced by the active implementation below.

```diff
-# with open(Path(__file__).parent / "_default_template_yaml", "r") as f:
-#     raw_data = f.readlines()
-#     safe_data = []
-#     for i, line in enumerate(raw_data):
-#         # remove function definition since yaml load cannot handle it
-#         if "!function" not in line:
-#             safe_data.append(line)
-
-# config = yaml.safe_load("".join(safe_data))
```
56-56: Replace unused loop variable with underscore.

The loop variable `i` is not used within the loop body and should be replaced with `_` to indicate it's intentionally unused.

```diff
-    for i, line in enumerate(raw_data):
+    for _, line in enumerate(raw_data):
```
86-86: Simplify conditional expressions using dict.get().

Replace the conditional blocks with more concise `dict.get()` calls for better readability.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
```

Also applies to: 88-90
87-87: Fix typo in variable name.

The variable name `pre_promt` should be `pre_prompt` for consistency and correctness.

```diff
-    pre_promt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
+    pre_prompt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```

And update the usage:

```diff
-    full_prompt = pre_promt + "\n" + question + "\n" + post_prompt
+    full_prompt = pre_prompt + "\n" + question + "\n" + post_prompt
```
109-113: Fix typo and simplify conditional expressions.

Same issues as in the previous function - typo in variable name and can be simplified with `dict.get()`.

```diff
-    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
+    post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
-    pre_promt = (
-        lmms_eval_specific_kwargs["pre_prompt"]
-        if "pre_prompt" in lmms_eval_specific_kwargs
-        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
-    )
+    pre_prompt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```

And update the usage:

```diff
-    full_prompt = subtitles_prompt + subtitle + "\n" + pre_promt + "\n" + question + "\n" + post_prompt
+    full_prompt = subtitles_prompt + subtitle + "\n" + pre_prompt + "\n" + question + "\n" + post_prompt
```
119-130: Remove commented-out code and documentation.

This commented-out code appears to be documentation or examples that should be removed to keep the code clean.

```diff
-# Frames + Subs
-# This video's subtitles are listed below:
-# 【subtitles】
-
-# Select the best answer to the following multiple-choice question based on the video and the subtitles. Respond with only the letter (A, B, C, or D) of the correct option.
-# 【question】
-# The best answer is:
-# Frames / Frames + Audio
-# Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.
-# 【question】
-# The best answer is:
```
156-162: Remove unused global variables and commented code.

The `matrices` list is empty and unused, and the commented loop should be removed.

```diff
-matrices = []
-
-# for i in VIDEO_TYPE:
-#     for j in CATEGORIES:
-#         for k in SUB_CATEGORIES:
-#             for l in TASK_CATEGORIES:
-#                 matrices.append(f"{i}_{j}_{k}_{l}")
```
181-181: Remove unnecessary f-string prefix.

The f-strings don't contain any placeholders and should be regular strings.

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```

Apply the same fix to line 199:

```diff
-    return {f"videott_perception_score": data_dict}
+    return {"videott_perception_score": data_dict}
```

Also applies to: 199-199
230-230: Replace unused loop variable with underscore.

The loop variable `k` is not used within the loop body and should be replaced with `_` to indicate it's intentionally unused.

```diff
-    for k, v in category2score.items():
+    for _, v in category2score.items():
```

Apply the same fix to line 265:

```diff
-    for k, v in category2score.items():
+    for _, v in category2score.items():
```

Also applies to: 265-265
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- lmms_eval/tasks/video-tt/utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
lmms_eval/tasks/video-tt/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
56-56: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
86-86: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
88-90: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
106-106: Do not use bare except
(E722)
109-109: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
111-113: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
181-181: f-string without any placeholders
Remove extraneous f prefix
(F541)
199-199: f-string without any placeholders
Remove extraneous f prefix
(F541)
230-230: Loop control variable k not used within loop body
(B007)
265-265: Loop control variable k not used within loop body
(B007)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (5)
lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1)
3-4: Fix task id typo – prevents task registration
`task:` is set to `videott_wrongly_led_oe`, which mismatches both the filename and the intended dataset split. At runtime this causes the loader to register an incorrect task id (or fail if the split list is keyed by the correct id).

```diff
-task: videott_wrongly_led_oe
+task: videott_wrong_leading_oe
```

lmms_eval/tasks/video-tt/videott_all.yaml (1)
26-29: Duplicate: gpt4v prompt forces MC answer on open-ended task
Same observation made in earlier review – prompt should not request A/B/C/D.
Refer to prior comment.
lmms_eval/tasks/video-tt/gpt_utils.py (1)
60-60: Fix inconsistent data type in example response. The example shows a float score (4.8) but the instruction clearly specifies INTEGER. This inconsistency could confuse the GPT model during evaluation.
```diff
- "For example, your response should look like this: {'pred': 'yes', 'score': 4.8}."
+ "For example, your response should look like this: {'pred': 'yes', 'score': 4}."
```

lmms_eval/tasks/video-tt/utils.py (1)
98-98: Replace bare except with specific exception handling. Using bare `except` is discouraged as it can mask important errors. Please specify the expected exception types.

```diff
- except:
+ except (FileNotFoundError, IOError, OSError):
```

lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1)
3-4: Fix task name/filename mismatch. The task ID `videott_correctly_led_oe` doesn't match the filename `videott_correct_leading_oe.yaml`. This inconsistency could cause issues with automatic task discovery in the evaluation framework.

```diff
-task: videott_correctly_led_oe
+task: videott_correct_leading_oe
```
🧹 Nitpick comments (10)
lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1)
32-33: Remove trailing spaces to satisfy YAML-lint
YAML-lint is flagging the blank comment line (32) for trailing spaces. Strip the whitespace to keep CI green.
```diff
- # qwen_vl:··
+ # qwen_vl:
```

lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1)
32-33: Clean trailing whitespace
Line 32 has trailing spaces that violate the repo's lint rules.
```diff
- # qwen_vl:··
+ # qwen_vl:
```

lmms_eval/tasks/video-tt/videott_all_audio.yaml (1)
32-33: Trim trailing whitespace
Line 32 is flagged by YAML-lint; remove the spaces.
lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml (1)
32-33: YAML-lint trailing spaces
Remove spaces on the comment line 32.
lmms_eval/tasks/video-tt/videott_all.yaml (1)
32-33: Strip trailing whitespace
Line 32 contains stray spaces, breaking lint.
lmms_eval/tasks/video-tt/gpt_utils.py (1)
211-211: Remove unnecessary f-string prefix. The f-string doesn't contain any placeholders, making the `f` prefix unnecessary.

```diff
- return {f"accuracy": pred == doc["answer"]}
+ return {"accuracy": pred == doc["answer"]}
```

lmms_eval/tasks/video-tt/utils.py (3)
173-173: Remove unnecessary f-string prefix. The f-string doesn't contain any placeholders, making the `f` prefix unnecessary.

```diff
- return {f"videott_perception_score": data_dict}
+ return {"videott_perception_score": data_dict}
```
191-191: Remove unnecessary f-string prefix. The f-string doesn't contain any placeholders, making the `f` prefix unnecessary.

```diff
- return {f"videott_perception_score": data_dict}
+ return {"videott_perception_score": data_dict}
```
78-84: Consider using dict.get() for cleaner code. The conditional logic can be simplified using the `get()` method.

```diff
- post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
+ post_prompt = lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")

- pre_promt = (
-     lmms_eval_specific_kwargs["pre_prompt"]
-     if "pre_prompt" in lmms_eval_specific_kwargs
-     else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
- )
+ pre_promt = lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
```

lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1)
32-32: Remove trailing spaces. Static analysis detected trailing spaces which should be removed for clean formatting.
```diff
- # qwen_vl:
+ # qwen_vl:
```
📜 Review details
📒 Files selected for processing (8)
- lmms_eval/tasks/video-tt/gpt_utils.py (1 hunks)
- lmms_eval/tasks/video-tt/utils.py (1 hunks)
- lmms_eval/tasks/video-tt/videott_all.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_all_audio.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
lmms_eval/tasks/video-tt/gpt_utils.py
211-211: f-string without any placeholders
Remove extraneous f prefix
(F541)
lmms_eval/tasks/video-tt/utils.py
48-48: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
78-78: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
80-82: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
98-98: Do not use bare except
(E722)
101-101: Use lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:") instead of an if block
Replace with lmms_eval_specific_kwargs.get("post_prompt", "The best answer is:")
(SIM401)
103-105: Use lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.") instead of an if block
Replace with lmms_eval_specific_kwargs.get("pre_prompt", "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option.")
(SIM401)
173-173: f-string without any placeholders
Remove extraneous f prefix
(F541)
191-191: f-string without any placeholders
Remove extraneous f prefix
(F541)
222-222: Loop control variable k not used within loop body
(B007)
257-257: Loop control variable k not used within loop body
(B007)
🪛 YAMLlint (1.37.1)
lmms_eval/tasks/video-tt/videott_all.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_all_audio.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
🔇 Additional comments (6)
lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1)
4-4: Verify split-name consistency
`test_split: test_paraphrased_oe` differs from the task id suffix `paraphrase`. If the dataset file is actually called `test_paraphrase_oe`, evaluation will crash. Please double-check the dataset naming or adjust one side accordingly.
lmms_eval/tasks/video-tt/videott_all_audio.yaml (1)
38-40: Same MC prompt issue for xcomposer2_4khd
The specialised prompt still forces A/B/C/D answers. Align it with the open-ended nature or switch this YAML to a multiple-choice metric.
lmms_eval/tasks/video-tt/gpt_utils.py (1)
127-151: LGTM - GPT evaluation properly implemented. The function now correctly implements GPT-based evaluation with proper error handling and fallback values. The try-except structure ensures robust operation even when API calls fail.
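For readers without the diff open, the pattern being described is roughly the following. This is a minimal sketch assuming an OpenAI-style client, not the PR's exact code; the model id and environment-variable names are placeholders:

```python
import os
import time

from openai import OpenAI  # assumes the openai>=1.0 SDK

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def get_eval(prompt: str, max_retries: int = 3) -> str:
    """Ask the judge model for a {'pred': ..., 'score': ...} verdict,
    returning a conservative default if the API keeps failing."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=os.getenv("MODEL_VERSION", "gpt-4o-mini"),  # placeholder model id
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )
            return response.choices[0].message.content
        except Exception:
            time.sleep(2**attempt)  # simple exponential backoff before retrying
    return "{'pred': 'no', 'score': 0}"  # neutral fallback when all retries fail
```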
lmms_eval/tasks/video-tt/utils.py (1)
194-227: LGTM - Robust aggregation logic implemented. The aggregation function properly handles category-based scoring with comprehensive logging and error handling for division by zero cases.
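As a rough illustration of that shape (the category names and the `question_category`/`score` keys are assumptions here, not necessarily the PR's exact fields):

```python
CATEGORIES = ["temporal", "spatial", "causal"]  # illustrative category names


def aggregate_results(results: list[dict]) -> float:
    """Average per-category accuracy, skipping categories with no answers."""
    category2score = {cat: {"correct": 0, "answered": 0} for cat in CATEGORIES}
    for result in results:
        cat = result.get("question_category", CATEGORIES[0])
        if cat not in category2score:
            continue  # ignore unexpected categories rather than crashing
        category2score[cat]["answered"] += 1
        category2score[cat]["correct"] += int(result.get("score", 0) > 0)

    per_category = []
    for v in category2score.values():  # iterating values avoids the unused-`k` lint above
        if v["answered"] == 0:
            continue  # guard against dividing by zero for empty categories
        per_category.append(100.0 * v["correct"] / v["answered"])
    return sum(per_category) / len(per_category) if per_category else 0.0
```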
lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (2)
16-16: LGTM - Proper function reference. The function reference `gpt_utils.gpt_score_process` correctly points to the GPT evaluation function implemented in the gpt_utils module.
9-21: Well-configured evaluation parameters. The generation parameters and metric configuration are properly set up for open-ended video evaluation with appropriate token limits and aggregation functions.
Actionable comments posted: 1
♻️ Duplicate comments (2)
lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1)
3-4: Task id & split name still mis-spelled – task will not load
The `task:` and `test_split:` values must exactly match the dataset identifiers as well as stay consistent with the filename. They still read `videott_wrongly_led_oe` / `test_wrongly_led_oe`, which will raise a registration error at runtime.

```diff
-task: videott_wrongly_led_oe
-test_split: test_wrongly_led_oe
+task: videott_wrong_leading_oe
+test_split: test_wrong_leading_oe
```

lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1)
3-4: Filename ↔ `task:`/`test_split` mismatch will break task discovery
`task: videott_correctly_led_oe` and `test_split: test_correctly_led_oe` don't match the filename slug `videott_correct_leading_oe`. lmms-eval's loader expects a 1-to-1 mapping; the current mismatch will cause the benchmark to be skipped or duplicated under an unexpected id. Fix by making them identical to the filename (or renaming the file). Suggested patch:
```diff
-task: videott_correctly_led_oe
-test_split: test_correctly_led_oe
+task: videott_correct_leading_oe
+test_split: test_correct_leading_oe
```
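To catch this class of mismatch before runtime, a check along these lines could be run over the task directory. This is a sketch under the assumption that every `task:` value should equal its file's stem; it scans the text directly because these configs use custom `!function` tags that `yaml.safe_load` would reject:

```python
import re
from pathlib import Path


def check_task_ids(task_dir: str = "lmms_eval/tasks/video-tt") -> None:
    """Flag YAML files whose task id differs from the filename stem."""
    for path in sorted(Path(task_dir).glob("videott_*.yaml")):
        match = re.search(r"^task:\s*(\S+)", path.read_text(), flags=re.MULTILINE)
        task_id = match.group(1) if match else None
        if task_id != path.stem:
            print(f"MISMATCH: {path.name} declares task '{task_id}'")


check_task_ids()
```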
🧹 Nitpick comments (4)
lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1)
32-32: Remove trailing whitespace to satisfy YAML-lint

```diff
- # qwen_vl:··
+ # qwen_vl:
```

lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1)
32-32: Strip trailing spaces to satisfy YAML-lint
Line 32 has superfluous whitespace that triggers the linter.
```diff
- # qwen_vl:
+ # qwen_vl:
```

lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml (1)
10-14: Drop redundant sampling-related kwargs
`temperature: 0` and `do_sample: false` already disable stochastic sampling, so `top_p` is ignored. Cleaning it up avoids confusion and makes intent explicit.

```diff
  max_new_tokens: 50
  temperature: 0
- top_p: 1.0
  num_beams: 1
  do_sample: false
```

lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1)
32-37: YAML-lint trailing-space error
Static analysis flags trailing spaces on the commented block (line 32). Although harmless at runtime, it will fail the repo's pre-commit hooks.
```diff
- # qwen_vl:
+ # qwen_vl:
```
📜 Review details
📒 Files selected for processing (4)
- lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1 hunks)
- lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_no_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
lmms_eval/tasks/video-tt/videott_wrong_leading_oe.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
🔇 Additional comments (2)
lmms_eval/tasks/video-tt/videott_paraphrase_oe.yaml (1)
4-4: Double-check that the split identifier `test_paraphrased_oe` matches the dataset registry
Other files in the series use the "paraphrase" spelling. Please confirm the split name exactly matches what's exported by the dataset loader; otherwise the task will raise a `KeyError` at runtime.
lmms_eval/tasks/video-tt/videott_correct_leading_oe.yaml (1)
16-16: Good catch on the typo correction
`process_results` now points to `gpt_utils.gpt_score_process`, which aligns with the actual helper function name; no further action needed here.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
```python
API_TYPE = os.getenv("API_TYPE", "openai")
# ...
NEXSTONE_HTTP_CHAT_URL = "https://maas.byteintl.net/service/api/v1/chat/completions"
```
@dongyh20 hi I think we should remove the private url?
Bug: Variable Definition Error and Typo
Two issues are present:
- NameError for `audio_path`: In the `videott_doc_to_text_audio` function, the `audio_path` variable is conditionally defined only when the `AUDIO_PATH` environment variable is set. However, it is unconditionally used in a subsequent `try` block. If `AUDIO_PATH` is not set, `audio_path` remains undefined, leading to a `NameError` (see the fix sketch after the excerpt below).
- Typo in `pre_promt`: The variable `pre_promt` (used in the `videott_doc_to_text` and `videott_doc_to_text_audio` functions) contains a typo and should be `pre_prompt`. While functionally consistent, this affects code readability and maintainability.
lmms_eval/tasks/video-tt/utils.py#L78-L107
lmms-eval/lmms_eval/tasks/video-tt/utils.py
Lines 78 to 107 in f96e2ed
```python
    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
    pre_promt = (
        lmms_eval_specific_kwargs["pre_prompt"]
        if "pre_prompt" in lmms_eval_specific_kwargs
        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
    )
    full_prompt = pre_promt + "\n" + question + "\n" + post_prompt
    return full_prompt


def videott_doc_to_text_audio(doc, lmms_eval_specific_kwargs=None):
    subtitles_prompt = "This video's subtitles are listed below: \n"
    if not AUDIO_PATH:
        eval_logger.warning("AUDIO_PATH environment variable not set, skipping audio subtitles")
        subtitle = ""
    else:
        audio_path = os.path.join(AUDIO_PATH, f'{doc["video_id"]}.txt')
    try:
        with open(audio_path) as f:
            subtitle = f.read()
    except:
        subtitle = ""
    question = doc["question"] + "\n" + doc["question_prompt"]
    post_prompt = lmms_eval_specific_kwargs["post_prompt"] if "post_prompt" in lmms_eval_specific_kwargs else "The best answer is:"
    pre_promt = (
        lmms_eval_specific_kwargs["pre_prompt"]
        if "pre_prompt" in lmms_eval_specific_kwargs
        else "Select the best answer to the following multiple-choice question based on the video. Respond with only the letter (A, B, C, or D) of the correct option."
    )
    full_prompt = subtitles_prompt + subtitle + "\n" + pre_promt + "\n" + question + "\n" + post_prompt
```
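One minimal way to remove the `NameError` is to give `subtitle` a safe default and keep the file read inside the `else` branch. A sketch of the relevant lines only, not the PR's committed fix:

```python
subtitle = ""  # safe default covers both the unset-env and read-failure cases
if not AUDIO_PATH:
    eval_logger.warning("AUDIO_PATH environment variable not set, skipping audio subtitles")
else:
    audio_path = os.path.join(AUDIO_PATH, f'{doc["video_id"]}.txt')
    try:
        with open(audio_path) as f:
            subtitle = f.read()
    except OSError:  # covers FileNotFoundError and other I/O errors
        subtitle = ""
```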
Bug: Score Type Mismatch in GPT Prompt
The GPT evaluation prompt contains contradictory instructions regarding the score data type. It explicitly states that the score should be an integer, but the provided example demonstrates a float value (e.g., 4.8). This inconsistency can confuse the model, leading to inconsistent output formats and potential parsing issues, especially since the `parse_score` function expects and converts the score to a float (a tolerant parsing sketch follows the excerpt below).
lmms_eval/tasks/video-tt/gpt_utils.py#L55-L60
lmms-eval/lmms_eval/tasks/video-tt/gpt_utils.py
Lines 55 to 60 in f96e2ed
```python
        f"Correct Answer: {answer}\n"
        f"Predicted Answer: {pred}\n\n"
        "Provide your evaluation only as a yes/no and score where the score is an integer value between 0 and 5, with 5 indicating the highest meaningful match. "
        "Please generate the response in the form of a Python dictionary string with keys 'pred' and 'score', where value of 'pred' is a string of 'yes' or 'no' and value of 'score' is in INTEGER, not STRING."
        "DO NOT PROVIDE ANY OTHER OUTPUT TEXT OR EXPLANATION. Only provide the Python dictionary string. "
        "For example, your response should look like this: {'pred': 'yes', 'score': 4.8}."
```
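Beyond aligning the example with the INTEGER instruction, the parsing side can tolerate either form. A defensive sketch (the PR's actual `parse_score` may differ):

```python
import ast


def parse_score(review: str) -> tuple[str, float]:
    """Parse a "{'pred': 'yes', 'score': 4}" style reply, accepting int or
    float scores and falling back to a zero score on malformed output."""
    try:
        result = ast.literal_eval(review.strip())
        pred = str(result.get("pred", "no")).lower()
        score = float(result.get("score", 0))  # float() accepts both 4 and 4.8
        return pred, score
    except (ValueError, SyntaxError, AttributeError):
        return "no", 0.0


print(parse_score("{'pred': 'yes', 'score': 4}"))    # ('yes', 4.0)
print(parse_score("{'pred': 'yes', 'score': 4.8}"))  # ('yes', 4.8)
```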
Actionable comments posted: 1
📜 Review details
📒 Files selected for processing (1)
- lmms_eval/tasks/video-tt/videott_all_audio.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
lmms_eval/tasks/video-tt/videott_all_audio.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor BugBot
🔇 Additional comments (2)
lmms_eval/tasks/video-tt/videott_all_audio.yaml (2)
32-34: Trailing whitespace – fails YAML linting
YAMLlint flags line 32 for trailing spaces, which can break strict CI checks.
```diff
- # qwen_vl:··
+ # qwen_vl:
```
6-8: Verify helper function names exist
`doc_to_visual` and `doc_to_text` point to `utils.videott_doc_to_visual` and `utils.videott_doc_to_text_audio`. If these were renamed or live in a sub-package (e.g. `lmms_eval.tasks.video_tt.utils`), YAML loading will raise `ImportError`. Please run a quick search to ensure both functions are defined and exported.
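One quick way to run that check (a sketch; the hyphen in `video-tt` means the file must be loaded by path rather than imported as a package, and executing it requires the module's own dependencies, e.g. cv2, to be installed):

```python
import importlib.util

spec = importlib.util.spec_from_file_location("videott_utils", "lmms_eval/tasks/video-tt/utils.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)  # runs the module top-level, so its imports must resolve

for name in ("videott_doc_to_visual", "videott_doc_to_text_audio"):
    print(name, "OK" if hasattr(module, name) else "MISSING")
```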
```yaml
xcomposer2_4khd:
  pre_prompt: "[UNUSED_TOKEN_146]user\n"
  post_prompt: " Answer this question with A, B, C, or D.[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n"
```
xcomposer2_4khd prompt contradicts open-ended task – will mis-score answers
The post-prompt forces the model to reply with A/B/C/D, but this YAML is configured for `videott_open_ended_score`, i.e. free-form answers judged by GPT.
If left as-is, the model is likely to emit a single letter that the scorer will treat as a full answer, producing meaningless scores.
```diff
- xcomposer2_4khd:
-   pre_prompt: "[UNUSED_TOKEN_146]user\n"
-   post_prompt: " Answer this question with A, B, C, or D.[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n"
+ xcomposer2_4khd:
+   pre_prompt: "[UNUSED_TOKEN_146]user\n"
+   # Keep the special tokens but drop the multiple-choice instruction.
+   post_prompt: "[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n"
```

Either remove the MC instruction (as above) or move this model to a dedicated multiple-choice variant of the task.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
xcomposer2_4khd:
  pre_prompt: "[UNUSED_TOKEN_146]user\n"
  # Keep the special tokens but drop the multiple-choice instruction.
  post_prompt: "[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n"
```
🤖 Prompt for AI Agents
In lmms_eval/tasks/video-tt/videott_all_audio.yaml around lines 38 to 40, the
post_prompt for xcomposer2_4khd incorrectly instructs the model to answer with
A, B, C, or D, which conflicts with the open-ended scoring setup. To fix this,
remove the multiple-choice instruction from the post_prompt so the model can
generate free-form answers compatible with the videott_open_ended_score
evaluation.
Before you open a pull-request, please check if a similar issue already exists or has been closed before.
When you open a pull-request, please be sure to include the following:
If you encounter lint warnings, you can use the following scripts to reformat the code.
Thank you for your contributions!
Summary by CodeRabbit