[Task] Add support for VisualPuzzles #637
Merged
Description of PR: Add support for VisualPuzzles
VisualPuzzles is a benchmark that targets visual reasoning while deliberately minimizing reliance on specialized knowledge. It consists of 1168 diverse questions spanning five categories: algorithmic, analogical, deductive, inductive, and spatial reasoning. Each puzzle is labeled as easy, medium, or hard, and every puzzle is a multiple-choice question with four options.
Two evaluation prompts are included: one that uses chain-of-thought (CoT) prompting (lmms_eval/tasks/VisualPuzzles/VisualPuzzles_cot.yaml) and one that asks for a direct answer without CoT (lmms_eval/tasks/VisualPuzzles/VisualPuzzles_direct.yaml).
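For reviewers who want a feel for how four-option responses could be scored, here is a minimal sketch. It is not the code in this PR: the record fields (`category`, `difficulty`, `answer`, `response`) and the helper names `extract_choice` and `accuracy_by` are hypothetical, and the parsing rule (prefer an explicit "answer ..." phrase, otherwise take the last standalone letter A-D) is only one plausible heuristic for CoT output.

```python
import re
from collections import defaultdict


def extract_choice(response):
    """Return the predicted option letter (A-D) from a model response, or None.

    Heuristic (an assumption, not the PR's parser): prefer a letter that
    follows the word "answer"; otherwise use the last standalone A-D.
    """
    explicit = re.findall(r"(?i:answer)\s*(?:is|:)?\s*\(?([A-D])\b", response)
    if explicit:
        return explicit[-1]
    standalone = re.findall(r"\b([A-D])\b", response)
    return standalone[-1] if standalone else None


def accuracy_by(records, key):
    """Compute accuracy per group, where `key` names the grouping field."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec[key]] += 1
        if extract_choice(rec["response"]) == rec["answer"]:
            correct[rec[key]] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical records; the field names are illustrative only.
records = [
    {"category": "spatial", "difficulty": "easy", "answer": "B",
     "response": "The shapes rotate clockwise. Answer: B"},
    {"category": "spatial", "difficulty": "hard", "answer": "C",
     "response": "I think it is D"},
    {"category": "deductive", "difficulty": "easy", "answer": "A",
     "response": "So the answer is (A)."},
]

print(accuracy_by(records, "category"))
print(accuracy_by(records, "difficulty"))
```

Grouping by `category` or `difficulty` mirrors the labels described above, so the same helper could report results per reasoning type or per difficulty level. A direct-answer prompt would typically hit the fallback branch, since the response is often just the letter.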
Arxiv: https://arxiv.org/abs/2504.10342
Huggingface Dataset: https://huggingface.co/datasets/neulab/VisualPuzzles
Project Webpage: https://neulab.github.io/VisualPuzzles/
Github Code: https://github.com/neulab/VisualPuzzles/tree/main