
Conversation

@HanSolo9682
Contributor

Hi, I want to add our video benchmark, Vinoground, to lmms-eval. This temporal counterfactual benchmark contains 1000 short, natural video-caption pairs. The best model, GPT-4o, reaches only 35% on one of our metrics, while humans achieve ~90% with ease. I have been able to reproduce our results on LLaVA-Video-7B-Qwen2 with the code provided here. I believe more models should be evaluated on Vinoground to truly test their dense temporal reasoning capabilities, and hence I find lmms-eval a great platform to do so.
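For anyone who wants to try it, an evaluation run should look roughly like the standard lmms-eval invocation below; the task and model identifiers here are illustrative rather than exact, so check the repo for the registered names:

```bash
# Sketch of an evaluation run; model name, checkpoint, and task name are assumptions.
python -m lmms_eval \
    --model llava_vid \
    --model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2 \
    --tasks vinoground \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```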

@Luodian
Contributor

Luodian commented Oct 16, 2024

Hi, thanks for this PR! Can you also pin a result screenshot for a random model?

Also, there are some linting issues; you may need to use pre-commit to resolve them.
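For reference, the usual pre-commit workflow is roughly as follows, assuming the repo already ships a `.pre-commit-config.yaml`:

```bash
pip install pre-commit        # install the tool
pre-commit install            # register the git hook so checks run on each commit
pre-commit run --all-files    # run all configured linters/formatters across the repo
```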

@HanSolo9682
Contributor Author

[Screenshot: evaluation results for a random model, attached 2024-10-16]

@HanSolo9682
Contributor Author

I have just run pre-commit and fixed the linting issues.

Luodian merged commit a72a9c0 into EvolvingLMMs-Lab:main on Oct 16, 2024.