
Conversation

@HanSolo9682
Contributor

Hi, I want to add our video benchmark, Vinoground, to lmms-eval. This temporal counterfactual benchmark contains 1000 short, natural video-caption pairs. The best model, GPT-4o, reaches only 35% on one of our metrics, while humans achieve ~90% with ease. I have been able to reproduce our results on LLaVA-Video-7B-Qwen2 with the code provided here. I believe more models should be evaluated on Vinoground to truly test their dense temporal reasoning capabilities, and hence I find lmms-eval a great platform to do so.
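For anyone who wants to try it, an evaluation run should look roughly like the standard lmms-eval invocation below; the task and model identifiers here are illustrative rather than exact, so check the repo for the registered names:

```bash
# Sketch of an evaluation run; model name, checkpoint, and task name are assumptions.
python -m lmms_eval \
    --model llava_vid \
    --model_args pretrained=lmms-lab/LLaVA-Video-7B-Qwen2 \
    --tasks vinoground \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```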

@Luodian
Contributor

Luodian commented Oct 16, 2024

Hi, thanks for this PR! Can you also pin a result screenshot for a random model?

Also, there are some linting issues; you may need to use pre-commit to resolve them.
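For reference, the usual pre-commit workflow is roughly as follows, assuming the repo already ships a `.pre-commit-config.yaml`:

```bash
pip install pre-commit        # install the tool
pre-commit install            # register the git hook so checks run on each commit
pre-commit run --all-files    # run all configured linters/formatters across the repo
```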

@HanSolo9682
Contributor Author

[Screenshot: evaluation results for a random model, attached 2024-10-16]

@HanSolo9682
Contributor Author

I have just run pre-commit and fixed the linting issues.

Luodian merged commit a72a9c0 into EvolvingLMMs-Lab:main on Oct 16, 2024.