Misaligned question assignment in CharXiv descriptive evaluation

Hi, thanks for maintaining this excellent evaluation framework! I've been using it for research and noticed an issue with the CharXiv descriptive task.

**Problem:**
In `charxiv_descriptive_process_docs` of `lmms_eval/tasks/charxiv/utils.py`, the line `q_number = indice % 4 + 1` doesn't correctly assign questions after `dataset.repeat(4)`. 

Suppose the dataset has 1000 samples. After `repeat(4)`, the dataset structure is:
- Indices 0-999: Copy 1
- Indices 1000-1999: Copy 2
- Indices 2000-2999: Copy 3
- Indices 3000-3999: Copy 4

However, the modulo operation gives the same result for all copies of the same example:
- Index 0: 0 % 4 = 0, q_number = 1
- Index 1000: 1000 % 4 = 0, q_number = 1 (should be 2)
- Index 2000: 2000 % 4 = 0, q_number = 1 (should be 3)
- Index 3000: 3000 % 4 = 0, q_number = 1 (should be 4)

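This means all four copies of index 0 use `descriptive_q1`. More generally, because 1000 is a multiple of 4, every copy of a given example gets the same `q_number`, so each chart is evaluated on one question four times instead of on four different questions.
I wanted to check whether this is the intended behavior, or whether I'm misunderstanding the logic here.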
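
To double-check the arithmetic, here is a minimal standalone sketch in plain Python. It does not call `lmms_eval` or `datasets`; it assumes `repeat(4)` simply concatenates four full copies, as laid out above, and the helper names (`q_number_modulo`, `questions_per_sample`) are my own, not from the repo.

```python
ORIGINAL_SIZE = 1000  # example size used in this issue (assumption)
NUM_REPEATS = 4


def q_number_modulo(indice: int) -> int:
    """Current assignment, as quoted from utils.py."""
    return indice % 4 + 1


def questions_per_sample(assign, original_size=ORIGINAL_SIZE, repeats=NUM_REPEATS):
    """Map each original sample to the set of q_numbers its copies receive."""
    result = {}
    for indice in range(original_size * repeats):
        sample = indice % original_size  # position within one copy
        result.setdefault(sample, set()).add(assign(indice))
    return result


per_sample = questions_per_sample(q_number_modulo)
print(per_sample[0])  # {1}
print(per_sample[1])  # {2}
# Every sample ends up with a single question number across its four copies.
print(all(len(qs) == 1 for qs in per_sample.values()))  # True
```

So with 1000 samples, no example is ever paired with more than one descriptive question.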

**Proposed fix (if this is indeed a problem):**
Perhaps replace `q_number = indice % 4 + 1` with `q_number = indice // original_size + 1`, where `original_size` is the dataset length before `repeat(4)` (1000 in this example).

This correctly maps:
- Indices 0-999: q_number = 1
- Indices 1000-1999: q_number = 2
- Indices 2000-2999: q_number = 3
- Indices 3000-3999: q_number = 4
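
The same kind of standalone check for the proposed mapping, again as a sketch under the assumption that the copies are laid out back to back. `original_size` is a hypothetical variable here; I have not checked how `charxiv_descriptive_process_docs` would obtain the pre-repeat length.

```python
ORIGINAL_SIZE = 1000  # example size used in this issue (assumption)
NUM_REPEATS = 4


def q_number_block(indice: int, original_size: int = ORIGINAL_SIZE) -> int:
    """Proposed assignment: one question number per copy of the dataset."""
    return indice // original_size + 1


# Boundaries of each copy map to question numbers 1 through 4.
for indice in (0, 999, 1000, 1999, 2000, 2999, 3000, 3999):
    print(indice, q_number_block(indice))

# Each original sample should now see all four question numbers exactly once.
covered = {}
for indice in range(ORIGINAL_SIZE * NUM_REPEATS):
    covered.setdefault(indice % ORIGINAL_SIZE, []).append(q_number_block(indice))

print(all(sorted(qs) == [1, 2, 3, 4] for qs in covered.values()))  # True
```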

Would you please confirm the logic here? Thanks very much!

Misaligned question assignment in CharXiv descriptive evaluation #911
