Description
Hi, thanks for maintaining this excellent evaluation framework! I've been using it for research and noticed an issue with the CharXiv descriptive task.
Problem:
In charxiv_descriptive_process_docs of lmms_eval/tasks/charxiv/utils.py, the line q_number = indice % 4 + 1 doesn't correctly assign questions after dataset.repeat(4).
Say I have a dataset of 1000 samples. After repeat(4), the dataset structure is:
- Indices 0-999: Copy 1
- Indices 1000-1999: Copy 2
- Indices 2000-2999: Copy 3
- Indices 3000-3999: Copy 4
However, the modulo operation gives the same result for all copies of the same example:
- Index 0: 0 % 4 = 0, q_number = 1
- Index 1000: 1000 % 4 = 0, q_number = 1 (should be 2)
- Index 2000: 2000 % 4 = 0, q_number = 1 (should be 3)
- Index 3000: 3000 % 4 = 0, q_number = 1 (should be 4)
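Because the original size (1000) is a multiple of 4, every copy of a given example gets the same q_number: all four copies of example 0 use descriptive_q1, all four copies of example 1 use descriptive_q2, and so on. Each example is therefore evaluated on only one of its four descriptive questions, four times over.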
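To double-check the arithmetic, here is a small simulation in plain Python. It does not touch the real dataset or lmms_eval; it only assumes that repeat(4) concatenates four whole copies as laid out above, and ORIGINAL_SIZE, NUM_COPIES, and example_id are names I made up for illustration.

```python
from collections import defaultdict

ORIGINAL_SIZE = 1000  # assumed dataset length before repeat(4), as in the example
NUM_COPIES = 4        # assumed number of concatenated copies

# Collect the q_number values each original example receives across its copies.
q_numbers_per_example = defaultdict(set)
for indice in range(ORIGINAL_SIZE * NUM_COPIES):
    example_id = indice % ORIGINAL_SIZE  # position of the example within one copy
    q_number = indice % 4 + 1            # current logic from the issue
    q_numbers_per_example[example_id].add(q_number)

# How many distinct question numbers does each example see?
distinct_counts = {len(qs) for qs in q_numbers_per_example.values()}
print(distinct_counts)
print(sorted(q_numbers_per_example[0]), sorted(q_numbers_per_example[1]))
```

Under these assumptions, every example ends up with a single question number across its four copies, and which one it gets depends only on its position within a copy.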
I wanted to check if this is the intended behavior or if I'm misunderstanding the logic here?
Proposed fix (if this is indeed a problem):
Perhaps replace q_number = indice % 4 + 1 with q_number = indice // original_size + 1, where original_size is the dataset length before repeat(4) (1000 in this example).
This correctly maps:
- Indices 0-999: q_number = 1
- Indices 1000-1999: q_number = 2
- Indices 2000-2999: q_number = 3
- Indices 3000-3999: q_number = 4
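A rough sketch of the proposed mapping, again in plain Python and under the same assumption that repeat(4) concatenates whole copies. The helper q_number_for and the constant ORIGINAL_SIZE are hypothetical names for illustration; how original_size would actually be obtained inside charxiv_descriptive_process_docs is not something I have verified.

```python
def q_number_for(indice: int, original_size: int) -> int:
    """Hypothetical helper: map an index in the repeated dataset
    to a 1-based question number using integer division."""
    return indice // original_size + 1


ORIGINAL_SIZE = 1000  # assumed dataset length before repeat(4)

# Collect the question numbers each original example receives across its copies.
coverage = {}
for indice in range(ORIGINAL_SIZE * 4):
    example_id = indice % ORIGINAL_SIZE  # position of the example within one copy
    coverage.setdefault(example_id, set()).add(q_number_for(indice, ORIGINAL_SIZE))

# True only if every example is paired with all four questions exactly once.
all_covered = all(qs == {1, 2, 3, 4} for qs in coverage.values())
print(all_covered)
```

With integer division, the copy number selects the question, so each example should be paired with each of the four questions once.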
Would you please confirm the logic here? Thanks very much!