Misaligned question assignment in CharXiv descriptive evaluation #911

@gylia

Hi, thanks for maintaining this excellent evaluation framework! I've been using it for research and noticed an issue with the CharXiv descriptive task.

Problem:
In charxiv_descriptive_process_docs of lmms_eval/tasks/charxiv/utils.py, the line q_number = indice % 4 + 1 doesn't correctly assign questions after dataset.repeat(4).

Say the dataset has 1000 samples. After repeat(4), the dataset structure is:

  • Indices 0-999: Copy 1
  • Indices 1000-1999: Copy 2
  • Indices 2000-2999: Copy 3
  • Indices 3000-3999: Copy 4

However, the modulo operation gives the same result for all copies of the same example:

  • Index 0: 0 % 4 = 0, q_number = 1
  • Index 1000: 1000 % 4 = 0, q_number = 1 (should be 2)
  • Index 2000: 2000 % 4 = 0, q_number = 1 (should be 3)
  • Index 3000: 3000 % 4 = 0, q_number = 1 (should be 4)

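This means all four copies of an example use the same question (descriptive_q1 for example 0, descriptive_q2 for example 1, and so on), so each example is evaluated on only one of its four descriptive questions, four times over. This happens whenever the original dataset size is a multiple of 4, as with 1000.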
I wanted to check if this is the intended behavior or if I'm misunderstanding the logic here?
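
For reference, here is a minimal sketch that reproduces the arithmetic above in plain Python, without loading the dataset. The names original_size, num_questions, and current_q_number are made up for illustration; only the expression indice % 4 + 1 comes from utils.py, and the back-to-back copy layout is the one assumed in the list above.

```python
# Illustrative stand-in for the index layout after dataset.repeat(4).
# The size is the 1000-sample example from this issue, not read from the real dataset.
original_size = 1000
num_questions = 4


def current_q_number(indice: int) -> int:
    # Mirrors the line quoted above: q_number = indice % 4 + 1
    return indice % num_questions + 1


# For each original example, collect the q_numbers assigned to its four copies
# (indices ex, ex + 1000, ex + 2000, ex + 3000).
per_example = {
    ex: {current_q_number(ex + copy * original_size) for copy in range(num_questions)}
    for ex in range(original_size)
}

print(per_example[0])  # {1}
print(per_example[1])  # {2}
print(all(len(q) == 1 for q in per_example.values()))  # True
```

Under these assumptions every example ends up with a single q_number across its four copies.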

Proposed fix (if this is indeed a problem):
Perhaps replace q_number = indice % 4 + 1 with q_number = indice // original_size + 1, where original_size is the dataset length before repeat(4) (1000 in this example).

This correctly maps:

  • Indices 0-999: q_number = 1
  • Indices 1000-1999: q_number = 2
  • Indices 2000-2999: q_number = 3
  • Indices 3000-3999: q_number = 4
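
A minimal sketch of the proposed mapping, again in plain Python. fixed_q_number is a hypothetical helper name, and it assumes original_size can be captured as the dataset length before repeat(4); how charxiv_descriptive_process_docs would actually obtain that value is not shown here.

```python
def fixed_q_number(indice: int, original_size: int) -> int:
    # Integer division picks out which copy (0..3) the index belongs to,
    # so copy k of every example is assigned question k + 1.
    return indice // original_size + 1


original_size = 1000  # assumed length before repeat(4), as in the example above
num_copies = 4

# Each original example should now see all four question numbers exactly once.
fixed_per_example = {
    ex: sorted(fixed_q_number(ex + copy * original_size, original_size) for copy in range(num_copies))
    for ex in range(original_size)
}

print(fixed_per_example[0])    # [1, 2, 3, 4]
print(fixed_per_example[999])  # [1, 2, 3, 4]
```

Unlike the modulo version, this does not depend on the dataset size being a multiple of 4, but it does rely on the copies being laid out back-to-back as listed above.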

Would you please confirm the logic here? Thanks very much!
