
Streamlining lm-eval Architecture #3083

@baberabb


Motivation

As the LM Evaluation Harness has grown and evolved, we've accumulated some complexity in our codebase. Much of it stems from flexibility that has been valuable for supporting a wide range of use cases, but it has also created several challenges:

  • Steeper learning curve: New contributors and users need considerable time to get familiar with the evaluation pipeline, task configurations, and filter mechanisms.
  • Maintenance overhead: Some abstractions could be streamlined or made more explicit.
  • Code clarity: The codebase has grown organically, leading to some patterns that could be more intuitive and maintainable.

I'm considering several potential modifications to streamline the harness architecture:

1. Filter/Metric Pipeline Restructuring

2. Task Definition Ergonomics

  • Create simplified interfaces for conventional task formats (MMLU-style multiple choice, cloze hybrid tasks, etc.) through templating systems (tracked in Standardize Task Templates #3081), including support for converting multiple-choice tasks into generation tasks
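A minimal sketch of what such a template could look like, assuming a hypothetical `MCQuestion` record and helper functions (none of these names exist in lm-eval today). One item renders to an MMLU-style prompt, can be scored by option letter or by full choice text (cloze-style), and can be converted into a generation task whose output is checked with a simple letter-extraction filter.

```python
import re
from dataclasses import dataclass

LETTERS = "ABCDEFGHIJ"  # option labels; assumes at most 10 choices


@dataclass
class MCQuestion:
    """Hypothetical record for one multiple-choice item (not an lm-eval class)."""
    question: str
    choices: list[str]
    answer: int  # index of the gold choice


def render_mc_prompt(q: MCQuestion) -> str:
    """MMLU-style prompt: question, lettered options, then an 'Answer:' cue."""
    lines = [q.question.strip()]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, q.choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def letter_targets(q: MCQuestion) -> list[str]:
    """MMLU-style continuations: score the option letter after the prompt."""
    return [f" {letter}" for letter in LETTERS[: len(q.choices)]]


def cloze_targets(q: MCQuestion) -> list[str]:
    """Cloze-style continuations: score the full choice text instead of a letter."""
    return [f" {choice}" for choice in q.choices]


def to_generation(q: MCQuestion) -> tuple[str, str]:
    """Reuse the same prompt as a generation task; the gold target is the letter."""
    return render_mc_prompt(q), LETTERS[q.answer]


def score_generation(output: str, gold: str, n_choices: int) -> bool:
    """Naive filter: take the first standalone valid option letter in the output."""
    valid = LETTERS[:n_choices]
    match = re.search(rf"\b([{valid}])\b", output)
    return match is not None and match.group(1) == gold


q = MCQuestion("What is 2 + 2?", ["3", "4", "5", "22"], answer=1)
print(render_mc_prompt(q))
```

The point of the sketch is that one item definition feeds all three formats, so a task author writes the data mapping once. The letter-extraction regex is deliberately naive (for example, a leading article "A" in the output would match), so a real filter would need more care.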

3. Handle CI Pain Points

4. Documentation and Discoverability

  • Make the tasks more discoverable through better organization and indexing (maybe through some hierarchical grouping)
  • Provide clearer documentation for how the pieces fit together
  • Improve examples and onboarding materials
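One way hierarchical grouping could work, sketched under the assumption that tasks get slash-separated paths such as `knowledge/mmlu/anatomy` (a hypothetical naming scheme for illustration, not how the harness names tasks today). The hypothetical helpers `build_index` and `tasks_under` build a tree once from the flat list of names, then list every task under a group prefix.

```python
def build_index(task_paths: list[str], sep: str = "/") -> dict:
    """Nest separator-delimited task paths into a dict tree; leaf tasks map to None.

    Assumes a name is either a group or a task, never both.
    """
    root: dict = {}
    for path in task_paths:
        *groups, task = path.split(sep)
        node = root
        for group in groups:
            node = node.setdefault(group, {})
        node[task] = None
    return root


def tasks_under(tree: dict, prefix: list[str], sep: str = "/") -> list[str]:
    """Return every full task path below a group prefix, sorted alphabetically."""
    node = tree
    for part in prefix:
        node = node[part]
    found: list[str] = []

    def walk(subtree: dict, path: list[str]) -> None:
        for name, child in sorted(subtree.items()):
            if child is None:
                found.append(sep.join(path + [name]))
            else:
                walk(child, path + [name])

    walk(node, list(prefix))
    return found


tree = build_index([
    "knowledge/mmlu/abstract_algebra",
    "knowledge/mmlu/anatomy",
    "reasoning/arc/challenge",
    "reasoning/arc/easy",
])
print(tasks_under(tree, ["reasoning"]))  # ['reasoning/arc/challenge', 'reasoning/arc/easy']
```

Selecting by prefix keeps a flat registry as the single source of truth while still giving users a browsable hierarchy.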

Feedback

I'd love to hear from the community about:

  • Which areas of complexity have been most challenging in your experience?
  • What aspects of the current architecture work well and should be preserved?
  • Any specific pain points or use cases that should be prioritized?
  • Suggestions for maintaining backward compatibility?
