
Cannot load custom task config file via its path #3424

@SkyR0ver

Description


I'm trying to load a custom config file by passing its path as the --tasks argument, as described in the New Task Guide, but lm_eval cannot find the task file even though the given path is correct.

$ ls -al                      
total 12
drwxrwxr-x  2 zbw zbw 4096 Nov 24 06:53 .
drwxr-x--- 23 zbw zbw 4096 Nov 24 07:27 ..
-rw-rw-r--  1 zbw zbw 1037 Nov 24 06:53 gsm8k.yaml
$ lm-eval --tasks ./gsm8k.yaml
2025-11-24:07:24:55 ERROR    [__main__:419] Tasks were not found: ./gsm8k.yaml
                                               Try `lm-eval --tasks list` for list of available tasks
Traceback (most recent call last):
  File "/data/zbw/miniconda3/envs/dllm/bin/lm-eval", line 7, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/data/zbw/miniconda3/envs/dllm/lib/python3.12/site-packages/lm_eval/__main__.py", line 423, in cli_evaluate
    raise ValueError(
ValueError: Tasks not found: ./gsm8k.yaml. Try `lm-eval --tasks {list_groups,list_subtasks,list_tags,list}` to list out all available names for task groupings; only (sub)tasks; tags; or all of the above, or pass '--verbosity DEBUG' to troubleshoot task registration issues.

However, if the path of its parent directory (in the case above, ~/demo or .) is passed, lm_eval loads the custom task successfully.

$ lm-eval --tasks .           
2025-11-24:07:45:35 INFO     [__main__:446] Selected Tasks: [{'tag': ['math_word_problems'], 'task': 'gsm8k', 'dataset_path': '/data/zbw/szj/datasets/openai/gsm8k', 'dataset_name': 'main', 'output_type': 'generate_until', 'training_split': 'train', 'fewshot_split': 'train', 'test_split': 'test', 'doc_to_text': 'Question: {{question}}\nAnswer:', 'doc_to_target': '{{answer}}', 'metric_list': [{'metric': 'exact_match', 'aggregation': 'mean', 'higher_is_better': True, 'ignore_case': True, 'ignore_punctuation': False, 'regexes_to_ignore': [',', '\\$', '(?s).*#### ', '\\.$']}], 'generation_kwargs': {'until': ['Question:', '</s>', '<|im_end|>'], 'do_sample': False, 'temperature': 0.0}, 'repeats': 1, 'num_fewshot': 5, 'filter_list': [{'name': 'strict-match', 'filter': [{'function': 'regex', 'regex_pattern': '#### (\\-?[0-9\\.\\,]+)'}, {'function': 'take_first'}]}, {'name': 'flexible-extract', 'filter': [{'function': 'regex', 'group_select': -1, 'regex_pattern': '(-?[$0-9.,]{2,})|(-?[0-9]+)'}, {'function': 'take_first'}]}], 'metadata': {'version': 3.0}}]
# irrelevant following output omitted

Following the error stack trace, the problem lies in the check for missing tasks: when an external file is loaded, lm_eval appends the parsed config dict to task_names, then compares the original path string against that list, so the membership test inevitably fails.

for task in [task for task in task_list if task not in task_names]:
    if os.path.isfile(task):
        config = utils.load_yaml_config(task)
        task_names.append(config)
task_missing = [
    task for task in task_list if task not in task_names and "*" not in task
]  # we don't want errors if a wildcard ("*") task name was used
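The type mismatch can be reproduced in isolation. The following is a minimal sketch, not the actual lm_eval internals: find_missing is a hypothetical stand-in for the logic above, the .yaml suffix check stands in for os.path.isfile, and the literal dict stands in for utils.load_yaml_config.

```python
def find_missing(task_list, registered):
    """Stand-in for lm_eval's missing-task check (hypothetical helper)."""
    task_names = list(registered)
    for task in [t for t in task_list if t not in task_names]:
        if task.endswith(".yaml"):  # stands in for os.path.isfile(task)
            config = {"task": "gsm8k"}  # stands in for utils.load_yaml_config(task)
            # The parsed *dict* is appended, not the original path string.
            task_names.append(config)
    # "./gsm8k.yaml" (a str) never equals the dict, so the membership test
    # fails and the task is wrongly reported as missing.
    return [t for t in task_list if t not in task_names and "*" not in t]

print(find_missing(["./gsm8k.yaml"], registered=[]))  # → ['./gsm8k.yaml']
```

So the config is loaded, yet the file path still ends up in task_missing, which produces the "Tasks were not found" error shown above.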
