I'm trying to load a custom config file by passing its path as the `--tasks` argument, as described in the New Task Guide, but lm_eval cannot find the task file even though the given path is correct.
$ ls -al
total 12
drwxrwxr-x 2 zbw zbw 4096 Nov 24 06:53 .
drwxr-x--- 23 zbw zbw 4096 Nov 24 07:27 ..
-rw-rw-r-- 1 zbw zbw 1037 Nov 24 06:53 gsm8k.yaml
$ lm-eval --tasks ./gsm8k.yaml
2025-11-24:07:24:55 ERROR [__main__:419] Tasks were not found: ./gsm8k.yaml
Try `lm-eval --tasks list` for list of available tasks
Traceback (most recent call last):
File "/data/zbw/miniconda3/envs/dllm/bin/lm-eval", line 7, in <module>
sys.exit(cli_evaluate())
^^^^^^^^^^^^^^
File "/data/zbw/miniconda3/envs/dllm/lib/python3.12/site-packages/lm_eval/__main__.py", line 423, in cli_evaluate
raise ValueError(
ValueError: Tasks not found: ./gsm8k.yaml. Try `lm-eval --tasks {list_groups,list_subtasks,list_tags,list}` to list out all available names for task groupings; only (sub)tasks; tags; or all of the above, or pass '--verbosity DEBUG' to troubleshoot task registration issues.

However, if the path of its parent directory (in the above case, ~/demo or `.`) is passed, lm_eval can load the custom task.
$ lm-eval --tasks .
2025-11-24:07:45:35 INFO [__main__:446] Selected Tasks: [{'tag': ['math_word_problems'], 'task': 'gsm8k', 'dataset_path': '/data/zbw/szj/datasets/openai/gsm8k', 'dataset_name': 'main', 'output_type': 'generate_until', 'training_split': 'train', 'fewshot_split': 'train', 'test_split': 'test', 'doc_to_text': 'Question: {{question}}\nAnswer:', 'doc_to_target': '{{answer}}', 'metric_list': [{'metric': 'exact_match', 'aggregation': 'mean', 'higher_is_better': True, 'ignore_case': True, 'ignore_punctuation': False, 'regexes_to_ignore': [',', '\\$', '(?s).*#### ', '\\.$']}], 'generation_kwargs': {'until': ['Question:', '</s>', '<|im_end|>'], 'do_sample': False, 'temperature': 0.0}, 'repeats': 1, 'num_fewshot': 5, 'filter_list': [{'name': 'strict-match', 'filter': [{'function': 'regex', 'regex_pattern': '#### (\\-?[0-9\\.\\,]+)'}, {'function': 'take_first'}]}, {'name': 'flexible-extract', 'filter': [{'function': 'regex', 'group_select': -1, 'regex_pattern': '(-?[$0-9.,]{2,})|(-?[0-9]+)'}, {'function': 'take_first'}]}], 'metadata': {'version': 3.0}}]
# irrelevant remaining output omitted

Following the error stack trace, the problem lies in the check for missing tasks: when an external file is loaded, lm_eval appends the parsed config (a dict) to task_names, but then checks whether the original path (a str) is in task_names. The str can never match the dict, so the external file is reported as missing.
lm-evaluation-harness/lm_eval/__main__.py
Lines 409 to 415 in 7ddb2b1
for task in [task for task in task_list if task not in task_names]:
    if os.path.isfile(task):
        config = utils.load_yaml_config(task)
        task_names.append(config)
task_missing = [
    task for task in task_list if task not in task_names and "*" not in task
]  # we don't want errors if a wildcard ("*") task name was used
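A minimal sketch of one possible fix (my own guess, not a maintainer patch): remember which task_list entries were successfully resolved as file paths, and exclude those paths from the missing-task check, so a path string is never compared against a parsed config dict. `resolved_files` is a name I introduce here for illustration; it does not exist in the codebase.

# Hypothetical patch sketch for lm_eval/__main__.py
resolved_files = set()
for task in [task for task in task_list if task not in task_names]:
    if os.path.isfile(task):
        config = utils.load_yaml_config(task)
        task_names.append(config)
        resolved_files.add(task)  # remember the path that produced this config
task_missing = [
    task
    for task in task_list
    if task not in task_names
    and task not in resolved_files
    and "*" not in task
]  # we don't want errors if a wildcard ("*") task name was used

With a change along these lines, `lm-eval --tasks ./gsm8k.yaml` would no longer be flagged as missing, while genuinely unknown task names would still raise the ValueError.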