Conversation

@robinhad (Contributor) commented Apr 7, 2025

This pull request enables passing options through to the vLLM engine directly. This is useful in scenarios like the following:

  1. The google/gemma-3-12b-it model weights fit on 2x24 GB GPUs, but allocating memory for its full 128k-token context window does not. The user can cap the context by specifying max_model_len directly:
    VLLM_USE_V1=0 VLLM_WORKER_MULTIPROC_METHOD=spawn lmms-eval --tasks mmmu --model vllm --model_args model_version=google/gemma-3-12b-it,tensor_parallel_size=2,gpu_memory_utilization=0.95,max_images=1,max_videos=0,max_audios=0,max_model_len=4096 --batch_size 100 --log_samples --output_path lmms-results

More generally, this enables support for all vLLM engine args via --model_args.
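For context, here is a minimal sketch of what such pass-through can look like, assuming a hypothetical build_engine wrapper (the actual lmms-eval vllm model code may differ): any keyword not consumed by the wrapper is forwarded verbatim to vllm.LLM.

```python
# Minimal sketch; build_engine is a hypothetical name, not lmms-eval's API.
from vllm import LLM


def build_engine(model_version: str, tensor_parallel_size: int = 1, **engine_kwargs):
    # Keywords the wrapper does not consume (e.g. max_model_len=4096,
    # gpu_memory_utilization=0.95) are passed straight through to vllm.LLM,
    # so any vLLM engine arg becomes configurable from the CLI.
    return LLM(
        model=model_version,
        tensor_parallel_size=tensor_parallel_size,
        **engine_kwargs,
    )


# Mirrors the CLI example above:
# build_engine("google/gemma-3-12b-it", tensor_parallel_size=2,
#              gpu_memory_utilization=0.95, max_model_len=4096)
```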

@Luodian merged commit 6ba0b5e into EvolvingLMMs-Lab:main Apr 12, 2025
1 check passed
dadwadw233 pushed a commit to dadwadw233/lmms-eval that referenced this pull request Apr 28, 2025
* Add ability to pass options to VLLM

* Add link to engine args