What happened?
Description:
After installing LiteLLM v1.80 via the Docker image, the max_parallel_requests and tpm parameters under model_list.litellm_params are not being enforced, while rpm works correctly.
Configuration:
```yaml
router_settings:
  enable_pre_call_checks: true

model_list:
  - model_name: Qwen3-Coder-30B-A3B-Instruct
    litellm_params:
      model: hosted_vllm/Qwen3-Coder-30B-A3B-Instruct
      api_base: https://xx
      max_parallel_requests: 4
      tpm: 160
      rpm: 6

litellm_settings:
  drop_params: true
  ssl_verify: false
  callbacks: ["prometheus"]

general_settings:
  master_key: xx
  store_model_in_db: true
  store_prompts_in_spend_logs: true
  database_url: "postgresql://xx"
```
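To rule out a config-parsing problem, the litellm_params the proxy actually loaded can be checked against the proxy's /model/info endpoint. A minimal sketch, assuming the proxy runs on localhost:4000 and using the requests library (the exact response shape may vary by version):

```python
# Minimal sketch: print the litellm_params the proxy loaded per model.
# Base URL and key are placeholders for the deployment described above.
import requests

resp = requests.get(
    "http://localhost:4000/model/info",
    headers={"Authorization": "Bearer xx"},  # master_key from the config
)
resp.raise_for_status()
for entry in resp.json()["data"]:
    # max_parallel_requests, tpm, and rpm should show up here if the
    # config was parsed correctly.
    print(entry["model_name"], entry.get("litellm_params"))
```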
Expected Behavior:
- `max_parallel_requests: 4` should cap concurrent requests to the Qwen3-Coder model at 4
- `tpm: 160` should enforce a limit of 160 tokens per minute
- `rpm: 6` should enforce 6 requests per minute (this one currently works)

Actual Behavior:
- `rpm: 6` is correctly enforced (requests are rate-limited to 6 per minute)
- `max_parallel_requests: 4` is ignored; concurrent requests exceed the limit (see the sketch below)
- `tpm: 160` is ignored; token usage exceeds the specified limit
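To make the max_parallel_requests failure concrete, this is roughly the kind of check that shows it. A minimal sketch, not my exact test harness; the base URL, key, and request count are placeholders, and 6 requests are used so the rpm limit is not a factor:

```python
# Minimal sketch: fire 6 simultaneous requests through the proxy. With
# max_parallel_requests: 4 enforced, requests beyond 4 in flight should
# be rejected (HTTP 429) by the proxy.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:4000", api_key="xx")

async def one_request(i: int) -> str:
    try:
        await client.chat.completions.create(
            model="Qwen3-Coder-30B-A3B-Instruct",
            messages=[{"role": "user", "content": f"ping {i}"}],
        )
        return "ok"
    except Exception as exc:
        # A 429 here would indicate the parallel-request limit working.
        return f"rejected: {exc}"

async def main() -> None:
    results = await asyncio.gather(*(one_request(i) for i in range(6)))
    print(f"{sum(r == 'ok' for r in results)}/6 succeeded")

asyncio.run(main())
```

All 6 complete with no rejections, consistent with max_parallel_requests being ignored; the tpm limit similarly never triggers even when token usage in a minute far exceeds 160.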
Environment:
- LiteLLM version: ghcr.io/berriai/litellm:v1.80.0-stable (Docker image)
- Deployment: Docker container
- Database: PostgreSQL
Relevant log output
Are you an ML Ops Team?
No
What LiteLLM version are you on?
1.80.0
Twitter / LinkedIn details
No response