
[Bug]: max_parallel_requests and tpm parameters not working in model_list.litellm_params configuration (v1.80) #17705

@duan36


What happened?

Description:
After installing LiteLLM v1.80.0 via the Docker image, I'm seeing that the max_parallel_requests and tpm parameters in the model_list.litellm_params configuration are not enforced, while rpm works correctly.

Configuration:

router_settings:
  enable_pre_call_checks: true

model_list:
  - model_name: Qwen3-Coder-30B-A3B-Instruct
    litellm_params:
      model: hosted_vllm/Qwen3-Coder-30B-A3B-Instruct
      api_base: https://xx
      max_parallel_requests: 4
      tpm: 160
      rpm: 6

litellm_settings:
  drop_params: true
  ssl_verify: false
  callbacks: ["prometheus"]

general_settings:
  master_key: xx
  store_model_in_db: true
  store_prompts_in_spend_logs: true
  database_url: "postgresql://xx"
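
For isolation, the same settings map one-to-one onto the Python Router, which can help narrow down whether the enforcement gap is in the proxy layer or in the router itself. A minimal sketch, assuming the documented litellm.Router constructor (api_base kept as the placeholder from the config above):

    from litellm import Router

    # Same deployment and limits as in the proxy config above.
    router = Router(
        model_list=[
            {
                "model_name": "Qwen3-Coder-30B-A3B-Instruct",
                "litellm_params": {
                    "model": "hosted_vllm/Qwen3-Coder-30B-A3B-Instruct",
                    "api_base": "https://xx",  # placeholder from the config
                    "max_parallel_requests": 4,
                    "tpm": 160,
                    "rpm": 6,
                },
            }
        ],
        enable_pre_call_checks=True,
    )

    response = router.completion(
        model="Qwen3-Coder-30B-A3B-Instruct",
        messages=[{"role": "user", "content": "ping"}],
    )

If the SDK-level Router enforces these limits but the proxy does not, that would point at the proxy's config parsing rather than the router logic.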

Expected Behavior:
max_parallel_requests: 4 should limit concurrent requests to 4 for the Qwen3-Coder model
tpm: 160 should enforce a tokens-per-minute limit of 160
rpm: 6 should enforce 6 requests per minute (which is currently working)
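
The max_parallel_requests expectation can be checked with a small concurrency probe against the proxy. This is a sketch assuming an OpenAI-compatible /v1/chat/completions endpoint; the proxy URL and key are placeholders, not values from this report:

    import asyncio
    from collections import Counter

    import httpx

    PROXY_URL = "http://localhost:4000/v1/chat/completions"  # placeholder proxy address
    HEADERS = {"Authorization": "Bearer sk-placeholder"}     # placeholder proxy key

    async def one_request(client: httpx.AsyncClient) -> int:
        # A single chat completion; returns the HTTP status code.
        resp = await client.post(
            PROXY_URL,
            headers=HEADERS,
            json={
                "model": "Qwen3-Coder-30B-A3B-Instruct",
                "messages": [{"role": "user", "content": "ping"}],
            },
            timeout=120,
        )
        return resp.status_code

    async def main() -> None:
        # Fire 5 requests at once: above max_parallel_requests: 4 but within
        # rpm: 6, so any rejection would come from the parallel-request limit.
        async with httpx.AsyncClient() as client:
            codes = await asyncio.gather(*(one_request(client) for _ in range(5)))
        # With the limit enforced, at least one request should be rejected
        # (HTTP 429) instead of all five returning 200.
        print(Counter(codes))

    asyncio.run(main())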

Actual Behavior:
rpm: 6 is correctly enforced (requests are rate-limited to 6 per minute)
max_parallel_requests: 4 is ignored - concurrent requests exceed the limit
tpm: 160 is ignored - token usage exceeds the specified limit
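
The tpm gap can be shown the same way with a short sequential loop, again a sketch with placeholder URL and key, kept to five requests so the working rpm: 6 limit does not interfere:

    import httpx

    PROXY_URL = "http://localhost:4000/v1/chat/completions"  # placeholder proxy address
    HEADERS = {"Authorization": "Bearer sk-placeholder"}     # placeholder proxy key

    total_tokens = 0
    for i in range(5):  # stays within rpm: 6, so only tpm should trigger
        resp = httpx.post(
            PROXY_URL,
            headers=HEADERS,
            json={
                "model": "Qwen3-Coder-30B-A3B-Instruct",
                "messages": [
                    {"role": "user", "content": "Write two sentences about rate limits."}
                ],
            },
            timeout=120,
        )
        if resp.status_code == 200:
            total_tokens += resp.json()["usage"]["total_tokens"]
        print(i, resp.status_code, "cumulative tokens:", total_tokens)
        # With tpm: 160 enforced, requests past 160 cumulative tokens should
        # be rejected; on v1.80.0 they keep succeeding.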

Environment:
LiteLLM Version: ghcr.io/berriai/litellm:v1.80.0-stable (Docker image)
Deployment: Docker container
Database: PostgreSQL

Relevant log output

Are you a ML Ops Team?

No

What LiteLLM version are you on ?

1.80.0

Twitter / LinkedIn details

No response
