[Bug]: Memory Leak in completion() with stream=True #8620

@iwamot

Description

What happened?

When calling completion() with stream=True, memory usage increases with each request and does not return to the initial level. This issue does not occur with stream=False.

import os
import psutil
from litellm import completion

os.environ["OPENAI_API_KEY"] = "********"

process = psutil.Process()
initial_memory = process.memory_info().rss / (1024 * 1024)
print(f"Initial memory usage: {initial_memory:.2f} MB")

for i in range(10):
    response = completion(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"}
        ],
        stream=True,
    )
    for _ in response:  # drain the stream fully
        pass

    # the Process handle created above is still valid; re-query RSS
    memory_usage = process.memory_info().rss / (1024 * 1024)
    memory_diff = memory_usage - initial_memory
    print(f"Iteration {i+1}: Memory usage {memory_usage:.2f} MB (+{memory_diff:.2f} MB)")
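A quick way to tell reclaimable garbage apart from a true leak is to drop the stream reference and force a full GC pass after each iteration. This is a hypothetical helper (`consume_and_collect` is not part of litellm; it uses only the standard library), shown as a sketch:

```python
import gc

def consume_and_collect(response):
    # Drain the stream, drop the reference, and force a full GC pass.
    # If RSS still grows after this, the chunks are being retained by
    # live references (e.g. module-level caches or logging buffers),
    # not merely by delayed garbage collection.
    for _ in response:
        pass
    del response
    gc.collect()
```

If calling this in place of the inner `for` loop does not change the growth pattern, that points at references held inside the library rather than at collector timing.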

Relevant log output

Initial memory usage: 148.52 MB
Iteration 1: Memory usage 150.30 MB (+1.79 MB)
Iteration 2: Memory usage 150.58 MB (+2.07 MB)
Iteration 3: Memory usage 150.59 MB (+2.07 MB)
Iteration 4: Memory usage 228.73 MB (+80.21 MB)
Iteration 5: Memory usage 229.10 MB (+80.59 MB)
Iteration 6: Memory usage 229.50 MB (+80.99 MB)
Iteration 7: Memory usage 230.16 MB (+81.65 MB)
Iteration 8: Memory usage 230.42 MB (+81.90 MB)
Iteration 9: Memory usage 230.97 MB (+82.45 MB)
Iteration 10: Memory usage 231.37 MB (+82.85 MB)

Are you an ML Ops Team?

No

What LiteLLM version are you on?

v1.61.8

Twitter / LinkedIn details

@iwamot / https://www.linkedin.com/in/iwamot/
