Closed as not planned
What happened?
When calling completion() with stream=True, memory usage increases with each request and does not return to the initial level. This issue does not occur with stream=False.
import os
import psutil
from litellm import completion

os.environ["OPENAI_API_KEY"] = "********"

process = psutil.Process()
initial_memory = process.memory_info().rss / (1024 * 1024)
print(f"Initial memory usage: {initial_memory:.2f} MB")

for i in range(10):
    response = completion(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        stream=True,
    )
    for _ in response:
        pass

    memory_usage = process.memory_info().rss / (1024 * 1024)
    memory_diff = memory_usage - initial_memory
    print(f"Iteration {i+1}: Memory usage {memory_usage:.2f} MB (+{memory_diff:.2f} MB)")

Relevant log output
Initial memory usage: 148.52 MB
Iteration 1: Memory usage 150.30 MB (+1.79 MB)
Iteration 2: Memory usage 150.58 MB (+2.07 MB)
Iteration 3: Memory usage 150.59 MB (+2.07 MB)
Iteration 4: Memory usage 228.73 MB (+80.21 MB)
Iteration 5: Memory usage 229.10 MB (+80.59 MB)
Iteration 6: Memory usage 229.50 MB (+80.99 MB)
Iteration 7: Memory usage 230.16 MB (+81.65 MB)
Iteration 8: Memory usage 230.42 MB (+81.90 MB)
Iteration 9: Memory usage 230.97 MB (+82.45 MB)
Iteration 10: Memory usage 231.37 MB (+82.85 MB)

Are you a ML Ops Team?
No
What LiteLLM version are you on ?
v1.61.8