Describe the issue
When repeatedly creating and destroying ONNX Runtime sessions (and even the environment), memory is not properly released. RSS grows continuously despite calling ReleaseSession and ReleaseEnv. This prevents long-running services from refreshing models without accumulating memory.
We create N sessions that are reused until the model needs updating. To update, we create new sessions, atomically swap the pointers, and then call Destroy on the old ones. Destroy, however, does not appear to release the memory.
Related Issues
Open issues in this repo that report the same or a similar problem:
#25325
#26827
#21673
We're using yalue's Go wrapper, and the issue has been reported there as well; the conclusion was that the leak is in onnxruntime itself, not the wrapper:
yalue/onnxruntime_go#114
Environment
- ONNX Runtime Version: 1.23.2
- OS: Linux (Amazon Linux 2)
- Language Bindings: Go via yalue/onnxruntime_go
Model Details
- Model size: ~50 MB ONNX file
- Model type: Custom PyTorch model exported to ONNX (embedding layers + feed-forward network)
- Dynamic axes: Yes (batch size and sequence length)
Here are screenshots showing memory usage in our go http server.
With destroying old sessions and creating new sessions every 30 min:
The drop at the end is a spike in requests that crashed the pod; after the restart, memory returns to baseline.

Keeping the same sessions, with no refresh:

To reproduce
- Initialize the runtime environment
- Measure RSS
- Create N sessions with the model (we used 32 sessions, ~50 MB each)
- Measure RSS
- Destroy the sessions and create new ones with the updated model
- Measure RSS
- Destroy the runtime environment
Alternatively, create and destroy the runtime environment along with the sessions on each cycle; it still leaks.
I can't reproduce it on my local Mac, where macOS appears to reclaim the memory, but in our production Linux environment the pods crash after several hours.
Go Heap (managed by Go GC):

```
Iteration │ HeapAlloc (MB) │ HeapΔ (MB) │ NumGC
──────────┼────────────────┼────────────┼──────
initial   │ 75             │ +0         │ 6
1         │ 75             │ +0         │ 10
2         │ 75             │ +0         │ 12
3         │ 75             │ +0         │ 14
4         │ 75             │ +0         │ 16
5         │ 75             │ +0         │ 18
6         │ 75             │ +0         │ 20
7         │ 75             │ +0         │ 22
8         │ 75             │ +0         │ 24
9         │ 75             │ +0         │ 26
10        │ 75             │ +0         │ 28
final     │ 75             │ +0         │ 30
```
Process RSS (includes the native heap used by ONNX Runtime):

```
Iteration │ RSS (MB) │ RSSΔ (MB) │ Δ/Cycle (MB)
──────────┼──────────┼───────────┼─────────────
initial   │ 283      │ +0        │ +0
1         │ 3337     │ +3054     │ +3054
2         │ 3395     │ +3112     │ +1556
3         │ 3679     │ +3396     │ +1132
4         │ 3778     │ +3495     │ +873
5         │ 3835     │ +3552     │ +710
6         │ 4151     │ +3868     │ +644
7         │ 3937     │ +3654     │ +522
8         │ 3975     │ +3692     │ +461
9         │ 3797     │ +3514     │ +390
10        │ 4032     │ +3749     │ +374
final     │ 2380     │ +2097     │ +190
```

On the Linux environment, RSS keeps growing like this until the pod crashes.
The issues linked above report the same problem, so their reproduction code can be used as well.
What We've Tried

| Configuration | Result |
|---|---|
| SetCpuMemArena(true) | Leaks |
| SetCpuMemArena(false) | Leaks |
| SetMemPattern(true) | Leaks |
| SetMemPattern(false) | Leaks |
| GraphOptimizationLevelEnableAll | Leaks |
| GraphOptimizationLevelDisableAll | Leaks |
| Destroy sessions only | Leaks |
| Destroy sessions + environment | Still leaks |
| Using mimalloc (LD_PRELOAD) | Still leaks |
| Updated to ONNX Runtime 1.23.2 | Still leaks |
Analysis
Looking at onnxruntime_c_api.cc:
`DEFINE_RELEASE_ORT_OBJECT_FUNCTION(Session, ::onnxruntime::InferenceSession)`

This expands to a plain `delete` on the `InferenceSession*`. The destructor `~InferenceSession()` appears minimal and relies on implicit member destruction. I didn't dig further since the codebase is unfamiliar to me, but the memory does not seem to be returned to the OS.
Urgency
This seems like a serious bug: it has been reported multiple times without being investigated. It would be great if it could be escalated.
Impact
- Need to refresh ML models every 30 minutes for updated predictions in a realtime environment. Cannot restart the pods as they are constantly serving requests.
- Currently leaking ~1.5 GB/hour with 32 sessions × 50MB model
- Pods OOM after several hours
Workaround Attempts
None of the following work:
- Using custom allocators (mimalloc)
- Disabling all optimizations and arenas
- Full environment teardown and reinit when we need to reload the model
Platform
Linux
OS Version
OS: Amazon Linux 2, Kernel: 5.10.240-238.959.amzn2.x86_64
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.23.2
ONNX Runtime API
Other / Unknown
Architecture
X86
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No