
[Bug][Performance] Memory leak when destroying InferenceSession - memory not released by ReleaseSession or ReleaseEnv #26831

Describe the issue

Description

When repeatedly creating and destroying ONNX Runtime sessions (and even the environment), memory is not properly released. RSS grows continuously despite calling ReleaseSession and ReleaseEnv. This prevents long-running services from refreshing models without accumulating memory.

We create N sessions that are reused until we need to update the model. To update, we create new sessions, do an atomic pointer swap, and then destroy the old ones (sketched below). However, destroying the old sessions does not appear to release their memory.
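For reference, here is a minimal sketch of our refresh pattern. It assumes the yalue/onnxruntime_go API (`NewDynamicAdvancedSession`, `Destroy`); the model path and input/output names are placeholders for our real configuration:

```go
package modelrefresh

import (
	"sync/atomic"

	ort "github.com/yalue/onnxruntime_go"
)

// pool holds the currently active sessions. Request handlers Load()
// the pool; the refresher swaps a new one in every 30 minutes.
type pool struct {
	sessions []*ort.DynamicAdvancedSession
}

var active atomic.Pointer[pool]

// refresh creates n sessions from the updated model, atomically swaps
// them in, and destroys the old ones. (In the real service we also
// wait for in-flight requests to drain before calling Destroy.)
func refresh(modelPath string, n int) error {
	next := &pool{}
	for i := 0; i < n; i++ {
		s, err := ort.NewDynamicAdvancedSession(modelPath,
			[]string{"input"}, []string{"output"}, nil)
		if err != nil {
			return err
		}
		next.sessions = append(next.sessions, s)
	}
	old := active.Swap(next)
	if old != nil {
		for _, s := range old.sessions {
			s.Destroy() // RSS does not drop after this
		}
	}
	return nil
}
```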

Related Issues

Open issues in this repo with the same or similar problem:
#25325
#26827
#21673

We're using yalue's Go wrapper, and the issue has been posted there as well, but the conclusion was that the problem lies in onnxruntime itself, not the wrapper:
yalue/onnxruntime_go#114


Environment

  • ONNX Runtime Version: 1.23.2
  • OS: Linux (Amazon Linux 2)
  • Language Bindings: Go via yalue/onnxruntime_go

Model Details

  • Model size: ~50 MB ONNX file
  • Model type: Custom PyTorch model exported to ONNX (embedding layers + feed-forward network)
  • Dynamic axes: Yes (batch size and sequence length)

Here are screenshots showing memory usage in our Go HTTP server.

With destroying old sessions and creating new sessions every 30 min (the spike at the end is a burst of requests that crashed and restarted the pod, which is why memory resets to baseline):

[screenshot: memory usage with session refresh every 30 min]

Keeping the same sessions, no refresh:

[screenshot: memory usage without session refresh]

To reproduce

Steps to Reproduce

  1. initialize runtime env
  2. measure RSS
  3. create N sessions from the model (we used 32 sessions, ~50 MB model each)
  4. measure RSS
  5. destroy the sessions and create new ones with the updated model
  6. measure RSS
  7. destroy runtime env

Alternatively, create and destroy the runtime environment between cycles as well; it still leaks. A sketch of this loop is below.
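A minimal sketch of the repro loop, again assuming the yalue/onnxruntime_go API; RSS is read from /proc/self/status, so it is Linux-only:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/debug"
	"strconv"
	"strings"

	ort "github.com/yalue/onnxruntime_go"
)

// rssMB parses VmRSS out of /proc/self/status (Linux only).
func rssMB() int {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return -1
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			kb, _ := strconv.Atoi(strings.Fields(line)[1])
			return kb / 1024
		}
	}
	return -1
}

// createSessions is step 3: n sessions over the same ~50 MB model.
func createSessions(path string, n int) []*ort.DynamicAdvancedSession {
	out := make([]*ort.DynamicAdvancedSession, 0, n)
	for i := 0; i < n; i++ {
		s, err := ort.NewDynamicAdvancedSession(path,
			[]string{"input"}, []string{"output"}, nil)
		if err != nil {
			log.Fatal(err)
		}
		out = append(out, s)
	}
	return out
}

func main() {
	if err := ort.InitializeEnvironment(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("initial: RSS=%d MB\n", rssMB())

	for i := 1; i <= 10; i++ {
		sessions := createSessions("model.onnx", 32)
		for _, s := range sessions {
			s.Destroy()
		}
		debug.FreeOSMemory() // force GC and return the Go heap to the OS
		fmt.Printf("iter %d: RSS=%d MB\n", i, rssMB())
	}

	ort.DestroyEnvironment()
	fmt.Printf("final: RSS=%d MB\n", rssMB())
}
```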

I can't seem to reproduce it on my local Mac (macOS appears to reclaim the memory), but in the production Linux environment the pods crash after several hours.

Go heap (managed by the Go GC):

| Iteration | HeapAlloc (MB) | HeapΔ (MB) | NumGC |
| --- | --- | --- | --- |
| initial | 75 | +0 | 6 |
| 1 | 75 | +0 | 10 |
| 2 | 75 | +0 | 12 |
| 3 | 75 | +0 | 14 |
| 4 | 75 | +0 | 16 |
| 5 | 75 | +0 | 18 |
| 6 | 75 | +0 | 20 |
| 7 | 75 | +0 | 22 |
| 8 | 75 | +0 | 24 |
| 9 | 75 | +0 | 26 |
| 10 | 75 | +0 | 28 |
| final | 75 | +0 | 30 |

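The Go heap numbers above were collected with something like the helper below (using runtime and fmt from the harness above). HeapAlloc stays flat while the process RSS in the next table keeps growing, which is why we believe the leak is outside the Go heap:

```go
// logHeap prints one row of the Go-heap table above.
func logHeap(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%d MB NumGC=%d\n",
		label, m.HeapAlloc>>20, m.NumGC)
}
```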
Process RSS (includes memory allocated by ONNX Runtime):

| Iteration | RSS (MB) | RSSΔ (MB) | Δ/Cycle (MB) |
| --- | --- | --- | --- |
| initial | 283 | +0 | +0 |
| 1 | 3337 | +3054 | +3054 |
| 2 | 3395 | +3112 | +1556 |
| 3 | 3679 | +3396 | +1132 |
| 4 | 3778 | +3495 | +873 |
| 5 | 3835 | +3552 | +710 |
| 6 | 4151 | +3868 | +644 |
| 7 | 3937 | +3654 | +522 |
| 8 | 3975 | +3692 | +461 |
| 9 | 3797 | +3514 | +390 |
| 10 | 4032 | +3749 | +374 |
| final | 2380 | +2097 | +190 |

In our Linux environment this keeps growing until the pod crashes.

These issues report the same problem, so their code could be used to reproduce it as well:

#25325
#26827

What We've Tried

| Configuration | Result |
| --- | --- |
| SetCpuMemArena(true) | Leaks |
| SetCpuMemArena(false) | Leaks |
| SetMemPattern(true) | Leaks |
| SetMemPattern(false) | Leaks |
| GraphOptimizationLevelEnableAll | Leaks |
| GraphOptimizationLevelDisableAll | Leaks |
| Destroy sessions only | Leaks |
| Destroy sessions + environment | Still leaks |
| Using mimalloc (LD_PRELOAD) | Still leaks |
| Updated to ONNX Runtime 1.23.2 | Still leaks |
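For concreteness, the option combinations were applied roughly like this. This is a sketch against the yalue/onnxruntime_go wrapper; the method and constant names follow the table above, and most error handling is elided:

```go
opts, err := ort.NewSessionOptions()
if err != nil {
	log.Fatal(err)
}
defer opts.Destroy()

// Each setter was tried with both values / both levels; every
// combination still leaked.
opts.SetCpuMemArena(false)
opts.SetMemPattern(false)
opts.SetGraphOptimizationLevel(ort.GraphOptimizationLevelDisableAll)

s, err := ort.NewDynamicAdvancedSession("model.onnx",
	[]string{"input"}, []string{"output"}, opts)
if err != nil {
	log.Fatal(err)
}
defer s.Destroy()
```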

Analysis

Looking at onnxruntime_c_api.cc:

`DEFINE_RELEASE_ORT_OBJECT_FUNCTION(Session, ::onnxruntime::InferenceSession)`

This expands to a plain `delete` on the `InferenceSession*`. The destructor `~InferenceSession()` appears minimal and relies on implicit member destruction. I didn't dig further, as the codebase is unfamiliar to me, but the memory does not appear to be returned to the OS.

Urgency

This seems like a serious bug: it has been reported multiple times without being investigated, and it would be great if it could be escalated!

Impact

  • We need to refresh ML models every 30 minutes for updated predictions in a real-time environment, and we cannot restart the pods because they are constantly serving requests.
  • Currently leaking ~1.5 GB/hour with 32 sessions × ~50 MB model
  • Pods OOM after several hours

Workaround Attempts

None of the following helped:

  1. Using a custom allocator (mimalloc via LD_PRELOAD)
  2. Disabling all optimizations and memory arenas
  3. Full environment teardown and re-init when reloading the model (sketched below)
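A sketch of attempt 3, reusing `active` and `refresh` from the refresh sketch above; even with no live sessions and no environment, RSS stays far above baseline:

```go
// reloadWithFullTeardown destroys every session, tears down the ORT
// environment entirely, then re-initializes and reloads the model.
func reloadWithFullTeardown(modelPath string, n int) error {
	if old := active.Swap(nil); old != nil {
		for _, s := range old.sessions {
			s.Destroy()
		}
	}
	if err := ort.DestroyEnvironment(); err != nil {
		return err
	}
	// RSS measured at this point is still far above baseline.
	if err := ort.InitializeEnvironment(); err != nil {
		return err
	}
	return refresh(modelPath, n)
}
```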

Platform

Linux

OS Version

OS: Amazon Linux 2, Kernel: 5.10.240-238.959.amzn2.x86_64

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.23.2

ONNX Runtime API

Other / Unknown

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No
