
[Bug][Performance] Memory leak when destroying InferenceSession - memory not released by ReleaseSession or ReleaseEnv #26831

Describe the issue

Description

When repeatedly creating and destroying ONNX Runtime sessions (and even the environment), memory is not properly released. RSS grows continuously despite calling ReleaseSession and ReleaseEnv. This prevents long-running services from refreshing models without accumulating memory.

We create N sessions that are reused until we need to update the model. To update, we create new sessions, do an atomic pointer swap, and then destroy the old ones (sketched below). However, destroying the old sessions does not appear to release their memory.
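For reference, here is a minimal sketch of our refresh pattern. It assumes the yalue/onnxruntime_go API (`NewDynamicAdvancedSession`, `Destroy`); the model path and input/output names are placeholders for our real configuration:

```go
package modelrefresh

import (
	"sync/atomic"

	ort "github.com/yalue/onnxruntime_go"
)

// pool holds the currently active sessions. Request handlers Load()
// the pool; the refresher swaps a new one in every 30 minutes.
type pool struct {
	sessions []*ort.DynamicAdvancedSession
}

var active atomic.Pointer[pool]

// refresh creates n sessions from the updated model, atomically swaps
// them in, and destroys the old ones. (In the real service we also
// wait for in-flight requests to drain before calling Destroy.)
func refresh(modelPath string, n int) error {
	next := &pool{}
	for i := 0; i < n; i++ {
		s, err := ort.NewDynamicAdvancedSession(modelPath,
			[]string{"input"}, []string{"output"}, nil)
		if err != nil {
			return err
		}
		next.sessions = append(next.sessions, s)
	}
	old := active.Swap(next)
	if old != nil {
		for _, s := range old.sessions {
			s.Destroy() // RSS does not drop after this
		}
	}
	return nil
}
```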

Related Issues

Open issues in this repo with the same or similar problem:
#25325
#26827
#21673

We're using yalue's Go wrapper, and the issue has been posted there as well, but the conclusion was that the problem lies in onnxruntime itself, not the wrapper:
yalue/onnxruntime_go#114


Environment

  • ONNX Runtime Version: 1.23.2
  • OS: Linux (Amazon Linux 2)
  • Language Bindings: Go via yalue/onnxruntime_go

Model Details

  • Model size: ~50 MB ONNX file
  • Model type: Custom PyTorch model exported to ONNX (embedding layers + feed-forward network)
  • Dynamic axes: Yes (batch size and sequence length)

Here are screenshots showing memory usage in our Go HTTP server.

With destroying old sessions and creating new sessions every 30 min (the spike at the end is a burst of requests that crashed and restarted the pod, which is why memory resets to baseline):

[screenshot: memory usage with session refresh every 30 min]

Keeping the same sessions, no refresh:

[screenshot: memory usage without session refresh]

To reproduce

Steps to Reproduce

  1. initialize runtime env
  2. measure RSS
  3. create N sessions from the model (we used 32 sessions, ~50 MB model each)
  4. measure RSS
  5. destroy the sessions and create new ones with the updated model
  6. measure RSS
  7. destroy runtime env

Alternatively, create and destroy the runtime environment between cycles as well; it still leaks. A sketch of this loop is below.
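A minimal sketch of the repro loop, again assuming the yalue/onnxruntime_go API; RSS is read from /proc/self/status, so it is Linux-only:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/debug"
	"strconv"
	"strings"

	ort "github.com/yalue/onnxruntime_go"
)

// rssMB parses VmRSS out of /proc/self/status (Linux only).
func rssMB() int {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return -1
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			kb, _ := strconv.Atoi(strings.Fields(line)[1])
			return kb / 1024
		}
	}
	return -1
}

// createSessions is step 3: n sessions over the same ~50 MB model.
func createSessions(path string, n int) []*ort.DynamicAdvancedSession {
	out := make([]*ort.DynamicAdvancedSession, 0, n)
	for i := 0; i < n; i++ {
		s, err := ort.NewDynamicAdvancedSession(path,
			[]string{"input"}, []string{"output"}, nil)
		if err != nil {
			log.Fatal(err)
		}
		out = append(out, s)
	}
	return out
}

func main() {
	if err := ort.InitializeEnvironment(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("initial: RSS=%d MB\n", rssMB())

	for i := 1; i <= 10; i++ {
		sessions := createSessions("model.onnx", 32)
		for _, s := range sessions {
			s.Destroy()
		}
		debug.FreeOSMemory() // force GC and return the Go heap to the OS
		fmt.Printf("iter %d: RSS=%d MB\n", i, rssMB())
	}

	ort.DestroyEnvironment()
	fmt.Printf("final: RSS=%d MB\n", rssMB())
}
```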

I can't seem to reproduce it on my local Mac (macOS appears to reclaim the memory), but in the production Linux environment the pods crash after several hours.

Go heap (managed by the Go GC):

| Iteration | HeapAlloc (MB) | HeapΔ (MB) | NumGC |
| --- | --- | --- | --- |
| initial | 75 | +0 | 6 |
| 1 | 75 | +0 | 10 |
| 2 | 75 | +0 | 12 |
| 3 | 75 | +0 | 14 |
| 4 | 75 | +0 | 16 |
| 5 | 75 | +0 | 18 |
| 6 | 75 | +0 | 20 |
| 7 | 75 | +0 | 22 |
| 8 | 75 | +0 | 24 |
| 9 | 75 | +0 | 26 |
| 10 | 75 | +0 | 28 |
| final | 75 | +0 | 30 |

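The Go heap numbers above were collected with something like the helper below (using runtime and fmt from the harness above). HeapAlloc stays flat while the process RSS in the next table keeps growing, which is why we believe the leak is outside the Go heap:

```go
// logHeap prints one row of the Go-heap table above.
func logHeap(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%d MB NumGC=%d\n",
		label, m.HeapAlloc>>20, m.NumGC)
}
```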
Process RSS (includes memory allocated by ONNX Runtime):

| Iteration | RSS (MB) | RSSΔ (MB) | Δ/Cycle (MB) |
| --- | --- | --- | --- |
| initial | 283 | +0 | +0 |
| 1 | 3337 | +3054 | +3054 |
| 2 | 3395 | +3112 | +1556 |
| 3 | 3679 | +3396 | +1132 |
| 4 | 3778 | +3495 | +873 |
| 5 | 3835 | +3552 | +710 |
| 6 | 4151 | +3868 | +644 |
| 7 | 3937 | +3654 | +522 |
| 8 | 3975 | +3692 | +461 |
| 9 | 3797 | +3514 | +390 |
| 10 | 4032 | +3749 | +374 |
| final | 2380 | +2097 | +190 |

In our Linux environment this keeps growing until the pod crashes.

These issues report the same problem, so their code could be used to reproduce it as well:

#25325
#26827

What We've Tried

| Configuration | Result |
| --- | --- |
| SetCpuMemArena(true) | Leaks |
| SetCpuMemArena(false) | Leaks |
| SetMemPattern(true) | Leaks |
| SetMemPattern(false) | Leaks |
| GraphOptimizationLevelEnableAll | Leaks |
| GraphOptimizationLevelDisableAll | Leaks |
| Destroy sessions only | Leaks |
| Destroy sessions + environment | Still leaks |
| Using mimalloc (LD_PRELOAD) | Still leaks |
| Updated to ONNX Runtime 1.23.2 | Still leaks |
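For concreteness, the option combinations were applied roughly like this. This is a sketch against the yalue/onnxruntime_go wrapper; the method and constant names follow the table above, and most error handling is elided:

```go
opts, err := ort.NewSessionOptions()
if err != nil {
	log.Fatal(err)
}
defer opts.Destroy()

// Each setter was tried with both values / both levels; every
// combination still leaked.
opts.SetCpuMemArena(false)
opts.SetMemPattern(false)
opts.SetGraphOptimizationLevel(ort.GraphOptimizationLevelDisableAll)

s, err := ort.NewDynamicAdvancedSession("model.onnx",
	[]string{"input"}, []string{"output"}, opts)
if err != nil {
	log.Fatal(err)
}
defer s.Destroy()
```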

Analysis

Looking at onnxruntime_c_api.cc:

`DEFINE_RELEASE_ORT_OBJECT_FUNCTION(Session, ::onnxruntime::InferenceSession)`

This expands to a plain `delete` on the `InferenceSession*`. The destructor `~InferenceSession()` appears minimal and relies on implicit member destruction. I didn't dig further, as the codebase is unfamiliar to me, but the memory does not appear to be returned to the OS.

Urgency

This seems like a serious bug: it has been reported multiple times without being investigated, and it would be great if it could be escalated!

Impact

  • We need to refresh ML models every 30 minutes for updated predictions in a real-time environment, and we cannot restart the pods because they are constantly serving requests.
  • Currently leaking ~1.5 GB/hour with 32 sessions × ~50 MB model
  • Pods OOM after several hours

Workaround Attempts

None of the following helped:

  1. Using a custom allocator (mimalloc via LD_PRELOAD)
  2. Disabling all optimizations and memory arenas
  3. Full environment teardown and re-init when reloading the model (sketched below)
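A sketch of attempt 3, reusing `active` and `refresh` from the refresh sketch above; even with no live sessions and no environment, RSS stays far above baseline:

```go
// reloadWithFullTeardown destroys every session, tears down the ORT
// environment entirely, then re-initializes and reloads the model.
func reloadWithFullTeardown(modelPath string, n int) error {
	if old := active.Swap(nil); old != nil {
		for _, s := range old.sessions {
			s.Destroy()
		}
	}
	if err := ort.DestroyEnvironment(); err != nil {
		return err
	}
	// RSS measured at this point is still far above baseline.
	if err := ort.InitializeEnvironment(); err != nil {
		return err
	}
	return refresh(modelPath, n)
}
```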

Platform

Linux

OS Version

OS: Amazon Linux 2, Kernel: 5.10.240-238.959.amzn2.x86_64

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.23.2

ONNX Runtime API

Other / Unknown

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No
