Description
Describe the issue
InferenceSession.create() hangs indefinitely when loading ONNX models with external data files (.onnx_data) and multi-threading enabled (numThreads > 1). The hang occurs consistently with no error, timeout, or console output. The same models load successfully with numThreads = 1.
Environment:
- Browser: Chrome/Firefox on Linux
- Cross-origin isolation: Enabled (crossOriginIsolated: true, SharedArrayBuffer available)
- Model: ResembleAI/chatterbox-turbo-ONNX (4 models with external data: 230MB-1GB each)
Configuration that hangs:
- ort.env.wasm.numThreads = 12
- ort.env.wasm.simd = true
- Session created with ArrayBuffer + externalData configuration
Configuration that works:
- ort.env.wasm.numThreads = 1 (all other settings identical)
We've tested: ArrayBuffer with external data, SharedArrayBuffer, URL-based loading (fails with "Module.MountedFiles not available"), various session options (graphOptimizationLevel, explicit thread counts), and multiple model quantizations (FP32, FP16, Q4, Q8). All configurations hang with multi-threading.
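For reference, the URL-based variant that failed with "Module.MountedFiles not available" looked roughly like the following. This is a sketch from memory, so the exact shape of the `externalData` entry (URL string as `data`) may not match the API precisely:

```js
// URL-based loading variant (sketch from memory): pass a URL string as
// `data` instead of pre-fetched bytes. In our testing this path fails
// with "Module.MountedFiles not available".
const session = await ort.InferenceSession.create('/models/speech_encoder.onnx', {
  executionProviders: ['wasm'],
  externalData: [
    { data: '/models/speech_encoder.onnx_data', path: 'speech_encoder.onnx_data' }
  ]
});
```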
Note: This issue was drafted with an LLM agent guided by a human after an extensive debugging session. I will follow up with a demo repo if no quick solutions materialize.
To reproduce
```js
import * as ort from 'onnxruntime-web';

// Configure WASM
ort.env.wasm.wasmPaths = '/ort-wasm/';
ort.env.wasm.simd = true;
ort.env.wasm.numThreads = 12; // Hangs with >1, works with =1

// Fetch model and external data
const modelResponse = await fetch('/models/speech_encoder.onnx');
const modelBuffer = await modelResponse.arrayBuffer();
const dataResponse = await fetch('/models/speech_encoder.onnx_data');
const dataBuffer = await dataResponse.arrayBuffer();
const dataBytes = new Uint8Array(dataBuffer);

// Create session (external data registered under both path forms, in case
// path resolution differs)
const session = await ort.InferenceSession.create(modelBuffer, {
  executionProviders: ['wasm'],
  externalData: [
    { data: dataBytes, path: 'speech_encoder.onnx_data' },
    { data: dataBytes, path: './speech_encoder.onnx_data' }
  ]
});
// ❌ Hangs here indefinitely when numThreads > 1
// ✅ Loads successfully when numThreads = 1
```

Urgency
Medium - A workaround exists (single-threading), but it significantly impacts performance. Multi-threading appears essential for acceptable TTS generation speed with the model I am targeting: https://huggingface.co/ResembleAI/chatterbox-turbo-ONNX
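Until this is resolved, the workaround amounts to forcing single-threading whenever external data is involved. A minimal sketch, where `clampThreadsForExternalData` is a hypothetical helper of our own (not an ORT API):

```javascript
// Hypothetical helper (not an ORT API): clamp the WASM thread count to 1
// whenever the session uses external data files, since in our testing only
// single-threaded loading completes.
function clampThreadsForExternalData(requested, hasExternalData) {
  // requested: desired value for ort.env.wasm.numThreads
  // hasExternalData: true if the model ships a .onnx_data file
  return hasExternalData ? 1 : Math.max(1, requested | 0);
}

// Usage (assumed): ort.env.wasm.numThreads = clampThreadsForExternalData(12, true);
```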
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.2 (onnxruntime-web from npm)
Execution Provider
'wasm'/'cpu' (WebAssembly CPU)