
[Web] Session creation hangs indefinitely with multi-threading when using external data files in onnxruntime-web #26858

@bitnom

Description


Describe the issue

InferenceSession.create() hangs indefinitely when loading ONNX models with external data files (.onnx_data) and multi-threading enabled (numThreads > 1). The hang occurs consistently with no error, timeout, or console output. The same models load successfully with numThreads = 1.

Environment:

  • Browser: Chrome/Firefox on Linux
  • Cross-origin isolation: Enabled (crossOriginIsolated: true, SharedArrayBuffer available)
  • Model: ResembleAI/chatterbox-turbo-ONNX (4 models with external data: 230MB-1GB each)
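The prerequisites above (cross-origin isolation, SharedArrayBuffer) can be feature-detected before setting the thread count. A minimal sketch; `pickThreadCount` is a hypothetical helper of mine, not part of the ORT API:

```javascript
// Hypothetical helper: choose a WASM thread count based on environment support.
// Multithreaded WASM needs SharedArrayBuffer, which in browsers requires
// cross-origin isolation (COOP/COEP response headers).
function pickThreadCount(preferred) {
  // Outside browsers (e.g. Node) crossOriginIsolated is undefined; treat that as OK.
  const isolated =
    typeof crossOriginIsolated !== 'undefined' ? crossOriginIsolated : true;
  const hasSAB = typeof SharedArrayBuffer === 'function';
  if (!isolated || !hasSAB) return 1; // fall back to single-threaded WASM
  const cores =
    typeof navigator !== 'undefined' && navigator.hardwareConcurrency
      ? navigator.hardwareConcurrency
      : 4; // conservative default when hardwareConcurrency is unavailable
  return Math.min(preferred, cores);
}
```

With this, `ort.env.wasm.numThreads = pickThreadCount(12)` (set before any session is created) would at least avoid the multi-threaded path in non-isolated contexts; it does not help here, since the environment reports `crossOriginIsolated: true`.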

Configuration that hangs:

  • ort.env.wasm.numThreads = 12
  • ort.env.wasm.simd = true
  • Session created with ArrayBuffer + externalData configuration

Configuration that works:

  • ort.env.wasm.numThreads = 1 (all other settings identical)

We've tested the following; every configuration hangs with multi-threading:

  • ArrayBuffer with external data
  • SharedArrayBuffer
  • URL-based loading (fails with "Module.MountedFiles not available")
  • various session options (graphOptimizationLevel, explicit thread counts)
  • multiple model quantizations (FP32, FP16, Q4, Q8)


Note: This issue was generated with an LLM agent guided by a human after an extensive debugging session. I will follow up with a demo repo if no quick solutions materialize.

To reproduce

import * as ort from 'onnxruntime-web';

// Configure WASM
ort.env.wasm.wasmPaths = '/ort-wasm/';
ort.env.wasm.simd = true;
ort.env.wasm.numThreads = 12; // Hangs with >1, works with =1

// Fetch model and external data
const modelResponse = await fetch('/models/speech_encoder.onnx');
const modelBuffer = await modelResponse.arrayBuffer();

const dataResponse = await fetch('/models/speech_encoder.onnx_data');
const dataBuffer = await dataResponse.arrayBuffer();
const dataBytes = new Uint8Array(dataBuffer);

// Create session (the same data is registered under both path forms,
// to rule out path-resolution differences)
const session = await ort.InferenceSession.create(modelBuffer, {
  executionProviders: ['wasm'],
  externalData: [
    { data: dataBytes, path: 'speech_encoder.onnx_data' },
    { data: dataBytes, path: './speech_encoder.onnx_data' }
  ]
});
// ❌ Hangs here indefinitely when numThreads > 1
// ✅ Loads successfully when numThreads = 1
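While debugging, the silent hang can at least be surfaced by racing session creation against a timeout. A sketch; the `withTimeout` helper and the fallback logic are mine, not ORT API, and note that `numThreads` must be set before the WASM backend initializes, so a real single-thread fallback may require reloading the page or worker:

```javascript
// Hypothetical helper: reject if a promise does not settle within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // clearTimeout in all cases so the timer doesn't keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch (illustrative, assumes the session options shown above):
// try {
//   session = await withTimeout(
//     ort.InferenceSession.create(modelBuffer, options), 60_000, 'session create');
// } catch (e) {
//   // Fall back to the known-working single-threaded config.
//   // Caveat: changing numThreads after the WASM backend has initialized
//   // may have no effect; a reload may be needed.
//   ort.env.wasm.numThreads = 1;
//   session = await ort.InferenceSession.create(modelBuffer, options);
// }
```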

Urgency

Medium - a workaround exists (single-threading) but it significantly impacts performance. Multi-threading appears essential for acceptable TTS generation speed with the model I am targeting: https://huggingface.co/ResembleAI/chatterbox-turbo-ONNX

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.2 (onnxruntime-web from npm)

Execution Provider

'wasm'/'cpu' (WebAssembly CPU)

    Labels

    • api:Javascript — issues related to the Javascript API
    • platform:web — issues related to ONNX Runtime web; typically submitted using template
