Architecture
Anira Web is the WebAssembly distribution of anira: the same C++
library, compiled to WASM and wrapped in a
TypeScript API. The TypeScript layer’s job is to spread that WASM
module across the browser threads anira needs in order to run
real-time inference — a main thread, the audio worklet thread, and one
or more inference worker threads.
┌──────────────────┐    ┌────────────────────┐    ┌──────────────────────┐
│ Main thread      │    │ Inference worker(s)│    │ Audio worklet thread │
│                  │    │ (Web Worker(s))    │    │                      │
│ Setup            │    │                    │    │ AudioWorklet-        │
│                  │    │ Model inference    │    │ Processor            │
│ Configuration    │◀──▶│ (WASM ONNX,        │◀──▶│ Real-time            │
│                  │    │ onnxruntime-web,   │    │ process()            │
│ UI control       │    │ or custom)         │    │ Pre/Post-processor   │
└──────────────────┘    └────────────────────┘    └──────────────────────┘
          ▲                       ▲                          ▲
          └───────────────────────┴──────────────────────────┘
                       Shared WebAssembly memory
The number of inference workers is up to you. Each call to
aniraWeb.spinUpInferenceWorker() spawns a new Web Worker hosting an
InferenceThread — the same primitive anira uses for its desktop
thread pool. One worker is enough for simple models on most machines; spawn more if
you see audio dropouts, so anira can run inference on multiple batches
in parallel.
All threads share a single WebAssembly.Memory instance, so
configuration objects, ring buffers, and tensor data live at the same
heap addresses everywhere. Cross-thread coordination uses message
passing for setup and atomics on shared memory for the real-time path.
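To make the "atomics on shared memory" idea concrete, here is a minimal sketch of a single-producer, single-consumer ring buffer over a SharedArrayBuffer. It is not anira's implementation: the class name SharedRingBuffer, its memory layout, and its methods are assumptions chosen for illustration. It only shows how two threads can exchange samples without locks or message passing.

```typescript
// Illustrative SPSC ring buffer over shared memory.
// NOT anira's implementation: name, layout, and methods are assumptions.
class SharedRingBuffer {
  private readonly indices: Int32Array // [0] = read index, [1] = write index
  private readonly samples: Float32Array
  private readonly capacity: number

  constructor(sab: SharedArrayBuffer, capacity: number) {
    this.indices = new Int32Array(sab, 0, 2)
    this.samples = new Float32Array(sab, 8, capacity)
    this.capacity = capacity
  }

  // 8 bytes for the two indices, then `capacity` 32-bit floats.
  static allocate(capacity: number): SharedArrayBuffer {
    return new SharedArrayBuffer(8 + capacity * 4)
  }

  // Producer side. Returns how many samples were actually written.
  push(input: Float32Array): number {
    const read = Atomics.load(this.indices, 0)
    const write = Atomics.load(this.indices, 1)
    // One slot stays empty so that "full" and "empty" are distinguishable.
    const used = (write - read + this.capacity) % this.capacity
    const n = Math.min(this.capacity - 1 - used, input.length)
    for (let i = 0; i < n; i++) {
      this.samples[(write + i) % this.capacity] = input[i]
    }
    // Publish the new write index only after the samples are in place.
    Atomics.store(this.indices, 1, (write + n) % this.capacity)
    return n
  }

  // Consumer side. Returns how many samples were actually read.
  pop(output: Float32Array): number {
    const read = Atomics.load(this.indices, 0)
    const write = Atomics.load(this.indices, 1)
    const available = (write - read + this.capacity) % this.capacity
    const n = Math.min(available, output.length)
    for (let i = 0; i < n; i++) {
      output[i] = this.samples[(read + i) % this.capacity]
    }
    Atomics.store(this.indices, 0, (read + n) % this.capacity)
    return n
  }
}
```

Each thread would construct its own SharedRingBuffer view over the same SharedArrayBuffer. Neither push nor pop blocks or allocates, which is the property a real-time audio callback needs.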
Main Thread
The main thread is where you set up anira. Calling
await AniraWeb.create() instantiates the WASM module and returns
the aniraWeb factory; from there you wire up the model, inference
configuration, and pre/post-processing the same way you would in C++.
The main thread also owns your UI. Non-streamable tensor values
written from here — through setInput and similar APIs — reach the
model without blocking the audio path, so a slider or toggle can
update the model from frame to frame.
Inference Worker
await aniraWeb.spinUpInferenceWorker() starts a Web Worker that
owns inference execution. Pulling inference off the audio thread is
what keeps the audio worklet’s process callback real-time-safe
even when a forward pass takes longer than one audio block.
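A quick calculation shows how tight the audio callback's deadline is. The sketch below assumes the Web Audio default render quantum of 128 frames; the 10 ms forward-pass time is a made-up figure for illustration, not a measurement of any model.

```typescript
// Time budget of one audio block, in milliseconds.
function blockBudgetMs(framesPerBlock: number, sampleRate: number): number {
  return (framesPerBlock / sampleRate) * 1000
}

// How many audio blocks elapse while one forward pass is running.
function blocksSpanned(inferenceMs: number, budgetMs: number): number {
  return Math.ceil(inferenceMs / budgetMs)
}

// Assumes the Web Audio default render quantum of 128 frames.
const budget48k = blockBudgetMs(128, 48000) // roughly 2.67 ms
const budget44k = blockBudgetMs(128, 44100) // roughly 2.90 ms

// Hypothetical 10 ms forward pass at 48 kHz: spans 4 audio blocks.
const spanned = blocksSpanned(10, budget48k)
```

A forward pass that spans several blocks would miss every one of those deadlines if it ran inside process(), which is why it runs on a worker instead.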
The worker hosts the inference engine itself, regardless of where that
engine actually runs. Anira Web ships with two built-in engines:
ONNX Runtime compiled into the WASM module, and onnxruntime-web on
the JavaScript side (ONNXRuntimeWebBackend). User-written JS backends
also run on this worker. See Custom Inference Backends.
You can also replace the worker entry point itself:
await aniraWeb.spinUpInferenceWorker(
  new URL('./customInferenceWorker.ts', import.meta.url)
)
spinUpInferenceWorker() returns an InferenceWorker handle.
When you’re done with a worker — for instance when reconfiguring or
unloading the model — call worker.stop() to halt its inference
thread, terminate the underlying Worker, and remove it from
aniraWeb.getActiveWorkers(). Workers spun up but never stopped
stay alive for the lifetime of the page.
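One way to make sure workers never outlive their use is to scope them with try/finally. The sketch below is written against two small structural interfaces rather than the real anira-web types: the helper name withInferenceWorkers is hypothetical, and the return type of stop() is an assumption, so it is awaited in case it returns a promise.

```typescript
// Minimal structural types for this sketch; the real anira-web types may differ.
interface StoppableWorker {
  stop(): void | Promise<void> // assumption: may be sync or async
}
interface WorkerFactory {
  spinUpInferenceWorker(): Promise<StoppableWorker>
}

// Hypothetical helper: spin up `count` workers, run `body`, then stop them all.
async function withInferenceWorkers<T>(
  factory: WorkerFactory,
  count: number,
  body: (workers: StoppableWorker[]) => Promise<T>
): Promise<T> {
  const workers: StoppableWorker[] = []
  try {
    for (let i = 0; i < count; i++) {
      workers.push(await factory.spinUpInferenceWorker())
    }
    return await body(workers)
  } finally {
    // Runs even if spin-up or `body` threw; stops workers in reverse order.
    for (const worker of workers.reverse()) {
      await worker.stop()
    }
  }
}
```

If the shapes declared here match the real API, you would pass the aniraWeb factory as the first argument and do your processing inside the callback.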
Audio Worklet Thread
The browser’s AudioWorkletGlobalScope runs the audio callback. Anira
ships with a default worklet that handles the common case: a
single-tensor model with in-place stereo or mono I/O. To install it:
await aniraWeb.registerAudioWorkletForContext(audioContext)
const node = await aniraWeb.configureAudioWorklet(
  audioContext,
  inferenceHandler,
  ppProcessor
)
For models that need more — multi-tensor I/O, a custom processing
buffer size, AudioParam integration, or a JS pre/post processor —
you provide a custom worklet file. See Custom Audio Worklets.
Note
JSPrePostProcessor subclasses are constructed on the
audio worklet thread, not on the main thread. Pre- and
post-processing run in the real-time callback, so the JS object that
implements them must live where that callback runs.
Three Customization Axes
Most extension work falls into one of three independent categories, each with its own page:
- Custom Audio Worklets: extend AniraAudioWorkletBase for multi-tensor models (processMulti), custom maxBufferSize, AudioParam integration, or to host a custom JSPrePostProcessor.
- Custom Pre- and Post-Processing: subclass JSPrePostProcessor to run JavaScript before and after inference (windowing, normalization, parameter clamping, etc.).
- Custom Inference Backends: replace the WASM-side runtime with a JavaScript backend. Built-in options (JSBackendBase, ONNXRuntimeWebBackend) and user-written backends both run on the inference worker.
Custom pre/post processing requires a custom worklet (because the subclass must be instantiated on the audio thread); custom worklets and custom backends are otherwise independent and can be combined freely.
The JS ↔ WASM Bridge
C++ objects live in WASM-managed memory and are referenced by raw
numeric pointers. The TypeScript wrappers (InferenceHandler,
PrePostProcessor, BufferF, …) extend BaseWrapper,
which holds two fields per instance: ptr (the C++ pointer) and
wasmInstance (the Emscripten module). Every wrapper method
forwards into a _<class>_<method> C export with this.ptr as
the first argument.
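The forwarding pattern can be sketched in a few lines. Everything below is a stand-in: SketchBaseWrapper and SketchBuffer are hypothetical classes, the WasmExports type replaces the real Emscripten module, and using the class name verbatim in the export name is an assumption. Only the ptr and wasmInstance field names and the this.ptr-first calling convention come from the description above.

```typescript
// Stand-in for the Emscripten module: a bag of C exports keyed by name.
type WasmExports = Record<string, (...args: number[]) => number>

// Illustrative base class; only `ptr` and `wasmInstance` follow the text.
class SketchBaseWrapper {
  ptr: number
  wasmInstance: WasmExports

  constructor(ptr: number, wasmInstance: WasmExports) {
    this.ptr = ptr
    this.wasmInstance = wasmInstance
  }

  // Forward to `_<class>_<method>` with `this.ptr` as the first argument.
  protected call(cls: string, method: string, ...args: number[]): number {
    const name = `_${cls}_${method}`
    const fn = this.wasmInstance[name]
    if (fn === undefined) throw new Error(`missing export ${name}`)
    return fn(this.ptr, ...args)
  }
}

// Hypothetical wrapper class with one forwarded method.
class SketchBuffer extends SketchBaseWrapper {
  getNumSamples(): number {
    return this.call('SketchBuffer', 'getNumSamples')
  }
}
```

The wrapper itself holds no state beyond the pointer and the module reference, so creating one is cheap and two wrappers with the same ptr refer to the same C++ object.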
Most wrapper APIs accept the union type
PossiblePointer<T> = T | number, so you can pass either a wrapper
instance or a raw numeric pointer. This avoids forcing an allocation
just to call into the next wrapper — when you already have a pointer
in hand (for example from a worklet message or another wrapper’s
getPointer()), pass it directly.
Helpers, all exported from the package root:
| Helper | Role |
|---|---|
| | Coerce a |
| | Return the wrapper’s underlying C++ pointer as a number. Symmetric to |
| | Build a wrapper of class |
| | Static counterpart of |
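As a rough illustration of how a PossiblePointer argument can be handled, the sketch below coerces either form to a raw number. The function name toRawPointer and the PointerHolder interface are hypothetical and are not the package's own helpers; the only detail taken from the text is that wrappers expose getPointer(). The extra type constraint on T is also an addition for this sketch.

```typescript
// Anything that can report its C++ pointer; real wrappers extend BaseWrapper.
interface PointerHolder {
  getPointer(): number
}

// Same union as in the text, with a constraint added for this sketch.
type PossiblePointer<T extends PointerHolder> = T | number

// Hypothetical coercion helper; the package's own helper may differ.
function toRawPointer<T extends PointerHolder>(value: PossiblePointer<T>): number {
  return typeof value === 'number' ? value : value.getPointer()
}

// A function accepting either form needs only one code path after coercion.
function describePointer<T extends PointerHolder>(value: PossiblePointer<T>): string {
  return `0x${toRawPointer(value).toString(16)}`
}
```

Accepting the union at the API boundary and coercing once keeps callers free to pass whatever they already hold.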
Lifecycle and Cleanup
Wrapper instances expose a destroy() method that frees the
underlying C++ object via the corresponding _<class>_destroy C
export. JavaScript has no destructors, so the garbage collector never
calls it for you: the C++ memory is released only when destroy() runs.
On a long-lived page that loads a model once and keeps inferring,
skipping destroy() is harmless, because the module stays alive for the
session anyway. Apps that swap models, recreate handlers, or run under
a test harness should call destroy() explicitly.
Not every wrapper needs destroy(), though. Whether a wrapper is
owning depends on how it was created:
- A wrapper from new SomeClass(...) runs the TS constructor, which calls _<class>_create and stashes the fresh C++ pointer. This wrapper is the only handle to that C++ object, so calling destroy() on it frees the object.
- A wrapper from wrapPointer or createFromPointer skips the TS constructor entirely; it’s a view over a C++ object that was allocated by somebody else (typically another wrapper, or the inference worker). Don’t call destroy() on these: doing so would free a C++ object that other code is still pointing at.
For example, InferenceConfig.getTensorInputShape() returns a
TensorShapeList view; the underlying storage belongs to the
InferenceConfig, so you destroy the config, not the view.
When you do tear down a full setup, free the handler before the config and processor it references — the handler holds pointers back into them, so freeing them first leaves it with dangling references:
inferenceHandler.destroy() // first: holds refs into pp + config
ppProcessor.destroy()
inferenceConfig.destroy() // last among the three
// ProcessingSpec, VectorModelData, VectorTensorShape, etc. can
// then be destroyed in any order.
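Because the handler is created after the objects it references, the required order is simply the reverse of creation order. The sketch below captures that with a small stack; TeardownStack is a hypothetical helper, not part of anira-web, and it assumes only that owning wrappers expose destroy().

```typescript
// Anything with a destroy() method, such as an owning wrapper.
interface Destroyable {
  destroy(): void
}

// Hypothetical helper: destroys registered objects in reverse registration order.
class TeardownStack {
  private items: Destroyable[] = []

  // Register an owning wrapper and hand it back for immediate use.
  own<T extends Destroyable>(item: T): T {
    this.items.push(item)
    return item
  }

  // Last registered is destroyed first, so dependents go before dependencies.
  destroyAll(): void {
    for (let i = this.items.length - 1; i >= 0; i--) {
      this.items[i].destroy()
    }
    this.items = []
  }
}
```

Registering the config, then the processor, then the handler yields the teardown order shown above. Views obtained from wrapPointer or createFromPointer should not be registered, since they do not own their C++ object.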