Class anira::InferenceManager

class InferenceManager

Collaboration diagram for anira::InferenceManager:

digraph {
    graph [bgcolor="#00000000"]
    node [shape=rectangle style=filled fillcolor="#FFFFFF" font=Helvetica padding=2]
    edge [color="#1414CE"]
    "15" [label="anira::Buffer< float >" tooltip="anira::Buffer< float >"]
    "16" [label="anira::MemoryBlock< float >" tooltip="anira::MemoryBlock< float >"]
    "7" [label="anira::MemoryBlock< std::atomic< float > >" tooltip="anira::MemoryBlock< std::atomic< float > >"]
    "12" [label="anira::BackendBase" tooltip="anira::BackendBase"]
    "9" [label="anira::Context" tooltip="anira::Context"]
    "10" [label="anira::ContextConfig" tooltip="anira::ContextConfig"]
    "24" [label="anira::HighPriorityThread" tooltip="anira::HighPriorityThread"]
    "8" [label="anira::HostConfig" tooltip="anira::HostConfig"]
    "2" [label="anira::InferenceConfig" tooltip="anira::InferenceConfig"]
    "25" [label="anira::InferenceData" tooltip="anira::InferenceData"]
    "1" [label="anira::InferenceManager" tooltip="anira::InferenceManager" fillcolor="#BFBFBF"]
    "23" [label="anira::InferenceThread" tooltip="anira::InferenceThread"]
    "17" [label="anira::LibtorchProcessor" tooltip="anira::LibtorchProcessor"]
    "18" [label="anira::LibtorchProcessor::Instance" tooltip="anira::LibtorchProcessor::Instance"]
    "4" [label="anira::ModelData" tooltip="anira::ModelData"]
    "19" [label="anira::OnnxRuntimeProcessor" tooltip="anira::OnnxRuntimeProcessor"]
    "20" [label="anira::OnnxRuntimeProcessor::Instance" tooltip="anira::OnnxRuntimeProcessor::Instance"]
    "6" [label="anira::PrePostProcessor" tooltip="anira::PrePostProcessor"]
    "3" [label="anira::ProcessingSpec" tooltip="anira::ProcessingSpec"]
    "14" [label="anira::RingBuffer" tooltip="anira::RingBuffer"]
    "11" [label="anira::SessionElement" tooltip="anira::SessionElement"]
    "13" [label="anira::SessionElement::ThreadSafeStruct" tooltip="anira::SessionElement::ThreadSafeStruct"]
    "21" [label="anira::TFLiteProcessor" tooltip="anira::TFLiteProcessor"]
    "22" [label="anira::TFLiteProcessor::Instance" tooltip="anira::TFLiteProcessor::Instance"]
    "5" [label="anira::TensorShape" tooltip="anira::TensorShape"]
    "15" -> "16" [dir=forward tooltip="usage"]
    "12" -> "2" [dir=forward tooltip="usage"]
    "9" -> "10" [dir=forward tooltip="usage"]
    "9" -> "11" [dir=forward tooltip="usage"]
    "9" -> "23" [dir=forward tooltip="usage"]
    "9" -> "17" [dir=forward tooltip="usage"]
    "9" -> "19" [dir=forward tooltip="usage"]
    "9" -> "21" [dir=forward tooltip="usage"]
    "9" -> "25" [dir=forward tooltip="usage"]
    "2" -> "3" [dir=forward tooltip="usage"]
    "2" -> "4" [dir=forward tooltip="usage"]
    "2" -> "5" [dir=forward tooltip="usage"]
    "1" -> "2" [dir=forward tooltip="usage"]
    "1" -> "6" [dir=forward tooltip="usage"]
    "1" -> "8" [dir=forward tooltip="usage"]
    "1" -> "9" [dir=forward tooltip="usage"]
    "1" -> "11" [dir=forward tooltip="usage"]
    "23" -> "24" [dir=forward tooltip="public-inheritance"]
    "23" -> "25" [dir=forward tooltip="usage"]
    "17" -> "12" [dir=forward tooltip="public-inheritance"]
    "17" -> "18" [dir=forward tooltip="usage"]
    "18" -> "2" [dir=forward tooltip="usage"]
    "18" -> "16" [dir=forward tooltip="usage"]
    "19" -> "12" [dir=forward tooltip="public-inheritance"]
    "19" -> "20" [dir=forward tooltip="usage"]
    "20" -> "2" [dir=forward tooltip="usage"]
    "20" -> "16" [dir=forward tooltip="usage"]
    "6" -> "2" [dir=forward tooltip="usage"]
    "6" -> "7" [dir=forward tooltip="usage"]
    "14" -> "15" [dir=forward tooltip="public-inheritance"]
    "11" -> "6" [dir=forward tooltip="usage"]
    "11" -> "2" [dir=forward tooltip="usage"]
    "11" -> "12" [dir=forward tooltip="usage"]
    "11" -> "8" [dir=forward tooltip="usage"]
    "11" -> "13" [dir=forward tooltip="usage"]
    "11" -> "14" [dir=forward tooltip="usage"]
    "11" -> "17" [dir=forward tooltip="usage"]
    "11" -> "19" [dir=forward tooltip="usage"]
    "11" -> "21" [dir=forward tooltip="usage"]
    "21" -> "12" [dir=forward tooltip="public-inheritance"]
    "21" -> "22" [dir=forward tooltip="usage"]
    "22" -> "2" [dir=forward tooltip="usage"]
    "22" -> "16" [dir=forward tooltip="usage"]
}

Central manager class for coordinating neural network inference operations.

The InferenceManager class serves as the primary coordinator for neural network inference in real-time audio processing applications. It manages the complete inference pipeline, including input preprocessing, backend execution scheduling, output postprocessing, and session management across multiple inference threads.

Key responsibilities:

  • Managing inference sessions and thread coordination

  • Handling input/output data flow and buffering

  • Coordinating with PrePostProcessor for data transformation

  • Managing latency compensation and sample counting

  • Providing thread-safe access to inference operations

  • Supporting both real-time and non-real-time processing modes

The manager supports multiple processing patterns:

  • Synchronous processing with immediate input/output

  • Asynchronous push/pop processing for decoupled operation

  • Multi-tensor processing for complex model architectures

  • Custom latency handling for different model types

Note

This class coordinates between multiple components and should be used as the primary interface for inference operations rather than directly accessing lower-level components.
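
As a rough illustration of the lifecycle described above, the sketch below constructs a manager, prepares it for the host's audio settings, and queries the resulting latency. It is a minimal sketch, not a drop-in implementation: the umbrella header anira/anira.h is assumed, and the pp_processor, inference_config, context_config, and host_config objects are assumed to be configured elsewhere for the concrete model in use.

    #include <anira/anira.h>  // assumed umbrella header; adjust to your include layout
    #include <vector>

    // Hedged setup sketch: all dependency objects are assumed to be configured
    // elsewhere for the concrete model in use.
    void setup_inference(anira::PrePostProcessor& pp_processor,
                         anira::InferenceConfig& inference_config,
                         const anira::ContextConfig& context_config,
                         anira::HostConfig host_config)
    {
        // Passing nullptr selects the built-in backends instead of a custom BackendBase.
        anira::InferenceManager manager(pp_processor, inference_config,
                                        nullptr, context_config);

        // Must be called before processing starts and whenever audio settings change.
        manager.prepare(host_config);

        // Per-tensor latency in samples, e.g. for reporting to the plugin host.
        std::vector<unsigned int> latency = manager.get_latency();
        (void) latency;

        // In real code the manager would be a long-lived member, not a local variable.
    }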

Public Functions

InferenceManager() = delete

The default constructor is deleted to prevent uninitialized instances.

InferenceManager(PrePostProcessor &pp_processor, InferenceConfig &inference_config, BackendBase *custom_processor, const ContextConfig &context_config)

Constructor that initializes the inference manager with all required components.

Creates an inference manager with the specified preprocessing/postprocessing pipeline, inference configuration, and optional custom backend. Initializes the context and prepares for session management.

Parameters:
  • pp_processor – Reference to the preprocessing/postprocessing pipeline

  • inference_config – Reference to the inference configuration containing model settings

  • custom_processor – Pointer to a custom backend processor (can be nullptr for default backends)

  • context_config – Configuration for the inference context and thread management

~InferenceManager()

Destructor that properly cleans up inference resources.

Ensures proper shutdown of inference threads, cleanup of sessions, and release of all managed resources.

void prepare(HostConfig config, std::vector<long> custom_latency = {})

Prepares the inference manager for processing with new audio configuration.

Initializes the inference pipeline with the specified host configuration and optional custom latency settings. This method must be called before processing begins or when audio settings change.

Parameters:
  • config – Host configuration containing sample rate, buffer size, and audio settings

  • custom_latency – Optional vector of custom latency values for each tensor (empty for automatic calculation)
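
A minimal, hedged sketch of re-preparation when the host changes its audio settings; the custom_latency overload is shown only in a comment, since the correct per-tensor values depend on the model. The host_config parameter is assumed to carry the new sample rate and buffer size.

    #include <anira/anira.h>  // assumed umbrella header

    // Hedged sketch: call prepare() again whenever the host's sample rate or
    // buffer size changes.
    void on_audio_settings_changed(anira::InferenceManager& manager,
                                   anira::HostConfig host_config)
    {
        manager.prepare(host_config); // latency is calculated automatically

        // Alternatively, override the latency with one value per tensor, e.g.:
        // manager.prepare(host_config, std::vector<long>{512});
    }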

size_t *process(const float *const *const *input_data, size_t *num_input_samples, float *const *const *output_data, size_t *num_output_samples)

Processes multi-tensor audio data with separate input and output buffers.

Performs complete inference processing for multiple tensors simultaneously, handling preprocessing, inference execution, and postprocessing. This method supports complex model architectures with multiple inputs and outputs.

Note

This method is real-time safe and does not allocate memory.

Parameters:
  • input_data – Input data organized as data[tensor_index][channel][sample]

  • num_input_samples – Array of input sample counts for each tensor

  • output_data – Output data buffers organized as data[tensor_index][channel][sample]

  • num_output_samples – Array of maximum output sample counts for each tensor

Returns:

Array of actual output sample counts for each tensor
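
The pointer layout of process() is easiest to see in a concrete, minimal case. The following hedged sketch handles one tensor with one channel; the counts and indices are illustrative only and must match the tensor layout defined in the InferenceConfig, and the anira/anira.h include is an assumption.

    #include <anira/anira.h>  // assumed umbrella header
    #include <cstddef>

    // Hedged sketch of a single-tensor, single-channel process() call.
    // 'in' and 'out' are assumed to point at the host's audio buffers.
    void process_block(anira::InferenceManager& manager,
                       const float* in, float* out, size_t num_samples)
    {
        const float* input_channels[1]       = { in };             // [channel] -> samples
        const float* const* input_tensors[1] = { input_channels }; // [tensor]  -> channels
        size_t num_input[1]                  = { num_samples };

        float* output_channels[1]            = { out };
        float* const* output_tensors[1]      = { output_channels };
        size_t num_output[1]                 = { num_samples };    // capacity of 'out'

        size_t* num_written = manager.process(input_tensors, num_input,
                                              output_tensors, num_output);
        (void) num_written; // num_written[0]: samples actually produced for tensor 0
    }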

void push_data(const float *const *const *input_data, size_t *num_input_samples)

Pushes input data to the inference pipeline for asynchronous processing.

Queues input data for processing without waiting for results. This enables decoupled input/output processing, where data can be pushed and popped independently, for example in buffered processing scenarios. A combined push/pop sketch follows the pop_data documentation below.

Note

This method is real-time safe and does not allocate memory.

Parameters:
  • input_data – Input data organized as data[tensor_index][channel][sample]

  • num_input_samples – Array of input sample counts for each tensor

size_t *pop_data(float *const *const *output_data, size_t *num_output_samples)

Pops processed output data from the inference pipeline.

Retrieves processed output data from the inference pipeline. It should be used in conjunction with push_data for decoupled processing patterns.

Note

This method is real-time safe and does not allocate memory.

Parameters:
  • output_data – Output buffers organized as data[tensor_index][channel][sample]

  • num_output_samples – Array of maximum output sample counts for each tensor

Returns:

Array of actual output sample counts for each tensor
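
As a hedged illustration of the decoupled pattern, the sketch below pushes a block in one call and pops processed samples in another. One tensor and one channel are assumed; the indices and counts must match the actual tensor layout, and the include is an assumption.

    #include <anira/anira.h>  // assumed umbrella header
    #include <cstddef>

    // Hedged sketch: push in the audio callback, pop once processed samples are
    // available (one tensor, one channel assumed).
    void push_block(anira::InferenceManager& manager, const float* in, size_t n)
    {
        const float* channels[1]       = { in };
        const float* const* tensors[1] = { channels };
        size_t num_in[1]               = { n };
        manager.push_data(tensors, num_in);
    }

    void pop_block(anira::InferenceManager& manager, float* out, size_t capacity)
    {
        float* channels[1]       = { out };
        float* const* tensors[1] = { channels };
        size_t num_out[1]        = { capacity };
        size_t* num_popped = manager.pop_data(tensors, num_out);
        (void) num_popped; // num_popped[0]: samples actually written for tensor 0
    }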

void set_backend(InferenceBackend new_inference_backend)

Sets the inference backend to use for neural network processing.

Changes the active inference backend, which may trigger session reinitialization if the new backend differs from the current one.

Parameters:

new_inference_backend – The backend type to use (ONNX, LibTorch, TensorFlow Lite, or Custom)

InferenceBackend get_backend() const

Gets the currently active inference backend.

Returns:

The currently configured inference backend type
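
A small hedged sketch that compares the current and requested backends before switching, to avoid an unnecessary session reinitialization. Which enumerator to pass depends on the backends compiled into the build, so the requested value is taken as a parameter here.

    #include <anira/anira.h>  // assumed umbrella header

    // Hedged sketch: only switch when the requested backend differs, since
    // set_backend() may trigger a session reinitialization.
    void select_backend(anira::InferenceManager& manager,
                        anira::InferenceBackend requested)
    {
        if (manager.get_backend() != requested)
            manager.set_backend(requested);
    }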

std::vector<unsigned int> get_latency() const

Gets the processing latency for all tensors.

Returns, for each tensor, the latency in samples introduced by the inference processing. This includes buffering delays, preprocessing/postprocessing latency, and model-specific processing latency.

Returns:

Vector containing latency values in samples for each tensor index

size_t get_available_samples(size_t tensor_index, size_t channel) const

Gets the number of samples received for a specific tensor and channel (for unit testing).

This method is primarily used for unit testing and debugging purposes to monitor the data flow through the inference pipeline.

Parameters:
  • tensor_index – Index of the tensor to query

  • channel – Channel index to query

Returns:

Number of samples received for the specified tensor and channel
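
A hedged test/debug sketch that simply logs the value; the tensor and channel indices 0 are illustrative and must match the model's layout, and the include is an assumption.

    #include <anira/anira.h>  // assumed umbrella header
    #include <cstdio>

    // Hedged sketch: monitor data flow through the pipeline in a test build.
    void log_received_samples(const anira::InferenceManager& manager)
    {
        std::printf("tensor 0 / channel 0: %zu samples received\n",
                    manager.get_available_samples(0, 0));
    }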

const Context &get_context() const

Gets a const reference to the inference context (for unit testing).

Provides access to the internal inference context for testing and debugging. This method should primarily be used for unit testing purposes.

Returns:

Const reference to the internal Context object

int get_session_id() const

Gets the current session ID.

Returns the unique identifier for the current inference session. This can be useful for debugging and session tracking purposes.

Returns:

The current session ID

void set_non_realtime(bool is_non_realtime) const

Configures the manager for non-real-time operation.

When set to true, the manager relaxes real-time constraints and may use processing algorithms or memory allocation strategies optimized for offline processing rather than real-time audio.

Parameters:

is_non_realtime – True to enable non-real-time mode, false for real-time mode
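
A hedged sketch that brackets an offline render with the non-real-time flag:

    // Hedged sketch: enable non-real-time mode for offline rendering, then
    // restore real-time behaviour afterwards.
    void render_offline(anira::InferenceManager& manager)
    {
        manager.set_non_realtime(true);
        // ... run the offline processing loop here ...
        manager.set_non_realtime(false);
    }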

void reset()

Resets the inference session to its initial state.

This method clears all internal buffers, resets the inference pipeline, and prepares the manager for a new processing session.

Note

This method waits for all ongoing inferences to complete before resetting.
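
Since reset() waits for in-flight inferences, a hedged usage sketch would call it from a non-real-time context, for example when playback stops:

    // Hedged sketch: reset() blocks until ongoing inferences finish, so call it
    // outside the real-time audio callback (e.g. when playback stops).
    void on_playback_stopped(anira::InferenceManager& manager)
    {
        manager.reset(); // clears buffers and prepares for a new session
    }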