Class anira::InferenceManager¶
-
class InferenceManager¶
Central manager class for coordinating neural network inference operations.
The InferenceManager class serves as the primary coordinator for neural network inference in real-time audio processing applications. It manages the complete inference pipeline, including input preprocessing, backend execution scheduling, output postprocessing, and session management with multiple inference threads.
Key responsibilities:
Managing inference sessions and thread coordination
Handling input/output data flow and buffering
Coordinating with PrePostProcessor for data transformation
Managing latency compensation and sample counting
Providing thread-safe access to inference operations
Supporting both real-time and non-real-time processing modes
The manager supports multiple processing patterns:
Synchronous processing with immediate input/output
Asynchronous push/pop processing for decoupled operation
Multi-tensor processing for complex model architectures
Custom latency handling for different model types
See also
InferenceThread, PrePostProcessor, Context, HostConfig, InferenceConfig
Note
This class coordinates between multiple components and should be used as the primary interface for inference operations rather than directly accessing lower-level components.
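The typical lifecycle is: construct the manager, call prepare() once the host audio settings are known, and then drive it from the audio callback via process() or push_data()/pop_data(). The sketch below illustrates this and assumes the umbrella include path and that the supporting objects (PrePostProcessor, InferenceConfig, ContextConfig, HostConfig) have already been created as described on their own reference pages; in a real plugin the manager would be a long-lived member rather than a local variable.

    #include <anira/anira.h>  // umbrella header; the exact include path is an assumption
    #include <vector>

    // Sketch only: the supporting objects are passed in ready-made here.
    void setup_inference(anira::PrePostProcessor& pp_processor,
                         anira::InferenceConfig& inference_config,
                         const anira::ContextConfig& context_config,
                         anira::HostConfig host_config) {
        // nullptr selects the built-in backends; pass a BackendBase* for a custom one.
        anira::InferenceManager manager(pp_processor, inference_config,
                                        /*custom_processor=*/nullptr, context_config);

        // Must be called before any processing; an empty custom_latency vector
        // means the latency is calculated automatically.
        manager.prepare(host_config);

        // Per-tensor latency in samples, e.g. to report to the plugin host.
        std::vector<unsigned int> latency = manager.get_latency();
        (void)latency;
    }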
Public Functions
-
InferenceManager() = delete¶
Default constructor is deleted to prevent uninitialized instances.
-
InferenceManager(PrePostProcessor &pp_processor, InferenceConfig &inference_config, BackendBase *custom_processor, const ContextConfig &context_config)¶
Constructor that initializes the inference manager with all required components.
Creates an inference manager with the specified preprocessing/postprocessing pipeline, inference configuration, and optional custom backend. Initializes the context and prepares for session management.
- Parameters:
pp_processor – Reference to the preprocessing/postprocessing pipeline
inference_config – Reference to the inference configuration containing model settings
custom_processor – Pointer to a custom backend processor (can be nullptr for default backends)
context_config – Configuration for the inference context and thread management
-
~InferenceManager()¶
Destructor that properly cleans up inference resources.
Ensures proper shutdown of inference threads, cleanup of sessions, and release of all managed resources.
-
void prepare(HostConfig config, std::vector<long> custom_latency = {})¶
Prepares the inference manager for processing with new audio configuration.
Initializes the inference pipeline with the specified host configuration and optional custom latency settings. This method must be called before processing begins or when audio settings change.
- Parameters:
config – Host configuration containing sample rate, buffer size, and audio settings
custom_latency – Optional vector of custom latency values for each tensor (empty for automatic calculation)
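A minimal sketch of both variants, assuming manager has already been constructed and host_config carries the host's sample rate and buffer size (see the HostConfig reference for its exact members); the custom latency value is purely illustrative:

    #include <vector>

    void prepare_automatic(anira::InferenceManager& manager, anira::HostConfig host_config) {
        manager.prepare(host_config);  // empty custom_latency => automatic calculation
    }

    void prepare_with_custom_latency(anira::InferenceManager& manager, anira::HostConfig host_config) {
        // One entry per tensor; 1024 is illustrative, e.g. for a model with a
        // fixed lookahead that cannot be derived automatically.
        std::vector<long> custom_latency{1024};
        manager.prepare(host_config, custom_latency);
    }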
-
size_t *process(const float *const *const *input_data, size_t *num_input_samples, float *const *const *output_data, size_t *num_output_samples)¶
Processes multi-tensor audio data with separate input and output buffers.
Performs complete inference processing for multiple tensors simultaneously, handling preprocessing, inference execution, and postprocessing. This method supports complex model architectures with multiple inputs and outputs.
Note
This method is real-time safe and should not allocate memory
- Parameters:
input_data – Input data organized as data[tensor_index][channel][sample]
num_input_samples – Array of input sample counts for each tensor
output_data – Output data buffers organized as data[tensor_index][channel][sample]
num_output_samples – Array of maximum output sample counts for each tensor
- Returns:
Array of actual output sample counts for each tensor
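The triple-pointer layout can be hard to read, so the following sketch builds it for the simple case of one tensor with two channels, using stack arrays only (no allocation). The function name and channel pointers are illustrative; in practice they come from the host's audio buffers.

    #include <cstddef>

    void process_block(anira::InferenceManager& manager,
                       float* in_left, float* in_right,
                       float* out_left, float* out_right, size_t n) {
        float* in_channels[2]  = {in_left, in_right};
        float* out_channels[2] = {out_left, out_right};

        // data[tensor_index][channel][sample]
        const float* const* input_tensors[1] = {in_channels};
        float* const* output_tensors[1]      = {out_channels};

        size_t num_input_samples[1]  = {n};
        size_t num_output_samples[1] = {n};  // capacity of the output buffers

        size_t* produced = manager.process(input_tensors, num_input_samples,
                                           output_tensors, num_output_samples);
        // produced[0] holds the number of samples actually written for tensor 0.
        (void)produced;
    }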
-
void push_data(const float *const *const *input_data, size_t *num_input_samples)¶
Pushes input data to the inference pipeline for asynchronous processing.
Queues input data for processing without waiting for results. This enables decoupled input/output processing where data can be pushed and popped independently for buffered processing scenarios.
Note
This method is real-time safe and should not allocate memory
- Parameters:
input_data – Input data organized as data[tensor_index][channel][sample]
num_input_samples – Array of input sample counts for each tensor
-
size_t *pop_data(float *const *const *output_data, size_t *num_output_samples)¶
Pops processed output data from the inference pipeline.
Retrieves processed output data from the inference pipeline. This method should be used in conjunction with push_data for decoupled processing patterns.
Note
This method is real-time safe and should not allocate memory
- Parameters:
output_data – Output buffers organized as data[tensor_index][channel][sample]
num_output_samples – Array of maximum output sample counts for each tensor
- Returns:
Array of actual output sample counts for each tensor
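A sketch of the decoupled pattern, using the same single-tensor stereo layout as the process() example above; the function names are illustrative:

    #include <cstddef>

    void push_block(anira::InferenceManager& manager, float* left, float* right, size_t n) {
        float* channels[2] = {left, right};
        const float* const* input_tensors[1] = {channels};
        size_t num_input_samples[1] = {n};
        manager.push_data(input_tensors, num_input_samples);
    }

    void pop_block(anira::InferenceManager& manager, float* left, float* right, size_t n) {
        float* channels[2] = {left, right};
        float* const* output_tensors[1] = {channels};
        size_t num_output_samples[1] = {n};  // capacity of the output buffers
        size_t* received = manager.pop_data(output_tensors, num_output_samples);
        // received[0] holds the number of samples actually delivered for tensor 0.
        (void)received;
    }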
-
void set_backend(InferenceBackend new_inference_backend)¶
Sets the inference backend to use for neural network processing.
Changes the active inference backend, which may trigger session reinitialization if the new backend differs from the current one.
- Parameters:
new_inference_backend – The backend type to use (ONNX, LibTorch, TensorFlow Lite, or Custom)
-
InferenceBackend get_backend() const¶
Gets the currently active inference backend.
- Returns:
The currently configured inference backend type
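For example, a UI selector could read the current backend and switch to another one. The enumerator names below (ONNX, LIBTORCH) are assumptions based on the backend list above; which ones exist depends on the backends compiled into your anira build:

    void toggle_backend(anira::InferenceManager& manager) {
        if (manager.get_backend() == anira::InferenceBackend::ONNX)
            manager.set_backend(anira::InferenceBackend::LIBTORCH);
        else
            manager.set_backend(anira::InferenceBackend::ONNX);
    }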
-
std::vector<unsigned int> get_latency() const¶
Gets the processing latency for all tensors.
Returns the latency introduced by the inference processing in samples for each tensor. This includes buffering delays, preprocessing/postprocessing latency, and model-specific processing latency.
- Returns:
Vector containing latency values in samples for each tensor index
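A plugin wrapper would typically forward this value to its host after prepare(). Hosts that accept only a single value commonly use the first tensor's entry; the host call named in the comment is an example from JUCE, not part of anira:

    #include <vector>

    void report_latency(const anira::InferenceManager& manager) {
        std::vector<unsigned int> latency = manager.get_latency();
        if (!latency.empty()) {
            // e.g. setLatencySamples(static_cast<int>(latency[0])) in a JUCE plugin
        }
    }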
-
size_t get_available_samples(size_t tensor_index, size_t channel) const¶
Gets the number of samples received for a specific tensor and channel (for unit testing).
This method is primarily used for unit testing and debugging purposes to monitor the data flow through the inference pipeline.
- Parameters:
tensor_index – Index of the tensor to query
channel – Channel index to query
- Returns:
Number of samples received for the specified tensor and channel
-
const Context &get_context() const¶
Gets a const reference to the inference context (for unit testing).
Provides access to the internal inference context for testing and debugging. This method should primarily be used for unit testing purposes.
- Returns:
Const reference to the internal Context object
-
int get_session_id() const¶
Gets the current session ID.
Returns the unique identifier for the current inference session. This can be useful for debugging and session tracking purposes.
- Returns:
The current session ID
-
void set_non_realtime(bool is_non_realtime) const¶
Configures the manager for non-real-time operation.
When is_non_realtime is true, the manager relaxes real-time constraints and may use different processing algorithms or memory-allocation strategies optimized for offline processing rather than for real-time audio.
- Parameters:
is_non_realtime – True to enable non-real-time mode, false for real-time mode
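A short sketch of an offline bounce, assuming the surrounding code drives the processing calls itself:

    void render_offline(anira::InferenceManager& manager) {
        manager.set_non_realtime(true);
        // ... drive process() / push_data() / pop_data() over the whole file,
        //     faster than real time ...
        manager.set_non_realtime(false);
    }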
-
void reset()¶
Resets the inference session to its initial state.
This method clears all internal buffers, resets the inference pipeline, and prepares the handler for a new processing session.
Note
This method waits for all ongoing inferences to complete before resetting.