Class anira::InferenceHandler¶
-
class InferenceHandler¶
Main handler class for neural network inference operations.
The InferenceHandler provides a high-level interface for performing neural network inference in real-time audio processing contexts. It manages the inference backend, data buffering, and processing pipeline while ensuring real-time safety.
This class supports multiple processing modes:
- Single tensor processing for simple models
- Multi-tensor processing for complex models with multiple inputs/outputs
- Push/pop data patterns for decoupled processing
Note
This class is designed for real-time audio processing and uses appropriate memory allocation strategies to avoid audio dropouts.
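Example of the intended call sequence (a minimal sketch; the umbrella header anira/anira.h, the default-constructed PrePostProcessor, and the HostConfig constructor arguments are assumptions beyond what is documented on this page):

```cpp
#include <anira/anira.h>

// inference_config is assumed to be filled in elsewhere with model paths
// and tensor shapes; see the InferenceConfig documentation.
anira::InferenceConfig inference_config = make_inference_config(); // hypothetical helper
anira::PrePostProcessor pp_processor; // default construction is an assumption

anira::InferenceHandler handler(pp_processor, inference_config);

// Before the first audio callback: pass the host buffer size and sample
// rate (HostConfig argument order is an assumption).
handler.prepare(anira::HostConfig(512, 48000.0));

// In the audio callback, for a single-tensor in-place model:
// handler.process(channel_pointers, num_samples);
```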
Public Functions
-
InferenceHandler() = delete¶
Default constructor is deleted to prevent uninitialized instances.
-
InferenceHandler(PrePostProcessor &pp_processor, InferenceConfig &inference_config, const ContextConfig &context_config = ContextConfig())¶
Constructs an InferenceHandler with pre/post processor and inference configuration.
- Parameters:
pp_processor – Reference to the pre/post processor for data transformation
inference_config – Reference to the inference configuration containing model settings
context_config – Optional context configuration for advanced settings (default: ContextConfig())
-
InferenceHandler(PrePostProcessor &pp_processor, InferenceConfig &inference_config, BackendBase &custom_processor, const ContextConfig &context_config = ContextConfig())¶
Constructs an InferenceHandler with custom backend processor.
- Parameters:
pp_processor – Reference to the pre/post processor for data transformation
inference_config – Reference to the inference configuration containing model settings
custom_processor – Reference to a custom backend processor implementation
context_config – Optional context configuration for advanced settings (default: ContextConfig())
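Example with a custom backend (a sketch; MyBypassBackend is a hypothetical BackendBase subclass, and the required overrides are documented with BackendBase):

```cpp
// Hypothetical subclass of anira::BackendBase; its required virtual
// overrides are not shown here.
MyBypassBackend custom_processor(inference_config);

anira::InferenceHandler handler(pp_processor, inference_config, custom_processor);
handler.set_inference_backend(anira::InferenceBackend::CUSTOM); // enum spelling assumed
```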
-
~InferenceHandler()¶
Destructor that properly cleans up inference resources.
-
void set_inference_backend(InferenceBackend inference_backend)¶
Sets the inference backend to use for neural network processing.
- Parameters:
inference_backend – The backend type to use (e.g., ONNX, LibTorch, TensorFlow Lite, or a custom backend)
-
InferenceBackend get_inference_backend()¶
Gets the currently active inference backend.
- Returns:
The currently configured inference backend type
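Example of switching and querying the backend at run time (a sketch; the enum spellings are assumptions, check anira::InferenceBackend for the values in your build):

```cpp
void select_backend(anira::InferenceHandler& handler) {
    handler.set_inference_backend(anira::InferenceBackend::ONNX); // spelling assumed
    if (handler.get_inference_backend() == anira::InferenceBackend::ONNX) {
        // ONNX Runtime is now serving inference requests.
    }
}
```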
-
void prepare(HostConfig new_audio_config)¶
Prepares the inference handler for processing with new audio configuration.
This method must be called before processing begins or when audio settings change. It initializes internal buffers and prepares the inference pipeline.
- Parameters:
new_audio_config – The new audio configuration containing sample rate, buffer size, etc.
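Example call site in a host preparation callback (a sketch; the HostConfig argument order is an assumption):

```cpp
void prepare_to_play(anira::InferenceHandler& handler,
                     double sample_rate, int buffer_size) {
    // Re-prepare whenever the host changes its audio settings.
    handler.prepare(anira::HostConfig((size_t) buffer_size, sample_rate));
}
```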
-
void prepare(HostConfig new_audio_config, unsigned int custom_latency, size_t tensor_index = 0)¶
Prepares the inference handler for processing with new audio configuration and a custom latency.
This method must be called before processing begins or when audio settings change. It initializes internal buffers and prepares the inference pipeline.
- Parameters:
new_audio_config – The new audio configuration containing sample rate, buffer size, etc.
custom_latency – Custom latency value in samples to override the calculated latency
tensor_index – Index of the tensor to apply the custom latency (default: 0)
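Example of overriding the reported latency (a sketch; the values are illustrative):

```cpp
void prepare_with_latency(anira::InferenceHandler& handler) {
    anira::HostConfig config(512, 48000.0); // argument order assumed

    // Report 256 samples of latency for tensor 0 instead of the
    // calculated value.
    handler.prepare(config, 256u);

    // For per-tensor control, use the vector overload documented below:
    // handler.prepare(config, std::vector<unsigned int>{256, 0});
}
```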
-
void prepare(HostConfig new_audio_config, std::vector<unsigned int> custom_latency)¶
Prepares the inference handler for processing with new audio configuration and custom latencies for each tensor.
This method must be called before processing begins or when audio settings change. It initializes internal buffers and prepares the inference pipeline.
- Parameters:
new_audio_config – The new audio configuration containing sample rate, buffer size, etc.
custom_latency – Vector of custom latency values in samples for each tensor
-
size_t process(float *const *data, size_t num_samples, size_t tensor_index = 0)¶
Processes audio data in-place for models with identical input/output shapes.
This is the simplest processing method and applies when input and output share the same data shape and only one tensor index is streamable (e.g., audio effects with non-streamable parameters).
Note
This method is real-time safe and does not allocate memory.
- Parameters:
data – Audio data buffer organized as data[channel][sample]
num_samples – Number of samples to process
tensor_index – Index of the tensor to process (default: 0)
- Returns:
Number of samples actually processed
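Example from a real-time callback (a sketch; the callback shape is illustrative):

```cpp
void audio_callback(anira::InferenceHandler& handler,
                    float* const* channels, size_t num_samples) {
    // In-place: the input buffer is overwritten with the model output.
    size_t processed = handler.process(channels, num_samples);
    // processed may be smaller than num_samples, e.g. while the internal
    // pipeline is still filling.
    (void) processed;
}
```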
-
size_t process(const float *const *input_data, size_t num_input_samples, float *const *output_data, size_t num_output_samples, size_t tensor_index = 0)¶
Processes audio data with separate input and output buffers.
This method allows for different input and output buffer sizes and is suitable for models that have different input and output shapes.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input audio data organized as data[channel][sample]
num_input_samples – Number of input samples
output_data – Output audio data buffer organized as data[channel][sample]
num_output_samples – Maximum number of output samples the buffer can hold
tensor_index – Index of the tensor to process (default: 0)
- Returns:
Number of output samples actually written
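Example for a model whose output is larger than its input (a sketch; the sizes are illustrative, e.g. a 2x upsampler):

```cpp
void process_upsampling(anira::InferenceHandler& handler,
                        const float* const* in, float* const* out) {
    // 256 input samples, room for up to 512 output samples.
    size_t written = handler.process(in, 256, out, 512);
    // written reports how many output samples were actually produced.
    (void) written;
}
```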
-
size_t *process(const float *const *const *input_data, size_t *num_input_samples, float *const *const *output_data, size_t *num_output_samples)¶
Processes multiple tensors simultaneously.
This method handles complex models with multiple input and output tensors, processing all tensors in a single call.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input data organized as data[tensor_index][channel][sample]
num_input_samples – Array of input sample counts for each tensor
output_data – Output data buffers organized as data[tensor_index][channel][sample]
num_output_samples – Array of maximum output sample counts for each tensor
- Returns:
Array of actual output sample counts for each tensor
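Example of the multi-tensor layout with two tensors, stereo audio plus a mono control signal (a sketch; all sizes are illustrative):

```cpp
void process_two_tensors(anira::InferenceHandler& handler,
                         const float* const* audio_in, float* const* audio_out,
                         const float* const* ctrl_in, float* const* ctrl_out) {
    const float* const* inputs[2]  = { audio_in, ctrl_in };
    float* const*       outputs[2] = { audio_out, ctrl_out };
    size_t num_in[2]  = { 512, 1 };
    size_t num_out[2] = { 512, 1 };

    // Returns a pointer to an array of per-tensor output sample counts.
    size_t* written = handler.process(inputs, num_in, outputs, num_out);
    (void) written;
}
```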
-
void push_data(const float *const *input_data, size_t num_input_samples, size_t tensor_index = 0)¶
Pushes input data to the processing pipeline for a specific tensor.
This method enables decoupled input/output processing where data can be pushed and popped independently, which is useful for buffered processing scenarios.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input audio data organized as data[channel][sample]
num_input_samples – Number of input samples to push
tensor_index – Index of the tensor to receive the data (default: 0)
-
void push_data(const float *const *const *input_data, size_t *num_input_samples)¶
Pushes input data for multiple tensors simultaneously.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input data organized as data[tensor_index][channel][sample]
num_input_samples – Array of input sample counts for each tensor
-
size_t pop_data(float *const *output_data, size_t num_output_samples, size_t tensor_index = 0)¶
Pops processed output data from the pipeline for a specific tensor.
This method retrieves processed data from the inference pipeline and should be used in conjunction with push_data for decoupled processing.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
output_data – Output buffer organized as data[channel][sample]
num_output_samples – Maximum number of samples the output buffer can hold
tensor_index – Index of the tensor to retrieve data from (default: 0)
- Returns:
Number of samples actually written to the output buffer
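Example of the decoupled pattern for a single tensor (a sketch; the shortfall handling is illustrative, not prescribed by the API):

```cpp
void decoupled_block(anira::InferenceHandler& handler,
                     const float* const* in, float* const* out,
                     size_t num_samples) {
    // Feed the pipeline; inference proceeds independently of this call.
    handler.push_data(in, num_samples);

    // Drain whatever has finished so far.
    size_t got = handler.pop_data(out, num_samples);
    if (got < num_samples) {
        // Fewer samples ready than requested, e.g. while the pipeline
        // fills; pad or delay according to your latency strategy.
    }
}
```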
-
size_t *pop_data(float *const *const *output_data, size_t *num_output_samples)¶
Pops processed output data for multiple tensors simultaneously.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
output_data – Output buffers organized as data[tensor_index][channel][sample]
num_output_samples – Array of maximum output sample counts for each tensor
- Returns:
Array of actual output sample counts for each tensor
-
unsigned int get_latency(size_t tensor_index = 0) const¶
Gets the processing latency for a specific tensor.
Returns the latency, in samples, introduced by inference processing for the specified tensor. This includes buffering delays and model-specific processing latency.
- Parameters:
tensor_index – Index of the tensor to query (default: 0)
- Returns:
Latency in samples for the specified tensor
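Hosts typically need this value for delay compensation. Example for a JUCE plug-in (a sketch; setLatencySamples is JUCE API, an assumption about your host framework):

```cpp
// Inside a juce::AudioProcessor, after handler.prepare(...):
setLatencySamples((int) handler.get_latency());
```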
-
std::vector<unsigned int> get_latency_vector() const¶
Gets the processing latency for all tensors.
- Returns:
Vector containing latency values in samples for each tensor index
-
size_t get_available_samples(size_t tensor_index, size_t channel = 0) const¶
Gets the number of samples received for a specific tensor and channel.
This method is useful for monitoring data flow, benchmarking, and debugging.
- Parameters:
tensor_index – Index of the tensor to query
channel – Channel index to query (default: 0)
- Returns:
Number of samples received for the specified tensor and channel
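In a decoupled setup this can gate pop_data to avoid underruns (a sketch):

```cpp
void drain_if_ready(anira::InferenceHandler& handler,
                    float* const* out, size_t num_samples) {
    // Only pop once a full block for tensor 0 has passed through inference.
    if (handler.get_available_samples(0) >= num_samples)
        handler.pop_data(out, num_samples);
}
```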
-
void set_non_realtime(bool is_non_realtime)¶
Configures the handler for non-real-time operation.
When set to true, real-time constraints are relaxed and the handler may use different memory allocation strategies or processing algorithms optimized for offline processing.
- Parameters:
is_non_realtime – True to enable non-real-time mode, false for real-time mode
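Example of an offline bounce (a sketch; render_offline is a hypothetical driver loop):

```cpp
handler.set_non_realtime(true);   // relax real-time constraints
render_offline(handler);          // hypothetical offline processing loop
handler.set_non_realtime(false);  // back to real-time mode
```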
-
void reset()¶
Resets the inference handler to its initial state.
This method clears all internal buffers, resets the inference pipeline, and prepares the handler for a new processing session. This also resets the latency and available samples for all tensors.
Note
This method waits for all ongoing inferences to complete before resetting.