Class anira::InferenceHandler¶
-
class InferenceHandler¶
Main handler class for neural network inference operations.
The InferenceHandler provides a high-level interface for performing neural network inference in real-time audio processing contexts. It manages the inference backend, data buffering, and processing pipeline while ensuring real-time safety.
This class supports multiple processing modes:
- Single tensor processing for simple models
- Multi-tensor processing for complex models with multiple inputs/outputs
- Push/pop data patterns for decoupled processing
Note
This class is designed for real-time audio processing and uses appropriate memory allocation strategies to avoid audio dropouts.
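Example of the intended call sequence (a minimal sketch; the umbrella header anira/anira.h, the default-constructed PrePostProcessor, and the HostConfig constructor arguments are assumptions beyond what is documented on this page):

```cpp
#include <anira/anira.h>

// inference_config is assumed to be filled in elsewhere with model paths
// and tensor shapes; see the InferenceConfig documentation.
anira::InferenceConfig inference_config = make_inference_config(); // hypothetical helper
anira::PrePostProcessor pp_processor; // default construction is an assumption

anira::InferenceHandler handler(pp_processor, inference_config);

// Before the first audio callback: pass the host buffer size and sample
// rate (HostConfig argument order is an assumption).
handler.prepare(anira::HostConfig(512, 48000.0));

// In the audio callback, for a single-tensor in-place model:
// handler.process(channel_pointers, num_samples);
```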
Public Functions
-
InferenceHandler() = delete¶
Default constructor is deleted to prevent uninitialized instances.
-
InferenceHandler(PrePostProcessor &pp_processor, InferenceConfig &inference_config, const ContextConfig &context_config = ContextConfig())¶
Constructs an InferenceHandler with pre/post processor and inference configuration.
- Parameters:
pp_processor – Reference to the pre/post processor for data transformation
inference_config – Reference to the inference configuration containing model settings
context_config – Optional context configuration for advanced settings (default: ContextConfig())
-
InferenceHandler(PrePostProcessor &pp_processor, InferenceConfig &inference_config, BackendBase &custom_processor, const ContextConfig &context_config = ContextConfig())¶
Constructs an InferenceHandler with custom backend processor.
- Parameters:
pp_processor – Reference to the pre/post processor for data transformation
inference_config – Reference to the inference configuration containing model settings
custom_processor – Reference to a custom backend processor implementation
context_config – Optional context configuration for advanced settings (default: ContextConfig())
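Example with a custom backend (a sketch; MyBypassBackend is a hypothetical BackendBase subclass, and the required overrides are documented with BackendBase):

```cpp
// Hypothetical subclass of anira::BackendBase; its required virtual
// overrides are not shown here.
MyBypassBackend custom_processor(inference_config);

anira::InferenceHandler handler(pp_processor, inference_config, custom_processor);
handler.set_inference_backend(anira::InferenceBackend::CUSTOM); // enum spelling assumed
```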
-
~InferenceHandler()¶
Destructor that properly cleans up inference resources.
-
void set_inference_backend(InferenceBackend inference_backend)¶
Sets the inference backend to use for neural network processing.
- Parameters:
inference_backend – The backend type to use (e.g., ONNX, LibTorch, TensorFlow Lite, or a custom backend)
-
InferenceBackend get_inference_backend()¶
Gets the currently active inference backend.
- Returns:
The currently configured inference backend type
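Example of switching and querying the backend at run time (a sketch; the enum spellings are assumptions, check anira::InferenceBackend for the values in your build):

```cpp
void select_backend(anira::InferenceHandler& handler) {
    handler.set_inference_backend(anira::InferenceBackend::ONNX); // spelling assumed
    if (handler.get_inference_backend() == anira::InferenceBackend::ONNX) {
        // ONNX Runtime is now serving inference requests.
    }
}
```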
-
void prepare(HostConfig new_audio_config)¶
Prepares the inference handler for processing with new audio configuration.
This method must be called before processing begins or when audio settings change. It initializes internal buffers and prepares the inference pipeline.
- Parameters:
new_audio_config – The new audio configuration containing sample rate, buffer size, etc.
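Example call site in a host preparation callback (a sketch; the HostConfig argument order is an assumption):

```cpp
void prepare_to_play(anira::InferenceHandler& handler,
                     double sample_rate, int buffer_size) {
    // Re-prepare whenever the host changes its audio settings.
    handler.prepare(anira::HostConfig((size_t) buffer_size, sample_rate));
}
```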
-
void prepare(HostConfig new_audio_config, unsigned int custom_latency, size_t tensor_index = 0)¶
Prepares the inference handler for processing with new audio configuration and a custom latency.
This method must be called before processing begins or when audio settings change. It initializes internal buffers and prepares the inference pipeline.
- Parameters:
new_audio_config – The new audio configuration containing sample rate, buffer size, etc.
custom_latency – Custom latency value in samples to override the calculated latency
tensor_index – Index of the tensor to apply the custom latency (default: 0)
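Example of overriding the reported latency (a sketch; the values are illustrative):

```cpp
void prepare_with_latency(anira::InferenceHandler& handler) {
    anira::HostConfig config(512, 48000.0); // argument order assumed

    // Report 256 samples of latency for tensor 0 instead of the
    // calculated value.
    handler.prepare(config, 256u);

    // For per-tensor control, use the vector overload documented below:
    // handler.prepare(config, std::vector<unsigned int>{256, 0});
}
```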
-
void prepare(HostConfig new_audio_config, std::vector<unsigned int> custom_latency)¶
Prepares the inference handler for processing with new audio configuration and custom latencies for each tensor.
This method must be called before processing begins or when audio settings change. It initializes internal buffers and prepares the inference pipeline.
- Parameters:
new_audio_config – The new audio configuration containing sample rate, buffer size, etc.
custom_latency – Vector of custom latency values in samples for each tensor
-
size_t process(float *const *data, size_t num_samples, size_t tensor_index = 0)¶
Processes audio data in-place for models with identical input/output shapes.
This is the simplest processing method and applies when input and output share the same data shape and only one tensor index is streamable (e.g., audio effects with non-streamable parameters).
Note
This method is real-time safe and does not allocate memory.
- Parameters:
data – Audio data buffer organized as data[channel][sample]
num_samples – Number of samples to process
tensor_index – Index of the tensor to process (default: 0)
- Returns:
Number of samples actually processed
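Example from a real-time callback (a sketch; the callback shape is illustrative):

```cpp
void audio_callback(anira::InferenceHandler& handler,
                    float* const* channels, size_t num_samples) {
    // In-place: the input buffer is overwritten with the model output.
    size_t processed = handler.process(channels, num_samples);
    // processed may be smaller than num_samples, e.g. while the internal
    // pipeline is still filling.
    (void) processed;
}
```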
-
size_t process(const float *const *input_data, size_t num_input_samples, float *const *output_data, size_t num_output_samples, size_t tensor_index = 0)¶
Processes audio data with separate input and output buffers.
This method allows for different input and output buffer sizes and is suitable for models that have different input and output shapes.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input audio data organized as data[channel][sample]
num_input_samples – Number of input samples
output_data – Output audio data buffer organized as data[channel][sample]
num_output_samples – Maximum number of output samples the buffer can hold
tensor_index – Index of the tensor to process (default: 0)
- Returns:
Number of output samples actually written
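Example for a model whose output is larger than its input (a sketch; the sizes are illustrative, e.g. a 2x upsampler):

```cpp
void process_upsampling(anira::InferenceHandler& handler,
                        const float* const* in, float* const* out) {
    // 256 input samples, room for up to 512 output samples.
    size_t written = handler.process(in, 256, out, 512);
    // written reports how many output samples were actually produced.
    (void) written;
}
```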
-
size_t *process(const float *const *const *input_data, size_t *num_input_samples, float *const *const *output_data, size_t *num_output_samples)¶
Processes multiple tensors simultaneously.
This method handles complex models with multiple input and output tensors, processing all tensors in a single call.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input data organized as data[tensor_index][channel][sample]
num_input_samples – Array of input sample counts for each tensor
output_data – Output data buffers organized as data[tensor_index][channel][sample]
num_output_samples – Array of maximum output sample counts for each tensor
- Returns:
Array of actual output sample counts for each tensor
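Example of the multi-tensor layout with two tensors, stereo audio plus a mono control signal (a sketch; all sizes are illustrative):

```cpp
void process_two_tensors(anira::InferenceHandler& handler,
                         const float* const* audio_in, float* const* audio_out,
                         const float* const* ctrl_in, float* const* ctrl_out) {
    const float* const* inputs[2]  = { audio_in, ctrl_in };
    float* const*       outputs[2] = { audio_out, ctrl_out };
    size_t num_in[2]  = { 512, 1 };
    size_t num_out[2] = { 512, 1 };

    // Returns a pointer to an array of per-tensor output sample counts.
    size_t* written = handler.process(inputs, num_in, outputs, num_out);
    (void) written;
}
```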
-
void push_data(const float *const *input_data, size_t num_input_samples, size_t tensor_index = 0)¶
Pushes input data to the processing pipeline for a specific tensor.
This method enables decoupled input/output processing where data can be pushed and popped independently, which is useful for buffered processing scenarios.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input audio data organized as data[channel][sample]
num_input_samples – Number of input samples to push
tensor_index – Index of the tensor to receive the data (default: 0)
-
void push_data(const float *const *const *input_data, size_t *num_input_samples)¶
Pushes input data for multiple tensors simultaneously.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
input_data – Input data organized as data[tensor_index][channel][sample]
num_input_samples – Array of input sample counts for each tensor
-
size_t pop_data(float *const *output_data, size_t num_output_samples, size_t tensor_index = 0)¶
Pops processed output data from the pipeline for a specific tensor.
This method retrieves processed data from the inference pipeline and should be used in conjunction with push_data for decoupled processing.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
output_data – Output buffer organized as data[channel][sample]
num_output_samples – Maximum number of samples the output buffer can hold
tensor_index – Index of the tensor to retrieve data from (default: 0)
- Returns:
Number of samples actually written to the output buffer
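Example of the decoupled pattern for a single tensor (a sketch; the shortfall handling is illustrative, not prescribed by the API):

```cpp
void decoupled_block(anira::InferenceHandler& handler,
                     const float* const* in, float* const* out,
                     size_t num_samples) {
    // Feed the pipeline; inference proceeds independently of this call.
    handler.push_data(in, num_samples);

    // Drain whatever has finished so far.
    size_t got = handler.pop_data(out, num_samples);
    if (got < num_samples) {
        // Fewer samples ready than requested, e.g. while the pipeline
        // fills; pad or delay according to your latency strategy.
    }
}
```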
-
size_t *pop_data(float *const *const *output_data, size_t *num_output_samples)¶
Pops processed output data for multiple tensors simultaneously.
Note
This method is real-time safe and does not allocate memory.
- Parameters:
output_data – Output buffers organized as data[tensor_index][channel][sample]
num_output_samples – Array of maximum output sample counts for each tensor
- Returns:
Array of actual output sample counts for each tensor
-
unsigned int get_latency(size_t tensor_index = 0) const¶
Gets the processing latency for a specific tensor.
Returns the latency, in samples, introduced by inference processing for the specified tensor. This includes buffering delays and model-specific processing latency.
- Parameters:
tensor_index – Index of the tensor to query (default: 0)
- Returns:
Latency in samples for the specified tensor
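Hosts typically need this value for delay compensation. Example for a JUCE plug-in (a sketch; setLatencySamples is JUCE API, an assumption about your host framework):

```cpp
// Inside a juce::AudioProcessor, after handler.prepare(...):
setLatencySamples((int) handler.get_latency());
```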
-
std::vector<unsigned int> get_latency_vector() const¶
Gets the processing latency for all tensors.
- Returns:
Vector containing latency values in samples for each tensor index
-
size_t get_available_samples(size_t tensor_index, size_t channel = 0) const¶
Gets the number of samples received for a specific tensor and channel.
This method is useful for monitoring data flow, benchmarking, and debugging.
- Parameters:
tensor_index – Index of the tensor to query
channel – Channel index to query (default: 0)
- Returns:
Number of samples received for the specified tensor and channel
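In a decoupled setup this can gate pop_data to avoid underruns (a sketch):

```cpp
void drain_if_ready(anira::InferenceHandler& handler,
                    float* const* out, size_t num_samples) {
    // Only pop once a full block for tensor 0 has passed through inference.
    if (handler.get_available_samples(0) >= num_samples)
        handler.pop_data(out, num_samples);
}
```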
-
void set_non_realtime(bool is_non_realtime)¶
Configures the handler for non-real-time operation.
When set to true, real-time constraints are relaxed and the handler may use different memory allocation strategies or processing algorithms optimized for offline processing.
- Parameters:
is_non_realtime – True to enable non-real-time mode, false for real-time mode
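Example of an offline bounce (a sketch; render_offline is a hypothetical driver loop):

```cpp
handler.set_non_realtime(true);   // relax real-time constraints
render_offline(handler);          // hypothetical offline processing loop
handler.set_non_realtime(false);  // back to real-time mode
```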
-
void reset()¶
Resets the inference handler to its initial state.
This method clears all internal buffers, resets the inference pipeline, and prepares the handler for a new processing session. This also resets the latency and available samples for all tensors.
Note
This method waits for all ongoing inferences to complete before resetting.