Struct anira::InferenceConfig

struct InferenceConfig

Collaboration diagram for anira::InferenceConfig: the struct uses anira::ProcessingSpec, anira::ModelData, and anira::TensorShape.

Complete configuration for neural network inference operations.

The InferenceConfig struct serves as the central configuration hub for all aspects of neural network inference in anira. It combines model data, tensor specifications, processing parameters, and performance settings into a single, cohesive configuration.

Usage Examples:
// Simple mono audio effect
InferenceConfig config(
    {ModelData("model.onnx", ONNX)},
    {TensorShape({{1, 512}}, {{1, 512}})},
    ProcessingSpec({1}, {1}, {512}, {512}),
    10.0f  // 10ms max inference time
);

// Multi-input model with control parameters
InferenceConfig config(
    {ModelData("model.onnx", ONNX)},
    {TensorShape({{1, 512}, {1, 4}}, {{1, 512}})}, // Audio + 4 parameters
    ProcessingSpec({1, 1}, {1}, {512, 0}, {512}),  // Second input non-streamable
    15.0f
);

Public Functions

InferenceConfig() = default

Default constructor creating an empty configuration.

Creates an uninitialized InferenceConfig that must be properly configured before use. All member variables are default-initialized.

InferenceConfig(std::vector<ModelData> model_data, std::vector<TensorShape> tensor_shape, ProcessingSpec processing_spec, float max_inference_time, unsigned int warm_up = Defaults::m_warm_up, bool session_exclusive_processor = Defaults::m_session_exclusive_processor, float blocking_ratio = Defaults::m_blocking_ratio, unsigned int num_parallel_processors = Defaults::m_num_parallel_processors)

Constructs a complete InferenceConfig with processing specification.

Creates a fully configured InferenceConfig suitable for immediate use. This constructor includes explicit processing specifications for fine-grained control over the inference pipeline.

Parameters:
  • model_data – Vector of model data for different backends

  • tensor_shape – Vector of tensor shape configurations

  • processing_spec – Processing specification defining channels, sizes, and latencies

  • max_inference_time – Maximum allowed inference time in milliseconds per inference

  • warm_up – Number of warm-up inferences to perform during initialization

  • session_exclusive_processor – Whether to use exclusive processor sessions

  • blocking_ratio – Ratio controlling blocking behavior (0.0-1.0)

  • num_parallel_processors – Number of parallel inference processors to use

inline InferenceConfig(std::vector<ModelData> model_data, std::vector<TensorShape> tensor_shape, float max_inference_time, unsigned int warm_up = Defaults::m_warm_up, bool session_exclusive_processor = Defaults::m_session_exclusive_processor, float blocking_ratio = Defaults::m_blocking_ratio, unsigned int num_parallel_processors = Defaults::m_num_parallel_processors)

Constructs a simplified InferenceConfig with automatic processing specification.

Creates an InferenceConfig with automatic processing specification generation. The ProcessingSpec will be computed automatically from the provided tensor shapes, using default values for buffer sizes and channel counts.

Parameters:
  • model_data – Vector of model data for different backends

  • tensor_shape – Vector of tensor shape configurations

  • max_inference_time – Maximum allowed inference time in milliseconds per inference

  • warm_up – Number of warm-up inferences to perform during initialization

  • session_exclusive_processor – Whether to use exclusive processor sessions

  • blocking_ratio – Ratio controlling blocking behavior (0.0-1.0)

  • num_parallel_processors – Number of parallel inference processors to use
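
For comparison with the examples above, a minimal sketch of this simplified constructor with the optional parameters spelled out (file name and values are placeholders; the ProcessingSpec is derived automatically from the tensor shapes):

// Simplified constructor: no explicit ProcessingSpec
InferenceConfig config(
    {ModelData("model.onnx", ONNX)},
    {TensorShape({{1, 512}}, {{1, 512}})},
    10.0f,  // max_inference_time in ms
    5,      // warm_up: five inferences during initialization
    false,  // session_exclusive_processor
    0.0f,   // blocking_ratio
    2       // num_parallel_processors
);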

std::string get_model_path(InferenceBackend backend)

Gets the model file path for a specific backend.

Parameters:

backend – The target inference backend

Returns:

Model file path as string

const ModelData *get_model_data(InferenceBackend backend) const

Gets the model data structure for a specific backend.

Parameters:

backend – The target inference backend

Returns:

Pointer to ModelData, or nullptr if not found

std::string get_model_function(InferenceBackend backend) const

Gets the model function name for a specific backend.

Parameters:

backend – The target inference backend

Returns:

Model function name (LibTorch specific)

bool is_model_binary(InferenceBackend backend) const

Checks if the model data is binary for a specific backend.

Parameters:

backend – The target inference backend

Returns:

true if model data is binary, false if it’s a file path
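
A sketch of querying the model data accessors above, assuming a configured instance named config as in the usage examples (the ONNX backend is used for illustration):

// get_model_data() returns nullptr when no model is registered for the backend.
if (const ModelData* data = config.get_model_data(ONNX)) {
    if (!config.is_model_binary(ONNX)) {
        std::string path = config.get_model_path(ONNX);
        // load or inspect the model file at 'path' ...
    }
}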

TensorShapeList get_tensor_input_shape() const

Gets universal input tensor shapes.

Returns:

List of input tensor shapes (universal across backends)

TensorShapeList get_tensor_output_shape() const

Gets universal output tensor shapes.

Returns:

List of output tensor shapes (universal across backends)

TensorShapeList get_tensor_input_shape(InferenceBackend backend) const

Gets input tensor shapes for a specific backend.

Parameters:

backend – The target inference backend

Returns:

List of input tensor shapes for the specified backend

TensorShapeList get_tensor_output_shape(InferenceBackend backend) const

Gets output tensor shapes for a specific backend.

Parameters:

backend – The target inference backend

Returns:

List of output tensor shapes for the specified backend

std::vector<size_t> get_tensor_input_size() const

Gets total size (element count) for each input tensor.

Returns:

Vector of input tensor sizes

std::vector<size_t> get_tensor_output_size() const

Gets total size (element count) for each output tensor.

Returns:

Vector of output tensor sizes
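
These element counts can be used to size scratch buffers, as in this sketch (assuming a configured instance named config):

// One float buffer per tensor, sized from the reported element counts.
std::vector<std::vector<float>> inputs, outputs;
for (size_t n : config.get_tensor_input_size())
    inputs.emplace_back(n, 0.0f);
for (size_t n : config.get_tensor_output_size())
    outputs.emplace_back(n, 0.0f);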

std::vector<size_t> get_preprocess_input_channels() const

Gets number of input channels for each input tensor.

Returns:

Vector of input channel counts

std::vector<size_t> get_postprocess_output_channels() const

Gets number of output channels for each output tensor.

Returns:

Vector of output channel counts

std::vector<size_t> get_preprocess_input_size() const

Gets the number of samples required for preprocessing each input tensor.

Returns:

Vector of input buffer sizes (0 = non-streamable)

std::vector<size_t> get_postprocess_output_size() const

Gets the number of samples produced after postprocessing for each output tensor.

Returns:

Vector of output buffer sizes (0 = non-streamable)
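
The 0 = non-streamable convention can be used to separate streamed audio inputs from control inputs, as in this sketch (assuming a configured instance named config):

// Inputs reporting a preprocess size of 0 are non-streamable (e.g. control parameters).
std::vector<size_t> in_sizes = config.get_preprocess_input_size();
for (size_t i = 0; i < in_sizes.size(); ++i) {
    bool streamable = (in_sizes[i] != 0);
    // route audio to streamable inputs, parameter values to non-streamable ones ...
}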

std::vector<size_t> get_internal_model_latency() const

Gets internal model latency for each output tensor.

Returns:

Vector of latency values in samples

void set_tensor_input_shape(const TensorShapeList &input_shape)

Sets universal input tensor shapes.

Parameters:

input_shape – New input tensor shapes for all backends

void set_tensor_output_shape(const TensorShapeList &output_shape)

Sets universal output tensor shapes.

Parameters:

output_shape – New output tensor shapes for all backends

void set_tensor_input_shape(const TensorShapeList &input_shape, InferenceBackend backend)

Sets input tensor shapes for a specific backend.

Parameters:
  • input_shape – New input tensor shapes

  • backend – Target inference backend

void set_tensor_output_shape(const TensorShapeList &output_shape, InferenceBackend backend)

Sets output tensor shapes for a specific backend.

Parameters:
  • output_shape – New output tensor shapes

  • backend – Target inference backend
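
A hypothetical override giving one backend its own shapes while the others keep the universal ones (the LIBTORCH backend and the three-dimensional shapes are placeholders; TensorShapeList is assumed to be brace-initializable, as in the constructor examples above):

// Backends without an override fall back to the universal shapes.
TensorShapeList torch_inputs  = {{1, 1, 512}};
TensorShapeList torch_outputs = {{1, 1, 512}};
config.set_tensor_input_shape(torch_inputs, LIBTORCH);
config.set_tensor_output_shape(torch_outputs, LIBTORCH);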

void set_preprocess_input_channels(const std::vector<size_t> &input_channels)

Sets input channel counts for preprocessing.

Parameters:

input_channels – New input channel counts

void set_preprocess_output_channels(const std::vector<size_t> &output_channels)

Sets output channel counts for postprocessing.

Parameters:

output_channels – New output channel counts

void set_preprocess_input_size(const std::vector<size_t> &preprocess_input_size)

Sets the number of samples required for preprocessing each input tensor.

Parameters:

preprocess_input_size – New samples count required for preprocessing (0 = non-streamable)

void set_postprocess_output_size(const std::vector<size_t> &postprocess_output_size)

Sets the number of samples produced after postprocessing each output tensor.

Parameters:

postprocess_output_size – New samples count after postprocessing (0 = non-streamable)

void set_internal_model_latency(const std::vector<size_t> &internal_model_latency)

Sets internal model latency values.

Parameters:

internal_model_latency – New latency values in samples

void set_model_path(const std::string &model_path, InferenceBackend backend)

Sets model path for a specific backend.

Parameters:
  • model_path – New model file path

  • backend – Target inference backend
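
A minimal sketch (the path is a placeholder):

// Point the ONNX backend at a different model file, e.g. one unpacked at install time.
config.set_model_path("/path/to/updated_model.onnx", ONNX);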

inline bool operator==(const InferenceConfig &other) const

Equality comparison operator.

Compares all configuration parameters using appropriate epsilon values for floating-point comparisons.

Parameters:

other – The InferenceConfig instance to compare with

Returns:

true if all parameters are equal within tolerance, false otherwise

inline bool operator!=(const InferenceConfig &other) const

Inequality comparison operator.

Parameters:

other – The InferenceConfig instance to compare with

Returns:

true if any parameters are not equal, false otherwise

const TensorShape &get_tensor_shape(InferenceBackend backend) const

Gets tensor shape configuration for a specific backend.

Retrieves the TensorShape object that matches the specified backend, or falls back to universal shapes if no backend-specific shape is found.

Parameters:

backend – The target inference backend

Throws:

std::runtime_error – if no matching shape configuration is found

Returns:

Reference to the matching TensorShape configuration
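
Since this accessor throws when no configuration matches, a defensive sketch (assuming a configured instance named config):

try {
    const TensorShape& shape = config.get_tensor_shape(ONNX);  // backend-specific or universal
    // inspect shape ...
} catch (const std::runtime_error&) {
    // no tensor shape configuration available for this backend
}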

void clear_processing_spec()

Clears all processing specification parameters.

Resets all processing specification vectors to empty state. This is useful before reconfiguring the processing pipeline.

void update_processing_spec()

Updates and validates the processing specification.

Automatically computes tensor sizes from shapes, validates consistency between tensor configurations and processing parameters, and fills in default values where needed. This method should be called after any changes to tensor shapes or processing parameters.

Note

This method is automatically called by InferenceConfig constructors

Throws:
  • std::invalid_argument – if tensor shapes and processing parameters are inconsistent

  • std::invalid_argument – if tensor dimensions are invalid (non-positive)
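
A sketch of a reconfiguration cycle using the setters documented above (channel and sample counts are placeholders):

config.clear_processing_spec();
config.set_preprocess_input_channels({2});   // stereo audio input
config.set_preprocess_input_size({512});     // 512 samples gathered per inference
config.set_postprocess_output_size({512});   // 512 samples returned per inference
config.update_processing_spec();             // recomputes derived values; throws
                                             // std::invalid_argument on inconsistency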

Public Members

std::vector<ModelData> m_model_data

Model data for different inference backends.

std::vector<TensorShape> m_tensor_shape

Tensor shape configurations for inputs and outputs.

ProcessingSpec m_processing_spec

Processing specification for preprocessing and postprocessing.

float m_max_inference_time

Maximum allowed inference time in milliseconds.

unsigned int m_warm_up

Number of warm-up inferences to perform.

bool m_session_exclusive_processor

Whether to use exclusive processor sessions.

float m_blocking_ratio

Blocking ratio for real-time control (0.0-1.0)

unsigned int m_num_parallel_processors

Number of parallel inference processors.

struct Defaults

Default values for inference configuration parameters.

This nested struct provides sensible default values for optional InferenceConfig parameters, ensuring consistent behavior across different usage scenarios.

Public Static Attributes

static constexpr unsigned int m_warm_up = 0

Default number of warm-up inferences (0 = no warm-up)

static constexpr bool m_session_exclusive_processor = false

Default session exclusivity (false = shared processors)

static constexpr float m_blocking_ratio = 0.f

Default blocking ratio (0.0 = non-blocking)

static unsigned int m_num_parallel_processors = (std::thread::hardware_concurrency() / 2 > 0) ? std::thread::hardware_concurrency() / 2 : 1

Default number of parallel processors (half of available hardware threads, minimum 1)
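
A short sketch reading the defaults at runtime (Defaults is documented as a nested struct, so it is qualified through InferenceConfig here; that qualification is an assumption):

unsigned int workers   = anira::InferenceConfig::Defaults::m_num_parallel_processors;
bool         exclusive = anira::InferenceConfig::Defaults::m_session_exclusive_processor;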