
src/arraymancer/nn_primitives/backend/nnpack


Documentation for the NNPACK types and procs bound in this module (reproduced from the NNPACK C header):

  • nnp_status -- Status code for any NNPACK function call.
  • nnp_activation -- Activation applied after a convolutional or fully-connected layer.
  • nnp_convolution_algorithm -- Algorithm for computing convolutional layers. (The constants nnp_convolution_transform_strategy_block_based and nnp_convolution_transform_strategy_tuple_based listed under Consts are kept for backward compatibility.)
  • nnp_size -- Size of images, kernels, and pooling filters in NNPACK.
  • nnp_padding -- Padding of images in NNPACK.
  • nnp_profile -- Profiling information about time spent in different phases of a function call.

nnp_convolution_output

@brief Computes output of a 2D convolutional layer from input and kernel tensors.
@details This function targets training of convolutional neural networks and performs forward propagation. It is optimized for moderate minibatch sizes (64-128) and can be inefficient on a small minibatch. For minibatch size 1, use nnp_convolution_inference for optimal performance.
@param algorithm The type of algorithm to use for convolution. Possible values are:

  • nnp_convolution_algorithm_auto -- let the function choose the algorithm.
  • nnp_convolution_algorithm_ft8x8 -- tiled convolution based on 2D Fourier transform with 8x8 blocks. Supports kernels up to 8x8.
  • nnp_convolution_algorithm_ft16x16 -- tiled convolution based on 2D Fourier transform with 16x16 blocks. Supports kernels up to 16x16.
  • nnp_convolution_algorithm_wt8x8 -- tiled convolution based on 2D Winograd transform F(3x3, 6x6). Supports only 3x3 kernels.

@param batch_size The number of images on the input and output of the convolutional layer.
@param input_channels The number of channels (AKA features, dimensions) in the input images.
@param output_channels The number of channels (AKA features, dimensions) in the output images.
@param input_size Size of input images, excluding implicit zero-padding.
@param input_padding Implicit zero-padding of input images.
@param kernel_size Kernel size.
@param[in] input A 4D tensor input[batch_size][input_channels][input_size.height][input_size.width].
@param[in] kernel A 4D tensor kernel[output_channels][input_channels][kernel_size.height][kernel_size.width].
@param[in] bias A 1D array bias[output_channels].
@param[out] output A 4D tensor output[batch_size][output_channels][output_size.height][output_size.width] where
    output_size.height = (input_padding.top + input_size.height + input_padding.bottom) - (kernel_size.height - 1)
    output_size.width = (input_padding.left + input_size.width + input_padding.right) - (kernel_size.width - 1)
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.
@param[out] profile An optional pointer to profiling structure. If provided, the structure would record time spent in different phases of the computation.
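As an illustration of the parameters above, a minimal forward-pass sketch follows. The import path, the layer shape (batch 64, 16 -> 32 channels, 28x28 images, 3x3 kernel, 1 pixel of padding) and the flat seq[cfloat] buffers in NCHW order are assumptions for the example; only the procs and types come from this module.

  # Hypothetical sketch: 3x3 convolution, batch of 64, 16 -> 32 channels, 1 pixel of padding.
  import arraymancer/nn_primitives/backend/nnpack   # assumed import path

  const
    batch = 64
    cIn   = 16
    cOut  = 32
    h     = 28
    w     = 28

  let
    inSize = nnp_size(width: w.csize_t, height: h.csize_t)
    pad    = nnp_padding(top: 1, right: 1, bottom: 1, left: 1)
    kSize  = nnp_size(width: 3, height: 3)

  var
    input  = newSeq[cfloat](batch * cIn * h * w)   # input[batch][cIn][h][w], NCHW, row-major
    kernel = newSeq[cfloat](cOut * cIn * 3 * 3)    # kernel[cOut][cIn][kh][kw]
    bias   = newSeq[cfloat](cOut)
    # With 1 pixel of padding and a 3x3 kernel the output stays h x w:
    # (1 + 28 + 1) - (3 - 1) = 28
    output = newSeq[cfloat](batch * cOut * h * w)

  doAssert nnp_initialize() == nnp_status_success
  doAssert nnp_convolution_output(
    nnp_convolution_algorithm_auto,
    batch.csize_t, cIn.csize_t, cOut.csize_t,
    inSize, pad, kSize,
    input[0].addr, kernel[0].addr, bias[0].addr, output[0].addr) == nnp_status_success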

nnp_convolution_input_gradient

@brief Computes gradient of input of a 2D convolutional layer from gradient of output and kernel tensors.
@details This function targets training of convolutional neural networks and performs backward propagation. It is optimized for moderate minibatch sizes (64-128) and can be inefficient on a small minibatch.
@param algorithm The type of algorithm to use for convolution. Possible values are:
  • nnp_convolution_algorithm_auto -- let the function choose the algorithm.
  • nnp_convolution_algorithm_ft8x8 -- tiled convolution based on 2D Fourier transform with 8x8 blocks. Supports kernels up to 8x8.
  • nnp_convolution_algorithm_ft16x16 -- tiled convolution based on 2D Fourier transform with 16x16 blocks. Supports kernels up to 16x16.
  • nnp_convolution_algorithm_wt8x8 -- tiled convolution based on 2D Winograd transform F(3x3, 6x6). Supports only 3x3 kernels.

@param batch_size The number of images (and their gradients) on the input and output of the convolutional layer.
@param input_channels The number of channels (AKA features, dimensions) in the input images (and gradients).
@param output_channels The number of channels (AKA features, dimensions) in the output images (and gradients).
@param input_size Size of input images and their gradients, excluding implicit zero-padding.
@param input_padding Implicit zero-padding of input images.
@param kernel_size Kernel size.
@param[in] grad_output A 4D tensor grad_output[batch_size][output_channels][output_size.height][output_size.width] where
    output_size.height = (input_padding.top + input_size.height + input_padding.bottom) - (kernel_size.height - 1)
    output_size.width = (input_padding.left + input_size.width + input_padding.right) - (kernel_size.width - 1)
@param[in] kernel A 4D tensor kernel[output_channels][input_channels][kernel_size.height][kernel_size.width].
@param[out] grad_input A 4D tensor grad_input[batch_size][input_channels][input_size.height][input_size.width].
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.
@param[out] profile An optional pointer to profiling structure. If provided, the structure would record time spent in different phases of the computation.
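A minimal sketch of the corresponding backward step for the input gradient, reusing the same assumed layer shape as the forward example above; the import path and the flat NCHW buffers are likewise assumptions.

  # Hypothetical sketch: gradient w.r.t. the input of a 3x3 layer
  # (batch 64, 16 -> 32 channels, 28x28 images, 1 pixel of padding).
  import arraymancer/nn_primitives/backend/nnpack   # assumed import path

  var
    gradOutput = newSeq[cfloat](64 * 32 * 28 * 28)  # grad_output[batch][cOut][h][w]
    kernel     = newSeq[cfloat](32 * 16 * 3 * 3)    # kernel[cOut][cIn][kh][kw]
    gradInput  = newSeq[cfloat](64 * 16 * 28 * 28)  # grad_input[batch][cIn][h][w]

  doAssert nnp_initialize() == nnp_status_success
  doAssert nnp_convolution_input_gradient(
    nnp_convolution_algorithm_auto,
    64.csize_t, 16.csize_t, 32.csize_t,
    nnp_size(width: 28, height: 28),
    nnp_padding(top: 1, right: 1, bottom: 1, left: 1),
    nnp_size(width: 3, height: 3),
    gradOutput[0].addr, kernel[0].addr, gradInput[0].addr) == nnp_status_success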

nnp_convolution_kernel_gradient

@brief Computes gradient of kernel of a 2D convolutional layer from gradient of output and input tensors.
@details This function targets training of convolutional neural networks and performs backward propagation. It is optimized for moderate minibatch sizes (64-128) and can be inefficient on a small minibatch.
@param algorithm The type of algorithm to use for convolution. Possible values are:
  • nnp_convolution_algorithm_auto -- let the function choose the algorithm.
  • nnp_convolution_algorithm_ft8x8 -- tiled convolution based on 2D Fourier transform with 8x8 blocks. Supports kernels up to 8x8.
  • nnp_convolution_algorithm_ft16x16 -- tiled convolution based on 2D Fourier transform with 16x16 blocks. Supports kernels up to 16x16.

@param batch_size The number of images (and their gradients) on the input and output of the convolutional layer.
@param input_channels The number of channels (AKA features, dimensions) in the input images.
@param output_channels The number of channels (AKA features, dimensions) in the output images (and gradients).
@param input_size Size of input images and their gradients, excluding implicit zero-padding.
@param input_padding Implicit zero-padding of input images.
@param kernel_size Kernel size.
@param[in] input A 4D tensor input[batch_size][input_channels][input_size.height][input_size.width].
@param[in] grad_output A 4D tensor grad_output[batch_size][output_channels][output_size.height][output_size.width] where
    output_size.height = (input_padding.top + input_size.height + input_padding.bottom) - (kernel_size.height - 1)
    output_size.width = (input_padding.left + input_size.width + input_padding.right) - (kernel_size.width - 1)
@param[out] grad_kernel A 4D tensor grad_kernel[output_channels][input_channels][kernel_size.height][kernel_size.width].
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.
@param[out] profile An optional pointer to profiling structure. If provided, the structure would record time spent in different phases of the computation.
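And a minimal sketch for the kernel gradient under the same assumed shapes (import path and buffer layout are assumptions for illustration).

  # Hypothetical sketch: gradient w.r.t. the kernel of the same 3x3 layer shape.
  import arraymancer/nn_primitives/backend/nnpack   # assumed import path

  var
    input      = newSeq[cfloat](64 * 16 * 28 * 28)  # input[batch][cIn][h][w]
    gradOutput = newSeq[cfloat](64 * 32 * 28 * 28)  # grad_output[batch][cOut][h][w]
    gradKernel = newSeq[cfloat](32 * 16 * 3 * 3)    # grad_kernel[cOut][cIn][kh][kw]

  doAssert nnp_initialize() == nnp_status_success
  doAssert nnp_convolution_kernel_gradient(
    nnp_convolution_algorithm_auto,
    64.csize_t, 16.csize_t, 32.csize_t,
    nnp_size(width: 28, height: 28),
    nnp_padding(top: 1, right: 1, bottom: 1, left: 1),
    nnp_size(width: 3, height: 3),
    input[0].addr, gradOutput[0].addr, gradKernel[0].addr) == nnp_status_success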

nnp_convolution_inference

@brief Computes output of a 2D convolutional layer for a single input image and a kernel tensor.
@details This function targets prediction with convolutional neural networks and performs forward propagation.
@param algorithm The type of algorithm to use for convolution. Possible values are:
  • nnp_convolution_algorithm_auto -- let the function choose the algorithm.
  • nnp_convolution_algorithm_ft8x8 -- tiled convolution based on 2D Fourier transform with 8x8 blocks. Supports kernels up to 8x8.
  • nnp_convolution_algorithm_ft16x16 -- tiled convolution based on 2D Fourier transform with 16x16 blocks. Supports kernels up to 16x16.
  • nnp_convolution_algorithm_wt8x8 -- tiled convolution based on 2D Winograd transform F(3x3, 6x6). Supports only 3x3 kernels.

@param transform_strategy A strategy that guides computation of the kernel transform coefficients. Possible values are:

  • nnp_convolution_transform_strategy_block_based -- do multiplication-accumulations on blocks of transformed coefficients.
  • nnp_convolution_transform_strategy_tuple_based -- do multiplication-accumulations on tuples of transformed coefficients.

@param input_channels The number of channels (AKA features, dimensions) in the input image.
@param output_channels The number of channels (AKA features, dimensions) in the output image.
@param input_size Size of input image, excluding implicit zero-padding.
@param input_padding Implicit zero-padding of input image.
@param kernel_size Kernel size.
@param output_subsampling Subsample region for output, also known as convolution stride.
@param[in] input A 3D tensor input[input_channels][input_size.height][input_size.width].
@param[in] kernel A 4D tensor kernel[output_channels][input_channels][kernel_size.height][kernel_size.width].
@param[in] bias A 1D array bias[output_channels].
@param[out] output A 3D tensor output[output_channels][output_size.height][output_size.width] where
    output_size.height = (input_padding.top + input_size.height + input_padding.bottom) - (kernel_size.height - 1)
    output_size.width = (input_padding.left + input_size.width + input_padding.right) - (kernel_size.width - 1)
@param[in] workspace_buffer Buffer for scratch memory used during computation. The buffer must be aligned on 64 bytes. If workspace_buffer is NULL and workspace_size is non-NULL, NNPACK would store the size of required workspace memory at the workspace_size location, and exit without computations. If workspace_buffer is NULL and workspace_size is NULL, NNPACK would allocate memory before and deallocate after this computation, potentially at significant runtime cost.
@param[in,out] workspace_size Pointer to the size of the workspace buffer. If workspace_buffer is NULL, NNPACK will write the size of the required scratch memory to the location specified by this pointer. If workspace_buffer is non-NULL, NNPACK expects workspace_size to specify the size of the buffer, in bytes. If workspace_size is NULL, workspace_buffer must be NULL as well. In this case NNPACK would allocate memory before and deallocate after this computation, potentially at significant runtime cost.
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.
@param[out] profile An optional pointer to profiling structure. If provided, the structure would record time spent in different phases of the computation.
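The workspace protocol described above (query the size first, then supply an aligned buffer) could be used as in the sketch below. The layer shape, the import path, and the manual 64-byte alignment of a seq-backed buffer are assumptions for illustration; the two-call protocol itself is taken from the workspace_buffer/workspace_size documentation.

  # Hypothetical sketch: query the workspace size, then run a 3x3 inference convolution (16 -> 32 channels, stride 1).
  import arraymancer/nn_primitives/backend/nnpack   # assumed import path

  var
    input  = newSeq[cfloat](16 * 28 * 28)   # input[cIn][h][w]
    kernel = newSeq[cfloat](32 * 16 * 3 * 3)
    bias   = newSeq[cfloat](32)
    output = newSeq[cfloat](32 * 28 * 28)
    wsSize: csize_t = 0

  doAssert nnp_initialize() == nnp_status_success

  # First call: workspace_buffer == nil and workspace_size != nil,
  # so NNPACK only reports the required scratch size in wsSize.
  doAssert nnp_convolution_inference(
    nnp_convolution_algorithm_auto, nnp_convolution_transform_strategy_compute,
    16.csize_t, 32.csize_t,
    nnp_size(width: 28, height: 28),
    nnp_padding(top: 1, right: 1, bottom: 1, left: 1),
    nnp_size(width: 3, height: 3), nnp_size(width: 1, height: 1),
    input[0].addr, kernel[0].addr, bias[0].addr, output[0].addr,
    nil, wsSize.addr,
    nnp_activation_identity, nil, nil, nil) == nnp_status_success

  # Second call: pass a 64-byte-aligned buffer of wsSize bytes and run the convolution.
  var backing = newSeq[byte](wsSize.int + 64)              # over-allocate so we can align manually
  let base = cast[uint](backing[0].addr)
  let workspace = cast[pointer]((base + 63) and not 63'u)  # round up to a 64-byte boundary
  doAssert nnp_convolution_inference(
    nnp_convolution_algorithm_auto, nnp_convolution_transform_strategy_compute,
    16.csize_t, 32.csize_t,
    nnp_size(width: 28, height: 28),
    nnp_padding(top: 1, right: 1, bottom: 1, left: 1),
    nnp_size(width: 3, height: 3), nnp_size(width: 1, height: 1),
    input[0].addr, kernel[0].addr, bias[0].addr, output[0].addr,
    workspace, wsSize.addr,
    nnp_activation_identity, nil, nil, nil) == nnp_status_success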

nnp_fully_connected_output

@brief Computes output of a fully connected layer from input and kernel matrices.
@details This function targets training of convolutional neural networks and performs forward propagation. It is optimized for moderate minibatch sizes (64-128) and can be inefficient on a small minibatch. For minibatch size 1, use nnp_fully_connected_inference for optimal performance.
@param batch_size The number of vectors on the input and output of the fully connected layer.
@param input_channels The number of channels (AKA features, dimensions) in the input matrix.
@param output_channels The number of channels (AKA features, dimensions) in the output matrix.
@param[in] input A 2D matrix input[batch_size][input_channels].
@param[in] kernel A 2D matrix kernel[output_channels][input_channels].
@param[out] output A 2D matrix output[batch_size][output_channels].
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.

nnp_fully_connected_inference

@brief Computes output of a fully connected layer for a single input vector and a kernel matrix.
@details This function targets prediction with convolutional neural networks and performs forward propagation.
@param input_channels The number of channels (AKA features, dimensions) in the input vector.
@param output_channels The number of channels (AKA features, dimensions) in the output vector.
@param[in] input A 1D array input[input_channels] of FP32 elements.
@param[in] kernel A 2D matrix kernel[output_channels][input_channels] of FP32 elements.
@param[out] output A 1D array output[output_channels] of FP32 elements.
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.

nnp_fully_connected_inference_f16f32

@brief Computes output of a fully connected layer for a single input vector and a kernel matrix.
@details This function targets prediction with convolutional neural networks and performs forward propagation.
@param input_channels The number of channels (AKA features, dimensions) in the input vector.
@param output_channels The number of channels (AKA features, dimensions) in the output vector.
@param[in] input A 1D array input[input_channels] of FP32 elements.
@param[in] kernel A 2D matrix kernel[output_channels][input_channels] of FP16 (ARM alternative format) elements.
@param[out] output A 1D array output[output_channels] of FP32 elements.
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.

nnp_max_pooling_output

@brief Computes output of a max-pooling layer for an input tensor.
@details This function targets both prediction and training of convolutional neural networks and performs forward propagation. It is optimized for both large and small minibatch sizes.
@param batch_size The number of images on the input and output of the max-pooling layer.
@param channels The number of channels (AKA features, dimensions) in both input and output images.
@param input_size Size of input images, excluding implicit zero-padding.
@param input_padding Implicit padding of input images. The padding pixels are ignored by the pooling filter, but affect the output size.
@param pooling_size Size of the pooling filter. Only 2x2 filters are currently supported.
@param pooling_stride Stride of the pooling filter. Only 2x2 strides are currently supported.
@param[in] input A 4D tensor input[batch_size][channels][input_size.height][input_size.width].
@param[out] output A 4D tensor output[batch_size][channels][output_size.height][output_size.width] where
    output_size.height = ceil((input_padding.top + input_size.height + input_padding.bottom - pooling_size.height) / pooling_stride.height) + 1
    output_size.width = ceil((input_padding.left + input_size.width + input_padding.right - pooling_size.width) / pooling_stride.width) + 1
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.

nnp_softmax_output

@brief Computes output of a softmax layer for an input matrix.
@details This function targets both prediction and training of convolutional neural networks and performs forward propagation. It is optimized for both large and small minibatch sizes.
@param batch_size The number of vectors on the input and output of the softmax layer.
@param channels The number of channels (AKA features, dimensions) in both input and output vectors.
@param[in] input A 2D matrix input[batch_size][channels].
@param[out] output A 2D matrix output[batch_size][channels].
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.

nnp_relu_output

@brief Computes output of a rectified linear unit (ReLU) layer for an input matrix.
@details This function targets both prediction and training of convolutional neural networks and performs forward propagation. It is optimized for both large and small minibatch sizes.
@param batch_size The number of vectors on the input and output of the ReLU layer.
@param channels The number of channels (AKA features, dimensions) in both input and output matrices.
@param[in] input A 2D matrix input[batch_size][channels].
@param[out] output A 2D matrix output[batch_size][channels].
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.

nnp_relu_input_gradient

@brief Computes gradient of input of a rectified linear unit (ReLU) layer from gradient of output and input matrices.
@details This function targets training of convolutional neural networks and performs backward propagation. It is optimized for both large and small minibatch sizes.
@param batch_size The number of vectors on the input and output of the ReLU layer.
@param channels The number of channels (AKA features, dimensions) in both input and output matrices.
@param[in] grad_output A 2D matrix grad_output[batch_size][channels].
@param[in] input A 2D matrix input[batch_size][channels].
@param[out] grad_input A 2D matrix grad_input[batch_size][channels].
@param threadpool A thread pool for parallelization of the computation. If threadpool is NULL, the computation would run on the caller thread without parallelization.
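A minimal sketch of a fully connected layer followed by ReLU, using the procs documented above; the batch size, channel counts, and import path are assumptions for illustration.

  # Hypothetical sketch: fully-connected layer (batch 64, 256 -> 10 channels) followed by ReLU.
  import arraymancer/nn_primitives/backend/nnpack   # assumed import path

  var
    input     = newSeq[cfloat](64 * 256)   # input[batch_size][input_channels]
    kernel    = newSeq[cfloat](10 * 256)   # kernel[output_channels][input_channels]
    output    = newSeq[cfloat](64 * 10)    # output[batch_size][output_channels]
    activated = newSeq[cfloat](64 * 10)

  doAssert nnp_initialize() == nnp_status_success
  doAssert nnp_fully_connected_output(
    64.csize_t, 256.csize_t, 10.csize_t,
    input[0].addr, kernel[0].addr, output[0].addr,
    threadpool = nil, profile = nil) == nnp_status_success
  # ReLU with negative_slope = 0 is the standard f(x) = max(0, x).
  doAssert nnp_relu_output(64.csize_t, 10.csize_t,
    output[0].addr, activated[0].addr, 0.0'f32, nil) == nnp_status_success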

Types

nnp_activation {.size: 4.} = enum
  nnp_activation_identity = 0, ## Identity activation f(x) := x, i.e. no transformation
  nnp_activation_relu = 1      ## ReLU activation f(x) := max(0, x)
nnp_convolution_algorithm {.size: 4.} = enum
  nnp_convolution_algorithm_auto = 0,          ## Let NNPACK choose the algorithm depending on layer parameters
  nnp_convolution_algorithm_ft8x8 = 1,         ## Tiled convolution based on 2D Fourier transform with 8x8 blocks. Supports kernels up to 8x8.
  nnp_convolution_algorithm_ft16x16 = 2,       ## Tiled convolution based on 2D Fourier transform with 16x16 blocks. Supports kernels up to 16x16.
  nnp_convolution_algorithm_wt8x8 = 3,         ## Tiled convolution based on 2D Winograd transform F(3x3, 6x6) with 8x8 blocks. Supports only 3x3 kernels.
  nnp_convolution_algorithm_implicit_gemm = 4, ## Direct convolution via implicit GEMM.
  nnp_convolution_algorithm_direct = 5,        ## Direct convolution implementation.
  nnp_convolution_algorithm_wt8x8_fp16 = 6     ## Tiled convolution based on 2D Winograd transform F(3x3, 6x6) with 8x8 blocks in FP16.
                                               ## Supports only 3x3 kernels. Implemented only for new ARM processors (with NEON-HP);
                                               ## on non-supported processors falls back to nnp_convolution_algorithm_wt8x8.
nnp_convolution_transform_strategy {.size: 4.} = enum
  nnp_convolution_transform_strategy_compute = 1,
  nnp_convolution_transform_strategy_precompute = 2,
  nnp_convolution_transform_strategy_reuse = 3
nnp_padding {.bycopy.} = object
  top*: csize_t    ## Padding above the image data
  right*: csize_t  ## Padding on the right of image data
  bottom*: csize_t ## Padding below the image data
  left*: csize_t   ## Padding on the left of image data
nnp_profile {.bycopy.} = object
  total*: cdouble                ## Time spent inside the function call, in seconds.
  input_transform*: cdouble      ## Time spent on transformation of the input or input gradient tensor, in seconds.
  kernel_transform*: cdouble     ## Time spent on transformation of the kernel or kernel gradient tensor, in seconds.
  output_transform*: cdouble     ## Time spent on transformation of the output or output gradient tensor, in seconds.
  block_multiplication*: cdouble ## Time spent on multiplication-accumulation of transformed coefficients, in seconds.
nnp_size {.bycopy.} = object
  width*: csize_t  ## Width (horizontal size) of an image, kernel, or pooling filter.
  height*: csize_t ## Height (vertical size) of an image, kernel, or pooling filter.
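nnp_size and nnp_padding are plain value objects. A minimal sketch of building the geometry for a 3x3 "same" convolution; the 32x32 image size and the import path are assumptions for illustration.

  # Hypothetical sketch: geometry for a 3x3 "same" convolution on a 32x32 image.
  import arraymancer/nn_primitives/backend/nnpack   # assumed import path

  let
    imgSize = nnp_size(width: 32, height: 32)
    kSize   = nnp_size(width: 3, height: 3)
    # 1 pixel of zero-padding on every side keeps the output at 32x32:
    # (1 + 32 + 1) - (3 - 1) = 32
    samePad = nnp_padding(top: 1, right: 1, bottom: 1, left: 1)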
nnp_status {.size: 4.} = enum
  nnp_status_success = 0, ## The call succeeded, and all output arguments now contain valid data.
  nnp_status_invalid_batch_size = 2, ## NNPACK function was called with batch_size == 0.
  nnp_status_invalid_channels = 3, ## NNPACK function was called with channels == 0.
  nnp_status_invalid_input_channels = 4, ## NNPACK function was called with input_channels == 0.
  nnp_status_invalid_output_channels = 5, ## NNPACK function was called with output_channels == 0.
  nnp_status_invalid_input_size = 10, ## NNPACK function was called with input_size.height == 0 or input_size.width == 0
  nnp_status_invalid_input_stride = 11, ## NNPACK function was called with input_stride.height == 0 or input_stride.width == 0
  nnp_status_invalid_input_padding = 12, ## NNPACK function was called with input_padding not less than respective kernel (or pooling) size, i.e.:
                                         ##
                                         ##   - input_padding.left   >= kernel_size.width  (>= pooling_size.width)
                                         ##   - input_padding.right  >= kernel_size.width  (>= pooling_size.width)
                                         ##   - input_padding.top    >= kernel_size.height (>= pooling_size.height)
                                         ##   - input_padding.bottom >= kernel_size.height (>= pooling_size.height)
                                         ##
  nnp_status_invalid_kernel_size = 13, ## NNPACK function was called with kernel_size.height == 0 or kernel_size.width == 0
  nnp_status_invalid_pooling_size = 14, ## NNPACK function was called with pooling_size.height == 0 or pooling_size.width == 0
  nnp_status_invalid_pooling_stride = 15, ## NNPACK function was called with pooling_stride.height == 0 or pooling_stride.width == 0
  nnp_status_invalid_algorithm = 16, ## NNPACK function was called with convolution algorithm not in nnp_convolution_algorithm enumeration
  nnp_status_invalid_transform_strategy = 17, ## NNPACK function was called with convolution transform strategy not in nnp_convolution_transform_strategy enum
  nnp_status_unsupported_input_size = 20, ## NNPACK does not support the particular input size for the function
  nnp_status_unsupported_input_stride = 21, ## NNPACK does not support the particular input stride for the function
  nnp_status_unsupported_input_padding = 22, ## NNPACK does not support the particular input padding for the function
  nnp_status_unsupported_kernel_size = 23, ## NNPACK does not support the particular kernel size for the function
  nnp_status_unsupported_pooling_size = 24, ## NNPACK does not support the particular pooling size for the function
  nnp_status_unsupported_pooling_stride = 25, ## NNPACK does not support the particular pooling stride for the function
  nnp_status_unsupported_algorithm = 26, ## NNPACK does not support the particular convolution algorithm for the function
  nnp_status_unsupported_transform_strategy = 27, ## NNPACK does not support the particular convolution transform strategy for the algorithm
  nnp_status_unsupported_activation = 28, ## NNPACK does not support the particular activation function for the function
  nnp_status_unsupported_activation_parameters = 29, ## NNPACK does not support the particular activation function parameters for the function
  nnp_status_uninitialized = 50, ## NNPACK function was called before the library was initialized
  nnp_status_unsupported_hardware = 51, ## NNPACK does not implement this function for the host CPU
  nnp_status_out_of_memory = 52, ## NNPACK failed to allocate memory for temporary buffers
  nnp_status_insufficient_buffer = 53, ## Scratch space buffer is too small
  nnp_status_misaligned_buffer = 54 ## Scratch space buffer is not properly aligned
pthreadpool_t = pointer
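Every proc returns an nnp_status that should be checked. A minimal initialization sketch built on nnp_initialize, nnp_deinitialize, and nnp_status; the helper name initNnpackOrRaise and the use of CatchableError are illustrative assumptions, not part of the binding.

  # Hypothetical sketch: initialize NNPACK once and map failures onto a Nim exception.
  import arraymancer/nn_primitives/backend/nnpack   # assumed import path

  proc initNnpackOrRaise() =
    let status = nnp_initialize()
    if status == nnp_status_unsupported_hardware:
      raise newException(CatchableError, "NNPACK does not implement its functions for this CPU")
    elif status != nnp_status_success:
      raise newException(CatchableError, "nnp_initialize failed with status " & $status)

  initNnpackOrRaise()
  # ... call NNPACK procs ...
  doAssert nnp_deinitialize() == nnp_status_success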

Consts

nnp_convolution_transform_strategy_block_based = nnp_convolution_transform_strategy_compute
  ## Kept for backward compatibility with older NNPACK APIs.
nnp_convolution_transform_strategy_tuple_based = nnp_convolution_transform_strategy_compute
  ## Kept for backward compatibility with older NNPACK APIs.
nnp_status_invalid_activation = nnp_status_invalid_pooling_size
nnp_status_invalid_activation_parameters = nnp_status_invalid_pooling_stride
nnp_status_invalid_output_subsampling = nnp_status_invalid_kernel_size
  ## NNPACK function was called with output_subsampling.height == 0 or output_subsampling.width == 0

Procs

proc nnp_convolution_inference(algorithm: nnp_convolution_algorithm;
    transform_strategy: nnp_convolution_transform_strategy;
                               input_channels: csize_t;
                               output_channels: csize_t; input_size: nnp_size;
                               input_padding: nnp_padding;
                               kernel_size: nnp_size;
                               output_subsampling: nnp_size; input: ptr cfloat;
                               kernel: ptr cfloat; bias: ptr cfloat;
                               output: ptr cfloat; workspace_buffer: pointer;
                               workspace_size: ptr csize_t;
                               activation: nnp_activation;
                               activation_parameters: pointer;
                               threadpool: pthreadpool_t;
                               profile: ptr nnp_profile): nnp_status {.cdecl,
    importc: "nnp_convolution_inference", ...raises: [], tags: [], forbids: [].}
proc nnp_convolution_input_gradient(algorithm: nnp_convolution_algorithm;
                                    batch_size: csize_t;
                                    input_channels: csize_t;
                                    output_channels: csize_t;
                                    input_size: nnp_size;
                                    input_padding: nnp_padding;
                                    kernel_size: nnp_size;
                                    grad_output: ptr cfloat; kernel: ptr cfloat;
                                    grad_input: ptr cfloat;
                                    workspace_buffer: pointer = nil;
                                    workspace_size: ptr csize_t = nil;
    activation: nnp_activation = nnp_activation_identity;
                                    activation_parameters: pointer = nil;
                                    threadpool: pthreadpool_t = nil;
                                    profile: ptr nnp_profile = nil): nnp_status {.
    cdecl, importc: "nnp_convolution_input_gradient", ...raises: [], tags: [],
    forbids: [].}
proc nnp_convolution_kernel_gradient(algorithm: nnp_convolution_algorithm;
                                     batch_size: csize_t;
                                     input_channels: csize_t;
                                     output_channels: csize_t;
                                     input_size: nnp_size;
                                     input_padding: nnp_padding;
                                     kernel_size: nnp_size; input: ptr cfloat;
                                     grad_output: ptr cfloat;
                                     grad_kernel: ptr cfloat;
                                     workspace_buffer: pointer = nil;
                                     workspace_size: ptr csize_t = nil;
    activation: nnp_activation = nnp_activation_identity;
                                     activation_parameters: pointer = nil;
                                     threadpool: pthreadpool_t = nil;
                                     profile: ptr nnp_profile = nil): nnp_status {.
    cdecl, importc: "nnp_convolution_kernel_gradient", ...raises: [], tags: [],
    forbids: [].}
proc nnp_convolution_output(algorithm: nnp_convolution_algorithm;
                            batch_size: csize_t; input_channels: csize_t;
                            output_channels: csize_t; input_size: nnp_size;
                            input_padding: nnp_padding; kernel_size: nnp_size;
                            input: ptr cfloat; kernel: ptr cfloat;
                            bias: ptr cfloat; output: ptr cfloat;
                            workspace_buffer: pointer = nil;
                            workspace_size: ptr csize_t = nil; activation: nnp_activation = nnp_activation_identity;
                            activation_parameters: pointer = nil;
                            threadpool: pthreadpool_t = nil;
                            profile: ptr nnp_profile = nil): nnp_status {.cdecl,
    importc: "nnp_convolution_output", ...raises: [], tags: [], forbids: [].}
proc nnp_deinitialize(): nnp_status {.cdecl, importc: "nnp_deinitialize",
                                      ...raises: [], tags: [], forbids: [].}
proc nnp_fully_connected_inference(input_channels: csize_t;
                                   output_channels: csize_t; input: ptr cfloat;
                                   kernel: ptr cfloat; output: ptr cfloat;
                                   threadpool: pthreadpool_t): nnp_status {.
    cdecl, importc: "nnp_fully_connected_inference", ...raises: [], tags: [],
    forbids: [].}
proc nnp_fully_connected_inference_f16f32(input_channels: csize_t;
    output_channels: csize_t; input: ptr cfloat; kernel: pointer;
    output: ptr cfloat; threadpool: pthreadpool_t): nnp_status {.cdecl,
    importc: "nnp_fully_connected_inference_f16f32", ...raises: [], tags: [],
    forbids: [].}
proc nnp_fully_connected_output(batch_size: csize_t; input_channels: csize_t;
                                output_channels: csize_t; input: ptr cfloat;
                                kernel: ptr cfloat; output: ptr cfloat;
                                threadpool: pthreadpool_t;
                                profile: ptr nnp_profile): nnp_status {.cdecl,
    importc: "nnp_fully_connected_output", ...raises: [], tags: [], forbids: [].}
proc nnp_initialize(): nnp_status {.cdecl, importc: "nnp_initialize",
                                    ...raises: [], tags: [], forbids: [].}
proc nnp_max_pooling_output(batch_size: csize_t; channels: csize_t;
                            input_size: nnp_size; input_padding: nnp_padding;
                            pooling_size: nnp_size; pooling_stride: nnp_size;
                            input: ptr cfloat; output: ptr cfloat;
                            threadpool: pthreadpool_t): nnp_status {.cdecl,
    importc: "nnp_max_pooling_output", ...raises: [], tags: [], forbids: [].}
proc nnp_relu_input_gradient(batch_size: csize_t; channels: csize_t;
                             grad_output: ptr cfloat; input: ptr cfloat;
                             grad_input: ptr cfloat; negative_slope: cfloat;
                             threadpool: pthreadpool_t): nnp_status {.cdecl,
    importc: "nnp_relu_input_gradient", ...raises: [], tags: [], forbids: [].}
proc nnp_relu_output(batch_size: csize_t; channels: csize_t; input: ptr cfloat;
                     output: ptr cfloat; negative_slope: cfloat;
                     threadpool: pthreadpool_t): nnp_status {.cdecl,
    importc: "nnp_relu_output", ...raises: [], tags: [], forbids: [].}
proc nnp_softmax_output(batch_size: csize_t; channels: csize_t;
                        input: ptr cfloat; output: ptr cfloat;
                        threadpool: pthreadpool_t): nnp_status {.cdecl,
    importc: "nnp_softmax_output", ...raises: [], tags: [], forbids: [].}