
WO2018140294A1 - Neural network based on fixed-point operations - Google Patents

Neural network based on fixed-point operations

Info

Publication number
WO2018140294A1
Authority
WO
WIPO (PCT)
Prior art keywords
fixed
parameters
layer
convolutional layer
point format
Prior art date
Application number
PCT/US2018/014303
Other languages
French (fr)
Inventor
Ningyi Xu
Hucheng Zhou
Wenqiang WANG
Xi Chen
Original Assignee
Microsoft Technology Licensing, Llc
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing, Llc filed Critical Microsoft Technology Licensing, Llc
Publication of WO2018140294A1 publication Critical patent/WO2018140294A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • Neural networks have been widely and deeply applied in computer vision, natural language processing, and speech recognition.
  • A convolutional neural network is a special type of neural network that includes a large number of learnable parameters.
  • Most current convolutional neural networks, even when deployed on one or more fast and power-hungry Graphics Processing Units (GPUs), take a great amount of time to train.
  • Various solutions have been proposed to improve the computing speed of neural networks. However, the current solutions still have a number of problems to be solved in memory consumption and/or computation complexity.
  • In a solution for training a neural network provided herein, a fixed-point format is used to store parameters of the neural network, such as weights and biases. The parameters are also known as primal parameters and are updated at each iteration. Parameters in the fixed-point format have a predefined bit-width and can be stored in a memory unit of a special-purpose processing device.
  • The special-purpose processing device, when executing the solution, receives an input to a layer of a neural network, reads parameters of the layer from the memory unit, and computes an output of the layer based on the input of the layer and the read parameters. In this way, the requirements for the memory and computing resources of the special-purpose processing device can be reduced.
  • FIG. 1 illustrates a block diagram of a computing environment in which implementations of the subject matter described herein can be implemented
  • FIG. 2 illustrates a block diagram of a neural network in accordance with an implementation of the subject matter described herein;
  • FIG. 3 illustrates an internal architecture for a forward pass of a convolutional layer of the neural network in accordance with an implementation of the subject matter described herein;
  • Fig. 4 illustrates an internal architecture for a backward pass of a layer of the neural network in accordance with an implementation of the subject matter described herein;
  • FIG. 5 illustrates a flowchart of a method for training a neural network in accordance with an implementation of the subject matter described herein;
  • FIG. 6 illustrates a block diagram of a device for training a neural network in accordance with an implementation of the subject matter described herein;
  • Fig. 7 illustrates a block diagram of a forward pass of the neural network in accordance with one implementation of the subject matter described herein;
  • Fig. 8 illustrates a block diagram of a backward pass of the neural network in accordance with one implementation of the subject matter described herein.
  • the term “includes” and its variants are to be read as open terms that mean “includes, but is not limited to.”
  • the term “based on” is to be read as “based at least in part on.”
  • the terms “one implementation” and “an implementation” are to be read as “at least one implementation.”
  • the term “another implementation” is to be read as “at least one other implementation.”
  • the terms “first,” “second,” and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below.
  • model quantization has been considered to be one of the most promising approaches, because it not only significantly accelerates computation and increases power-efficiency, but also achieves comparable accuracy.
  • Model quantization is intended to quantize the model parameters (as well as activations and gradients) to low bit-width values, while model binarization further pushes the limit of quantization by quantizing the parameters to binary values (a single bit, -1 or +1).
  • Fig. 1 illustrates a block diagram of a computing device 100 in which implementations of the subject matter described herein can be implemented. It would be appreciated that the computing device 100 shown in Fig. 1 is merely illustrative and does not limit the function and scope of the implementations of the subject matter described herein in any way. As illustrated in Fig. 1, the computing device 100 may include a memory 102, a controller 104, and a special-purpose processing device 106.
  • the computing device 100 can be implemented as various user terminals or service terminals with computing capability.
  • the service terminals may be servers, large-scale computer devices, and other devices provided by various service providers.
  • the user terminals for example, are any type of mobile terminals, fixed terminals, or portable terminals, including mobile phones, stations, units, devices, multimedia computers, multimedia tablets, Internet nodes, communicators, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, Personal Communication System (PCS) devices, personal navigation devices, Personal Digital Assistants (PDAs), audio/video players, digital camera/camcorders, positioning devices, television receivers, radio broadcast receivers, electronic book devices, game devices, or any combination thereof, including the accessories and peripherals of these devices or any combination thereof. It is also contemplated that the computing device 100 can support any type of interface to the user (such as "wearable" circuitry and the like).
  • the special-purpose processing device 106 may further include a memory unit 108 and a processing unit 110.
  • the special-purpose processing device 106 may be a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a processor or a Central Processing Unit (CPU) with a customized processing unit, or a Graphics Processing Unit (GPU). Therefore, the memory unit 108 may be referred to as a memory-on-chip and the memory 102 may be referred to as a memory-off-chip accordingly.
  • the processing unit 110 can control the overall operations of the special-purpose processing device 106 and perform various computations.
  • the memory 102 may be implemented by various storage media, including but not limited to volatile and non-volatile medium, and removable and non-removable medium.
  • the memory 102 can be a volatile memory (such as a register, cache, Random Access Memory (RAM)), a non-volatile memory (such as, a Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory)), or any combinations thereof.
  • the memory 102 may include removable and non-removable media, and may include a machine-readable medium, such as a memory, flash drive, magnetic disk, or any other medium that can be used to store information and/or data and can be accessed by the computing device 100.
  • the controller 104 can control the start and end of the computing process and may further provide inputs required for forward pass of the convolutional neural network. In addition, the controller 104 can also provide the weight data for the neural network.
  • the controller 104 communicates with the special-purpose processing device 106 via a standard interface such as a PCIe bus. The controller 104 assigns the computing tasks to the processing unit 110 on the special-purpose processing device 106.
  • the processing unit 110 begins the computing process after receiving the start signal from the controller 104.
  • the controller 104 provides the inputs and weights to the processing unit 110 for computation.
  • the memory unit 108 of the special-purpose processing device 106 may be used to store parameters, such as convolution kernel weights, while the memory 102 may store input and output feature maps and intermediate data generated during computation.
  • the special-purpose processing device 106 completes the computation of the forward pass of the neural network and then returns the output result obtained from the previous layer of the convolutional neural network to the controller 104.
  • the above control process is merely exemplary. Those skilled in the art may change the control process after understanding the implementations of the subject matter described herein.
  • the computing device 100 or the special-purpose processing device 106 can perform the training of the neural networks in the implementations of the subject matter described herein.
  • In conventional solutions, the model parameters (also known as primal parameters) are updated at each iteration and are stored in a high-resolution format. These parameters are quantized or binarized before every forward pass, while the associated gradient accumulation is still performed in the floating-point domain.
  • As a result, the special-purpose processing device, such as an FPGA or ASIC, still needs to implement expensive floating-point multiplication-accumulation to handle parameter updates, and an even more expensive nonlinear quantization method.
  • the limits of quantization are further pushed by representing the parameters in a fixed-point format, which can decrease the bit-width of the parameters, so as to dramatically reduce the total memory.
  • For example, an 8-bit fixed-point number reduces the total memory space to a quarter of that required by a 32-bit floating-point number.
  • This makes it possible to store the parameters on the memory-on-chip of the special-purpose processing device rather than memory-off-chip.
  • For a 45nm CMOS process node, for example, this can mean on the order of 100 times better energy efficiency.
  • low-resolution fixed-point arithmetic operations in the special-purpose processing device are much faster and more energy-efficient than floating-point operations.
  • fixed-point operations generally reduce logic usage and power consumption dramatically, and enable higher clock frequencies, shorter pipelines, and increased throughput.
  • A convolutional neural network is a particular type of neural network, which usually includes a plurality of layers, with each layer including one or more neurons. Each neuron obtains input data from the input of the neural network or from the previous layer, performs respective operations, and outputs the results to the next layer or to the output of the neural network model.
  • the input of the neural network may be, for example, images, e.g., RGB images of particular pixels.
  • the output of the neural network may be a score or a probability for each of a number of different classes.
  • the last layer (usually the fully-connected layer) of the neural network may be provided with a loss function, which can be a cross entropy loss function. During training of the neural network, it is generally required to minimize the loss function.
  • each layer is arranged in three dimensions: width, height, and depth.
  • Each layer of the convolutional neural network converts the three-dimensional input data into three-dimensional activation data and outputs the converted three-dimensional activation data.
  • the convolutional neural network includes various layers arranged in a sequence and each layer sends the activation data from one layer to another.
  • the convolutional neural network mainly includes three types of layers: a convolutional layer, a pooling layer, and a fully-connected layer. By stacking the layers over each other, a complete convolutional neural network may be constructed.
  • Fig. 2 illustrates example architecture of a convolutional neural network (CNN) 200 in accordance with some implementations of the subject matter described herein. It should be understood that the structure and functions of the convolutional neural network 200 are described for illustration, and do not limit the scope of the subject matter described herein. The subject matter described herein can be implemented by different structures and/or functions.
  • the CNN 200 includes an input layer 202, convolutional layers 204 and 208, pooling layers 206 and 210, and an output layer 212.
  • the convolutional layer and the pooling layer are arranged alternately.
  • the convolutional layer 204 is followed by the adjacent pooling layer 206 and the convolutional layer 208 is followed by the adjacent pooling layer 210.
  • the convolutional layer may not be followed by a pooling layer.
  • the CNN 200 only includes one of the pooling layers 206 and 210. In some implementations, there may even be no pooling layers.
  • each of the input layer 202, the convolutional layers 204 and 208, the pooling layers 206 and 210, and the output layer 212 includes one or more planes, also known as feature maps or channels.
  • the planes are arranged along the depth dimension and each plane may include two space dimensions, i.e., width and height, also known as space domain.
  • the input layer 202 may be represented by the input images, such as 32*32 RGB images.
  • the dimension for the input layer 202 is 32*32*3.
  • the width and height for the image is 32 and there are three color channels.
  • Feature map of each of the convolutional layers 204 and 208 may be obtained by applying convolutional operations on the feature map of the previous layer. By convolutional operations, each neuron in the feature map of the convolutional layers is only connected to a part of neurons of the previous layer. Therefore, applying convolutional operations on the convolutional layers indicates the presence of sparse connection between these two layers. After applying convolutional operations on the convolutional layers, the result may be applied with an activation function to determine the output of the convolutional layers.
  • each neuron is connected to a local area in the input layer 202 and computes the inner product of the local area and its weights.
  • the convolutional layer 204 may compute the output of all neurons and, if 12 filters (also known as convolution kernels) are used, the obtained output data will have a dimension of [32*32*12].
  • activation operations may be performed on each output data in the convolutional layer 204 and the common activation functions include Sigmoid, tanh, ReLU, and so on.
  • the pooling layers 206 and 210 down sample the output of the previous layer in space dimension (width and height), so as to reduce data size in space dimension.
  • the output layer 212 is usually a fully-connected layer, in which each neuron is connected to all neurons of the previous layer.
  • the output layer 212 computes classification scores and converts the data size to a one-dimensional vector, each element of the one-dimensional vector corresponding to a respective category. For instance, for a convolutional network classifying the images in CIFAR-10, the last output layer has a dimension of 1*1*10, because the convolutional neural network will finally compress each image into one vector of classification scores, where the vector is arranged along the direction of depth.
  • the convolutional neural network converts the images one by one from original pixel values to final classification score values.
  • During training, both the activation functions and the learnable parameters are used, and the parameters in the convolutional layers and the fully-connected layer can be optimized based on different optimization solutions.
  • the optimization solutions include, but are not limited to, the stochastic gradient descent algorithm, the adaptive momentum estimation (ADAM) method, and the like. Therefore, the errors between the classification scores obtained by the convolutional neural network and the labels of each image can be lowered as much as possible for the data in the training data set.
  • Training of the neural network can be further implemented by a backward pass.
  • the training set is input into the input layer of the neural network, e.g., in batches, and the parameters of the neural network are updated iteratively batch by batch. Samples of each batch can be regarded as a mini-batch. After multiple iterations, all samples in the training set have been trained once, which is called an epoch.
  • a plurality of inputs is grouped into a mini-batch, which is provided to the input layer.
  • the inputs are propagated layer by layer to the output layer of the neural network, so as to determine the output of the neural network, such as classification scores.
  • the classification scores are then compared with the labels in the training set to compute prediction errors by the loss function, for example.
  • the output layer discovers that the output is inconsistent with the right label.
  • the parameters of the last layer in the neural network may be adjusted first, and then the parameters of the second-to-last layer connected to the last layer may be adjusted. Accordingly, the layers are adjusted layer by layer in a backward direction.
  • after the adjustment of all parameters in the neural network is completed, the same process is performed on the next mini-batch. In this way, the process is performed iteratively until a predefined termination condition is satisfied, as sketched below.
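The mini-batch/epoch loop described above can be summarized by the following Python sketch. It only illustrates the control flow; the `forward`, `loss_fn`, and `backward_and_update` methods and the `batches` iterator are hypothetical placeholders, not interfaces defined in this disclosure.

```python
# Hypothetical sketch of the mini-batch training loop described above.
def train(network, training_set, batch_size, num_epochs):
    for epoch in range(num_epochs):                              # one epoch = one pass over all samples
        for inputs, labels in training_set.batches(batch_size):  # one mini-batch per iteration
            scores = network.forward(inputs)                     # propagate layer by layer to the output layer
            loss = network.loss_fn(scores, labels)               # compare outputs with the labels
            network.backward_and_update(loss)                    # adjust parameters layer by layer, last to first
        if network.termination_condition_met():
            break                                                # e.g. a predefined accuracy or epoch count
```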
  • In a binary neural network (BNN), weights and activations can be binarized to significantly speed up performance by using bit convolution kernels.
  • In one approach, the floating-point value is converted into a single bit by a stochastic method. Although stochastic binarization can achieve better performance, the computation complexity of that solution is higher, since it requires hardware resources to generate random bits when quantizing.
  • In some implementations, a deterministic method is adopted to convert the floating-point value into a single bit, and the deterministic solution has lower computation complexity. For example, a simple sign function sign(·) is used to binarize the floating-point value, as shown in equation (1):

    x^b = sign(x) = +1 if x >= 0, and -1 otherwise    (1)

  • During the backward pass, the gradient of the sign function is approximated by a straight-through estimator (STE): the incoming gradient is multiplied by the indicator function 1_{|x| <= 1}, whose value is 1 when the input x satisfies |x| <= 1 and 0 otherwise.
  • The STE can also be regarded as applying the hard-tanh activation function HT to the input x, where HT is defined as HT(x) = max(-1, min(1, x)).
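As an illustration, the deterministic binarization of equation (1) and the hard-tanh-based straight-through estimator can be sketched as follows. This is a minimal NumPy sketch of the standard formulation, not code taken from the patent.

```python
import numpy as np

def binarize(x):
    # Deterministic binarization of equation (1): sign(x) mapped to {-1, +1}.
    return np.where(x >= 0, 1.0, -1.0)

def hard_tanh(x):
    # HT(x) = max(-1, min(1, x)) = clip(x, -1, 1)
    return np.clip(x, -1.0, 1.0)

def ste_backward(grad_output, x):
    # Straight-through estimator: pass the gradient through only where |x| <= 1,
    # i.e. multiply by the indicator function 1_{|x| <= 1} (the derivative of HT).
    return grad_output * (np.abs(x) <= 1.0)
```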
  • weights and gradients can be stored in a fixed-point format, e.g., weights can be stored in the memory unit 108 of the special-purpose processing device 106 in a fixed-point format.
  • the fixed-point format includes an l-bit signed integer mantissa and a global scaling factor (e.g., 2^-n) shared by the fixed-point values, as shown in equation (5):

    v = [m_1, m_2, ..., m_K] · 2^-n    (5)

  • where the exponent n and the mantissas m_1, ..., m_K are integers.
  • the vector v includes K elements v_1, ..., v_K, which share one scaling factor 2^-n.
  • the integer n actually indicates the radix point position of the l-bit fixed-point number.
  • the scaling factor in fact refers to the position of the radix point. Because the scaling factor is usually fixed, i.e., the radix point is fixed, this data format is called a fixed-point number. Lowering the scaling factor reduces the range of the fixed-point format but increases its precision.
  • the scaling factor is usually set to be a power of 2, since the multiplication can be replaced by bit shift to reduce the complexity of computation.
  • the following equation (6) may be used to convert data x (such as a floating-point number) into an l-bit fixed-point number with the scaling factor 2^-n:

    FXP(x, l, n) = clip( floor(x · 2^n) · 2^-n, MIN, MAX )    (6)

  • equation (6) defines the rounding behavior, indicated by the rounding-down (floor) operation.
  • equation (6) also defines the saturating behavior: when x is greater than the maximum representable value MAX of the l-bit format, the value of the converted fixed-point number is MAX; when x is less than the minimum representable value MIN, the value of the converted fixed-point number is MIN.
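A minimal sketch of the conversion of equation (6), assuming the usual two's-complement range for an l-bit signed mantissa with scaling factor 2^-n:

```python
import numpy as np

def to_fixed_point(x, bits, n):
    # Convert x into an l-bit fixed-point value with scaling factor 2**-n (equation (6)).
    scale = 2.0 ** (-n)
    max_val = (2 ** (bits - 1) - 1) * scale    # largest representable value (MAX)
    min_val = -(2 ** (bits - 1)) * scale       # smallest representable value (MIN)
    mantissa = np.floor(x / scale)             # rounding-down behavior
    return np.clip(mantissa * scale, min_val, max_val)   # saturating behavior
```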
  • the scaling factor may be updated based on the data range. Specifically, it may be determined, based on overflow of the data (e.g., overflow rate and/or overflow amount), whether to update the scaling factor and how to update the scaling factor.
  • the method for updating the scaling factor will now be explained with reference to weights. However, it would be appreciated that the method can also be applied for other parameters.
  • If the overflow rate exceeds the predefined threshold, the range of the fixed-point format is too small; the scaling factor may then be multiplied by the cardinal number (e.g., 2), for example by shifting the radix point right by one bit. If the overflow rate does not exceed the predefined threshold and, after the weights are multiplied by 2, the overflow rate is still below the predefined threshold, the range of the fixed-point format is too large; the scaling factor may therefore be reduced, for example by dividing the scaling factor by the cardinal number (e.g., 2), i.e., shifting the radix point left by one bit. A sketch of this update rule follows.
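The overflow-driven update of the scaling factor described above could be written as follows; the overflow threshold and the way the halved range is probed are illustrative assumptions, not values stated in the disclosure.

```python
import numpy as np

def update_scaling_exponent(values, bits, n, overflow_threshold=0.01):
    # Returns an updated exponent n for the scaling factor 2**-n of an l-bit format.
    scale = 2.0 ** (-n)
    max_val = (2 ** (bits - 1) - 1) * scale
    min_val = -(2 ** (bits - 1)) * scale
    overflow_rate = np.mean((values > max_val) | (values < min_val))

    if overflow_rate > overflow_threshold:
        return n - 1        # range too small: multiply the scaling factor by 2
    doubled = values * 2.0  # probe whether the values would still fit in half the range
    if np.mean((doubled > max_val) | (doubled < min_val)) <= overflow_threshold:
        return n + 1        # range too large: divide the scaling factor by 2
    return n                # keep the current scaling factor
```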
  • Fig. 3 illustrates an internal architecture for a forward pass of a convolutional layer 300 of the convolutional neural network in accordance with an implementation of the subject matter described herein.
  • the convolutional layer 300 may be a k-th layer of the neural network.
  • the convolutional layer 300 may be a convolutional layer 204 or 208 in the convolutional neural network as shown in Fig. 2.
  • legend 10 represents binary numbers and legend 20 represents fixed-point numbers. It would be appreciated that, although Fig. 3 illustrates a plurality of modules or sub-layers, in specific implementations one or more sub-layers may be omitted or modified for different purposes.
  • parameters of the convolutional layer 300 include weights 302 and biases 304, respectively denoted as w_k and b_k, i.e., the weights and biases of the k-th layer.
  • parameters of the convolutional layer 300 may be represented and stored in fixed-point format instead of floating-point format.
  • the parameters in fixed-point format may be stored in the memory unit 108 of the special-purpose processing device 106 and may be read from the memory unit 108 during operation.
  • the weights 302 in fixed-point format are converted by a binary sub-layer 308 to binary weights 310, which may be represented by w_k^b.
  • the binary sub-layer 308 may convert the fixed-point weights 302 into binary weights 310 by a sign function, as shown in equation (1).
  • the convolutional layer 300 further receives an input 306, which may be represented by x_k.
  • the input 306 can be the input images of the neural network. In this case, the input 306 can be regarded as an 8-bit integer vector (0-255).
  • When the convolutional layer 300 is a hidden layer or an output layer of the neural network, for example, the input 306 may be an output of the previous layer, which may be a binary vector (+1 or -1). In both cases, the convolutional operation only includes integer multiplication and accumulation and may be computed by bit convolution kernels. In some implementations, if the convolutional layer 300 is the first layer, it may be processed according to equation (8):

    s = x · w^b = Σ_{n=1..8} 2^(n-1) (x^n · w^b)    (8)

  • where x represents the input 306 in an 8-bit fixed-point format,
  • w^b represents the binary weights, and
  • x^n represents the vector formed by the n-th bit of the mantissa of each element of vector x.
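Assuming equation (8) takes the bit-decomposition form commonly used for binary networks (a sum over the 8 bit planes of the input, weighted by powers of two), a sketch of the first-layer dot product is given below; only integer shifts, bit masks, and multiply-accumulates are involved, which is why bit convolution kernels suffice.

```python
import numpy as np

def first_layer_dot(x_uint8, w_binary):
    # x_uint8: 8-bit unsigned integer input vector (e.g. image pixels, 0-255).
    # w_binary: binary weight vector with elements in {-1, +1}.
    # Assumed form of equation (8): s = sum_n 2**n * (x_n . w_b), where x_n is
    # the vector of the n-th bits of the input elements.
    s = 0.0
    for n in range(8):
        bit_plane = (x_uint8 >> n) & 1               # n-th bit of every input element
        s += (2 ** n) * np.dot(bit_plane, w_binary)  # integer multiply-accumulate only
    return s
```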
  • a normalization sub-layer 316 represents integer batch normalization (IBN) sublayer, which normalizes input tensor within a mini-batch with mean and variance. Different from conventional batch normalization performed in floating-point domain, all intermediate results involved in the sub-layer 316 are either 32-bit integers or low resolution fixed-point values. Since integer is a special fixed-point number, the IBN sub-layer 316 only includes corresponding fixed-point operations. Subsequently, the quantization sublayer 318 converts the output of the IBN sub-layer 316 to a predefined fixed-point format.
  • Given the inputs x_1, ..., x_m of a mini-batch, the sum sum1 = Σ_i x_i and the sum of squares sum2 = Σ_i x_i^2 of all inputs can be determined.
  • The mean and the variance var_input are then computed based on sum1 and sum2, where Round(·) means rounding to the nearest 32-bit integer.
  • The normalized output can then be converted to a predefined fixed-point format via the sub-layer 318.
  • the method for updating scaling factors described in the Quantization section above can be used to update the scaling factors. For example, it may be first determined whether the overflow rate of the IBN output exceeds the predefined threshold. If the overflow rate is greater than the predefined threshold, the range of the IBN output is extended, that is, increasing the scaling factor or right shifting the radix point in fixed-point format when the cardinal number is 2. This will not be repeated because it is substantially the same as the method for updating scaling factors described with reference to quantization.
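A rough sketch of the integer batch normalization (IBN) described above is given below. Keeping every intermediate result in fixed point (including the square root) and the exact guard against a zero variance are not spelled out in the text, so they appear here in simplified, assumed form.

```python
import numpy as np

def integer_batch_norm(x_int):
    # x_int: integer-valued mini-batch inputs (e.g. outputs of the convolutional sub-layer).
    m = x_int.size
    sum1 = int(np.sum(x_int, dtype=np.int64))        # sum of all inputs
    sum2 = int(np.sum(x_int.astype(np.int64) ** 2))  # sum of squares of all inputs
    mean = int(round(sum1 / m))                      # Round(.) to the nearest integer
    var = int(round(sum2 / m)) - mean * mean         # variance derived from sum1 and sum2
    # Normalization; the square root is shown in floating point for brevity, whereas the
    # patent keeps intermediate results as 32-bit integers or low-resolution fixed-point values.
    y = (x_int - mean) / np.sqrt(max(var, 0) + 1)    # +1 is an assumed guard against division by zero
    return y   # the quantization sub-layer 318 then converts y to the predefined fixed-point format
```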
  • a summing sub-layer 320 adds the output of the IBN sub-layer 316 with the bias 304 to provide an output s_k.
  • the bias 304 may be read from the memory unit 108 of the special-purpose processing device 106.
  • the activation sub-layer 322 represents an activation function, which is usually implemented by a non-linear activation function, e.g., the hard-tanh function HT.
  • the output of the activation sub-layer 322 is converted via the quantization sub-layer 324 to an output 326 in a fixed-point format, denoted by x_{k+1}, to be provided to the next ((k+1)-th) layer of the neural network.
  • the last layer of the neural network may not include the activation sub-layer 322 and the quantization sub-layer 324, i.e., the loss function layer is computed in the floating-point domain.
  • a pooling layer is located after the convolutional layer 300.
  • the convolutional layers 204 and 208 are respectively followed by the pooling layers 206 and 210 in the convolutional neural network 200.
  • the pooling layer may be incorporated into the convolutional layer 300 to further reduce computation complexity.
  • the pooling layer 206 is incorporated into the convolutional layer 204 in the convolutional neural network 200.
  • the pooling sub-layer 314 indicated by the dotted line may be incorporated into the convolutional layer 300 and arranged between the convolutional sub-layer 321 and the IBN sub-layer 316.
  • the forward pass of the entire neural network may be stacked by a plurality of similar processes.
  • the output of the k-th layer is provided to the k+1 layer to serve as an input of the k+1 layer; and the process continues layer by layer.
  • the output of the convolutional layer 204 is determined from the architecture of the convolutional layer 300 (without the sub-layer 314). If the pooling layer 206 is incorporated into the convolutional layer 204, the output of the pooling layer 206 may be determined by the architecture of the convolutional layer 300 (including the sub-layer 314). Then, the output is provided to the convolutional layer 208 and the classification category is provided at the output layer 212.
  • Fig. 4 illustrates an internal architecture for a backward pass of a convolutional layer 400 of the convolutional neural network in accordance with an implementation of the subject matter described herein.
  • the backward pass process is shown in Fig. 4 from right to left.
  • In Fig. 4, legend 30 represents floating-point numbers and legend 20 represents fixed-point numbers.
  • the forward pass and the backward pass of the convolutional layer are respectively indicated by the reference signs 300 and 400
  • the convolutional layers 300 and 400 may refer to the same layer in the neural network.
  • the convolutional layers 300 and 400 may be the architecture for implementing the forward pass and backward pass of the convolutional layer 204 or 208 in the convolutional neural network 200.
  • Fig. 4 illustrates a plurality of modules or sub-layers, each sub-layer can be omitted or modified in specific implementations for different purposes in view of different situations.
  • the convolutional layer 400 receives a backward input 426 from a next layer of the neural network, e.g., if the convolutional layer 400 is a k-th layer, the convolutional layer 400 receives a backward input 426 from the (k+1)-th layer.
  • the backward input 426 may be a gradient of the loss function with respect to the forward output 326 of the convolutional layer 300.
  • the gradient may be in floating-point format and may be denoted by g.
  • the backward input 426 is converted to a fixed-point value 430 (denoted by g^{fxp}).
  • the activation sub-layer 422 computes its output based on the fixed-point value 430, i.e., the gradient of the loss function with respect to the input s_k of the activation sub-layer 322.
  • the activation sub-layer 322 in Fig. 3 corresponds to the activation sub-layer 422 in Fig. 4, which serves as a backward gradient operation for the activation sub-layer 322.
  • the input of the activation sub-layer 322 is x
  • the output thereof is y
  • the backward input of the corresponding activation sub-layer 422 is a gradient of the loss function with respect to the output y
  • the backward output is a gradient of the loss function with respect to the input x.
  • the backward output of the activation sub-layer 422 is provided to the summing sub-layer 420, which corresponds to the summing sub-layer 320, and the gradients of the loss function with respect to the two inputs of the summing sub-layer 320 may be determined. Because an input of the sub-layer 320 is the bias, the gradient of the loss function with respect to the bias may be determined and this gradient is provided to the quantization sub-layer 428. Subsequently, the gradient is converted to a fixed-point gradient by the quantization sub-layer 428 for updating the bias 404 (represented by b_k^{fxp}).
  • the fixed- point format has a specific scaling factor, which may be updated in accordance with the method for updating scaling factors as described in the Quantization section above.
  • Another backward output of the summing sub-layer 420 is propagated to the IBN sub-layer 418.
  • Different from the forward pass, in which a fixed-point format is used to compute the IBN sub-layer 316, the IBN sub-layer 418 is returned to the floating-point domain for its operations, so as to provide an intermediate gradient output.
  • the intermediate gradient output is a gradient of the loss function with respect to the convolution of the input and parameters.
  • an additional quantization sub-layer 416 is utilized after the IBN sub-layer 418 for converting the floating-point format into a fixed-point format.
  • the quantization sub-layer 416 converts the intermediate gradient output to a fixed-point format having a specific scaling factor, which may be updated according to the method for updating scaling factors as described in the Quantization section above.
  • the convolutional sub-layer 412 further propagates a gradient g_{w^b} of the loss function with respect to the binary weight w^b and a gradient of the loss function with respect to the input of the convolutional sub-layer.
  • the convolutional sub-layer 412 only contains fixed- point multiplication and accumulation, thereby resulting in a very low computation complexity.
  • the backward output of the convolutional sub-layer 412 with respect to its input provides a backward output of the layer, which is propagated to the previous layer of the neural network.
  • the backward output g_{w^b} of the convolutional sub-layer 412 is converted to a fixed-point format via the quantization layer 408 to update the weight 402 (represented by w_k^{fxp}).
  • the fixed-point format has a specific scaling factor, which may be updated according to the method for updating scaling factors as described in the Quantization section above.
  • the parameters may be updated.
  • the parameter may be updated by various updating rules, e.g., stochastic gradient descent, Adaptive Momentum Estimation (ADAM), or the like.
  • the updating rules are performed in the fixed-point domain to further reduce floating-point computation. It would be appreciated that, although reference is made to the ADAM optimization method, any other suitable methods currently known or to be developed in the future may also be implemented.
  • ADAM method dynamically adjusts the learning rate for each parameter based on a first moment estimate and a second moment estimate of the gradient of the loss function with respect to each parameter.
  • Fixed-point ADAM optimization method differs from the standard ADAM optimization method in that the fixed-point ADAM method operates entirely within the fixed-point domain. In other words, its intermediate variables (e.g., first moment estimate and second moment estimate) are represented by fixed-point numbers.
  • In some implementations, one fixed-point ADAM learning rule is represented by the following equations:

    m_t = FXP( β1 · m_{t-1} + (1 - β1) · g_t )
    v_t = FXP( β2 · v_{t-1} + (1 - β2) · g_t^2 )
    u_t = FXP( α · m_t / (sqrt(v_t) + ε) )
    θ_t = FXP( θ_{t-1} - u_t )

  • g_t^2 denotes the element-by-element square g_t ⊙ g_t.
  • β1 and β2 are fixed constants, and FXP(·) represents the conversion function of equation (6).
  • θ_{t-1} represents the current fixed-point parameter value in a fixed-point format (l1, n1), and θ_t represents the updated fixed-point parameter value.
  • the fixed-point format for the gradient g_t is (l2, n2), and α is the learning rate. It can be seen that the fixed-point ADAM method computes the updated parameters by calculating the intermediate variables m_t, v_t, and u_t, and only includes respective fixed-point operations.
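A sketch of one possible fixed-point ADAM step is given below, wrapping each intermediate of the standard ADAM rule in the FXP(·) conversion of equation (6); the β1, β2, and ε values are illustrative defaults, not values taken from this disclosure.

```python
import numpy as np

def fxp(x, bits, n):
    # FXP(.) of equation (6): l-bit mantissa, scaling factor 2**-n, rounding down, saturation.
    scale = 2.0 ** (-n)
    lo, hi = -(2 ** (bits - 1)) * scale, (2 ** (bits - 1) - 1) * scale
    return np.clip(np.floor(x / scale) * scale, lo, hi)

def fxp_adam_step(theta, grad, m, v, lr, bits, n, beta1=0.9, beta2=0.999, eps=2.0 ** -16):
    # Standard ADAM moment updates, with every intermediate result wrapped by FXP(.)
    # so that the learning rule only involves fixed-point values.
    m = fxp(beta1 * m + (1 - beta1) * grad, bits, n)         # first moment estimate m_t
    v = fxp(beta2 * v + (1 - beta2) * grad * grad, bits, n)  # second moment estimate v_t
    u = fxp(lr * m / (np.sqrt(v) + eps), bits, n)            # update step u_t
    theta = fxp(theta - u, bits, n)                          # updated parameter theta_t
    return theta, m, v
```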
  • the updated weight w_k^{fxp} and bias b_k^{fxp} can be computed.
  • these parameters can be stored in a memory unit 108 of the special-purpose processing device 106 in a fixed-point format.
  • the scaling factors for the fixed-point format of the parameters may also be updated as described above.
  • the scaling factors may be updated according to the method for updating scaling factors as described in the Quantization section above.
  • If a pooling layer is incorporated into the convolutional layer 300 to serve as its pooling sub-layer 314 in the forward pass, a corresponding pooling layer should be incorporated into the convolutional layer 400 to serve as its pooling sub-layer 414 in the backward pass.
  • the quantization sub-layer may be implemented by a linear quantization method, and an adaptive updating method for scaling factors of the fixed-point parameters corresponding to the quantization sub-layer may be used to ensure that no significant drop will occur in accuracy.
  • the linear quantization method can greatly lower computation complexity, which can further facilitate the deployment of the convolutional neural network on the special-purpose processing device.
  • the backward pass process has been introduced above with reference to a convolutional layer 400. It would be appreciated that the backward pass of the entire neural network can be stacked by a plurality of similar processes. For example, the backward output of the k+l-th layer is provided to the k-th layer to serve as a backward input of the k-th layer; and the parameter of each layer is updated layer by layer.
  • the backward output of the convolutional layer 204 can be determined by the architecture of the convolutional layer 400 (including the sub-layer 414).
  • the backward output is provided to the input layer 202, to finally finish updating all parameters of the neural network 200, thereby completing an iteration of a mini-batch. Completing the iterations of all mini-batches in the training set may be referred to as finishing a full pass over the data set, which is also known as an epoch. After a plurality of epochs, if the training result satisfies the predefined threshold condition, the training is complete.
  • the threshold condition can be a predefined number of epochs or a predefined accuracy.
  • the adaptive updating method may be performed once after a plurality of iterations.
  • the frequency for applying the adaptive updating method may vary for different quantities.
  • the adaptive updating method may be applied more frequently for the gradients, because the gradients tend to fluctuate more extensively.
  • Fig. 5 illustrates a flowchart of a method 500 for training a neural network in accordance with implementations of the subject matter described herein.
  • the method 500 may be performed on the special-purpose processing device 106 as shown in Fig. 1.
  • the special-purpose processing device 106 may be an FPGA or ASIC, for example.
  • an input to a convolutional layer of the neural network is received.
  • the input may be received from the previous layer, or may be an input image for the neural network.
  • the input may correspond to samples of a mini-batch in the training set.
  • parameters of the convolutional layer are read from a memory unit 108 of the special-purpose processing device 106, where the parameters are stored in the memory unit 108 of the special-purpose processing device 106 in a first fixed-point format and have a predefined bit-width.
  • the parameters may represent either weight parameters or bias parameters of the convolutional layer, or may represent both the weight parameters and the bias parameters.
  • the bit-width of the first fixed-point format is smaller than that of a floating-point number, so as to reduce the memory space required of the memory unit 108.
  • the output of the convolutional layer is computed by fixed-point operations based on the input of the convolutional layer and the read parameters.
  • the convolutional operations may be performed on the input and the parameters of the convolutional layer to obtain an intermediate output, which is normalized to obtain a normalized output.
  • the normalization only includes respective fixed-point operations.
  • the normalization may be implemented by the IBN layer 316 as shown in Fig. 3.
  • the scaling factors of the parameters above are adaptively updated. For example, a backward input to the convolutional layer is received at the output of the convolutional layer, where the backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer. Based on the backward input, the gradient of the loss function of the neural network with respect to parameters of the convolutional layer is computed.
  • the parameters in the first fixed-point format may be updated based on the gradient of the loss function of the neural network with respect to parameters.
  • the scaling factors of the first fixed-point format may be updated based on the updated parameter range. For example, the fixed-point format of the parameters may be updated by the method described above with reference to quantization.
  • the updated parameters may be stored in the memory unit 108 of the special- purpose processing device 106 to be read at the next iteration.
  • the fixed-point format of the parameters may be updated at a certain frequency.
  • updating parameters only includes respective fixed-point operations, which may be implemented by a fixed-point ADAM optimization method, for example.
  • the gradient of the loss function with respect to the parameters may be first converted to a second fixed-point format for updating the parameters in the first fixed-point format.
  • the first fixed-point format may be identical to or different from the second fixed-point format.
  • the conversion method can be carried out by a linear quantization method.
  • the gradient of the loss function of the neural network with respect to parameters may be converted to the second fixed-point format by a linear quantization method.
  • the parameters in the first fixed-point format may be updated based on the gradient in the second fixed-point format.
  • the scaling factors of the second fixed-point format may be updated based on the range of the gradient of the loss function with respect to the parameters.
  • the linear quantization method has a lower computation complexity and the performance will not be substantially degraded, because the scaling factor updating method is employed in the implementations of the subject matter described herein.
  • computing the output of the convolutional layer further comprises: converting the normalized output to a normalized output in a third fixed-point format, where the scaling factors of the third fixed-point format may be updated based on the range of the normalized output in the third fixed-point format.
  • the output of the IBN sub-layer 316 may be provided to the quantization layer 318, which can convert the normalized output of the IBN sub-layer 316 to a normalized output in a second fixed-point format.
  • the scaling factors of the second fixed-point format can be updated depending on various factors.
  • the updating method may be configured to be carried out after a given number of iterations, which updating method may be the method described in the Quantization section above.
  • the method further comprises: receiving a backward input to the convolutional layer at the output of the convolutional layer, which backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer. Then, the intermediate backward output is obtained based on the normalized backward gradient operations.
  • the gradient of the loss function with respect to the convolution above is computed based on the backward input.
  • the backward gradient operations of the IBN sub-layer 418 shown in Fig. 4 correspond to the normalization of the IBN sub-layer 316 shown in Fig. 3.
  • the backward gradient operations can be performed on the IBN sub-layer 418 to obtain an intermediate backward output.
  • the intermediate backward output is converted to a fourth fixed-point format and the scaling factors of the fourth fixed-point format can be updated based on the range of the intermediate backward output.
  • the scaling factors of the fourth fixed-point format may be updated according to the updating method described above with reference to quantization.
  • the training process of the entire neural network may be stacked by the process of method 500 as described above with reference to Figs. 3 and 4.
  • Fig. 1 illustrates an example implementation of the special-purpose processing device 106.
  • the special-purpose processing device 106 includes a memory unit 108 for storing parameters of the neural network, and a processing unit 110 for reading the stored parameters from the memory unit 108 and using the parameters to process the input.
  • Fig. 6 illustrates a block diagram of a further example implementation of the special-purpose processing device 106.
  • the special-purpose processing device 106 may be an FPGA or ASIC, for example.
  • the special-purpose processing device 106 includes a memory module 602 configured to store parameters of the convolutional layer of the neural network in a first fixed-point format, where the parameters in the first fixed-point format have a predefined bit-width.
  • the memory module 602 is similar to the memory unit 108 of Fig. 1 in terms of functionality and both of them may be implemented using the same or different techniques or processes.
  • the bit-width of the first fixed-point format is smaller than the floating-point numbers to reduce memory space of the memory module 602.
  • the special-purpose processing device 106 further includes an interface module 604 configured to receive an input to the convolutional layer. In some implementations, the interface module 604 may be used for processing various inputs and outputs between various layers of the neural network.
  • the special-purpose processing device 106 further includes a data access module 606 configured to read parameters of the convolutional layer from the memory module 602. In some implementations, the data access module 606 may interact with the memory module 602 to process the access to the parameters of the neural network.
  • the special-purpose processing device 106 may further include a computing module 608 configured to compute, based on the input of the convolutional layer and the read parameters, the output of the convolutional layer by a fixed-point operation.
  • the interface module 604 is further configured to receive a backward input to the convolutional layer at the output of the convolutional layer, where the backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer.
  • the computing module 608 is further configured to compute a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer based on the backward input; and update parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters, where the scaling factors of the first fixed- point format can be updated based on the range of the updated parameters.
  • updating parameters only includes respective fixed-point operations.
  • the computing module 608 is further configured to convert the gradient of the loss function of the neural network with respect to the parameters to a second fixed-point format by a linear quantization method, where the scaling factors of the second fixed-point format can be updated based on the gradient of the loss function with respect to the parameters; and update the parameters based on the gradient in the second fixed-point format.
  • the computing module 608 is further configured to normalize a convolution of the input of the convolutional layer and the parameters to obtain a normalized output, where the normalization only includes respective fixed-point operations.
  • the computing module 608 is further configured to convert the normalized output to a normalized output in a third fixed-point format, where the scaling factors of the third fixed-point format can be updated based on the range of the normalized output in the third fixed-point format.
  • the interface module 604 is further configured to obtain a backward input to the convolutional layer at the output of the convolutional layer, where the backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer.
  • the computing module 608 may be configured to compute the gradient of the loss function with respect to the convolution based on the backward input; and convert the gradient of the loss function with respect to the convolution to a fourth fixed-point format, where the scaling factor of the fourth fixed-point format can be updated based on the range of the gradient of the loss function with respect to the convolution.
  • the following section introduces the important factors that affect the final prediction accuracy of the trained model of the neural network in accordance with implementations of the subject matter described herein.
  • the factors comprise the batch normalization (BN) scheme, bit-width of the primal parameters, and bit-width of gradients.
  • In the experiments, the CIFAR-10 data set is used, which is an image classification benchmark with 60K 32x32 RGB tiny images. It consists of 10 object classes, including airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each class has 5K training images and 1K test images.
  • three networks with different sizes are designed by stacking basic structural modules of the neural network shown in Figs. 3 and 4, including Model-S (Small), Model-M (Medium), and Model-L (Large). The overall network structure is illustrated in Figs. 7 and 8.
  • Fig. 7 illustrates a block diagram of a forward pass of the convolutional neural network 700 in accordance with an implementation of the subject matter described herein and Fig. 8 illustrates a block diagram of a backward pass of the convolutional neural network 800 in accordance with an implementation of the subject matter described herein.
  • Since each epoch means one pass of training that uses all samples in the training set and each iteration uses the samples of one batch for training, each epoch accordingly has 250 iterations. Furthermore, in the experiments, the fixed-point ADAM optimization method or the standard ADAM optimization method is used with an initial learning rate of 2^-6, and the learning rate is decreased by a factor of 2^-4 every 50 epochs.
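One reading of the stated schedule, for illustration only, is:

```python
def learning_rate(epoch):
    # Initial learning rate 2**-6, multiplied by 2**-4 after every 50 epochs.
    return 2.0 ** -6 * (2.0 ** -4) ** (epoch // 50)
```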
  • the accuracy loss of the neural network is quite stable with respect to the bit-width of the IBN output, down to as low as 6 bits. If the bit-width of the IBN output decreases further, the accuracy suffers a sharp drop.
  • the effect of the gradient bit-width is also evaluated.
  • the gradients are more unstable than the parameters, which shows that the scaling factors of the gradients should be updated more frequently.
  • the update occurs every 375 iterations (1% of total iterations) and the fixed-point ADAM method is employed.
  • the primal parameters are set with floating-point values. It can be seen from the testing that the prediction accuracy decreases very slowly when the bit-width of the gradient is reduced. The prediction accuracy also suffers a sharp drop when the bit-width of the gradient is lower than 12 bits, which is similar to the effect of the IBN output and the parameter bit-width.
  • the test is performed by combining the three effects, i.e., the neural network is implemented to substantially involve fixed-point computations only. In this way, the result in Table 2 can be obtained.
  • the relative storage is characterized by a product of the parameter number and the bits of the primal weight. It can be seen from Table 2 that a comparable accuracy with a larger bit-width can be obtained when the bit-width of the primal weight is 12 and the bit-width of the gradient is also 12. As the weight bit-width decreases, the storage will be substantially decreased. Therefore, the training solution for the neural network according to implementations of the subject matter described herein can lower the storage while maintaining computation accuracy.
  • the method can achieve comparable results with the state-of-the-art works (not shown) when the bit-width of each of the primal weight and the gradient is 12.
  • the method dramatically reduces the storage and significantly improves system performance.
  • a special-purpose processing device comprises a memory unit configured to store parameters of a layer of a neural network in a first fixed-point format, the parameters in the first fixed-point format having a predefined bit-width; a processing unit coupled to the memory unit and configured to perform acts including: receiving an input to the layer; reading the parameters of the layer from the memory unit; and computing, based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
  • the layer of the neural network includes a convolutional layer.
  • the acts further include: receiving a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
  • updating the parameters only includes a respective fixed-point operation.
  • updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer comprises: converting the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
  • computing the output of the layer comprises: normalizing a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
  • computing the output of the convolutional layer further comprises: converting the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
  • the acts further include: obtaining a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function with respect to the convolution; and converting the gradient of the loss function with respect to a convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
  • the special-purpose processing device is a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a processor having a customized processing unit, or a graphics processing unit (GPU).
  • a method executed by a special-purpose processing device including a memory unit and a processing unit.
  • the method comprises receiving an input to a layer of a neural network; reading parameters of the layer from the memory unit of the special-purpose processing device, the parameters being stored in the memory unit in a first fixed-point format and having a predefined bit-width; and computing, by the processing unit and based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
  • the layer of the neural network includes a convolutional layer.
  • the method further comprises: receiving a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
  • updating the parameters only includes a respective fixed-point operation.
  • updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer comprises: converting the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
  • computing the output of the layer comprises: normalizing a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
  • computing the output of the convolutional layer further comprises: converting the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
  • the method further comprises: obtaining a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function with respect to the convolution; and converting the gradient of the loss function with respect to the convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
  • the special-purpose processing device is a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a processor having customized processing units or a graphics processing unit (GPU).
  • a special-purpose processing device comprises: a memory module configured to store parameters of a layer of a neural network in a first fixed-point format, the parameters in the first fixed-point format having a predefined bit-width; an interface module configured to receive an input to the layer; a data access module configured to read the parameters of the layer from the memory module; and a computing module configured to compute, based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
  • the layer of the neural network includes a convolutional layer.
  • the interface module is further configured to receive a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer;
  • the computing module is further configured to: compute, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, and update the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
  • updating the parameters only includes a respective fixed-point operation.
  • the computing module is further configured to: convert the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and update the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
  • the computing module is further configured to: normalize a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
  • the computing module is further configured to: convert the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
  • the interface module is further configured to: obtain a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer; compute, based on the backward input, a gradient of the loss function with respect to the convolution; and convert the gradient of the loss function with respect to the convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
  • the special-purpose processing device is a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a processor having customized processing units or a graphics processing unit (GPU).


Abstract

Implementations of the subject matter described herein propose a solution for training a convolutional neural network. In this solution, parameters of the neural network, such as weights and biases, are stored in a fixed-point format. The parameters in the fixed-point format have a predefined bit-width and may be stored in a memory unit of a special-purpose processing device. The special-purpose processing device, when executing the solution, receives an input to a convolutional layer, reads the parameters of the convolutional layer from the memory unit, and computes an output of the convolutional layer based on the input of the convolutional layer and the read parameters. In this way, the demands on the storage space and computing resources of the special-purpose processing device are lowered.

Description

NEURAL NETWORK BASED ON FIXED-POINT OPERATIONS
BACKGROUND
[0001] Neural networks have been widely and deeply applied in computer vision, natural language processing, and speech recognition. Convolutional neural network is a special neural network and includes a large amount of learnable parameters. Most of the current convolutional neural networks, even if deployed on one or more fast and power-hungry Graphics Processing Units (GPUs), take a great amount of time to train. Various solutions have been proposed to improve the computing speed of neural networks. However, the current solutions still have a number of problems to be solved in memory consumption and/or computation complexity.
SUMMARY
[0002] In accordance with implementations of the subject matter described herein, there is provided a solution for training a neural network. In the solution, a fixed-point format is used to store parameters of the neural networks, such as weights and biases. The parameters are also known as primal parameters to be updated for each iteration. Parameters in the fixed-point format have a predefined bit-width and can be stored in a memory unit of a special-purpose processing device. The special-purpose processing device, when executing the solution, receives an input to a layer of a neural network, reads parameters of the layer from the memory unit, and computes an output of the layer based on the input of the layer and the read parameters. In this way, the requirements for the memory and computing resources of the special-purpose processing device can be reduced.
[0003] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] Fig. 1 illustrates a block diagram of a computing environment in which implementations of the subject matter described herein can be implemented;
[0005] Fig. 2 illustrates a block diagram of a neural network in accordance with an implementation of the subject matter described herein;
[0006] Fig. 3 illustrates an internal architecture for a forward pass of a convolutional layer of the neural network in accordance with an implementation of the subject matter described herein;
[0007] Fig. 4 illustrates an internal architecture for a backward pass of a layer of the neural network in accordance with an implementation of the subject matter described herein;
[0008] Fig. 5 illustrates a flowchart of a method for training a neural network in accordance with an implementation of the subject matter described herein;
[0009] Fig. 6 illustrates a block diagram of a device for training a neural network in accordance with an implementation of the subject matter described herein;
[0010] Fig. 7 illustrates a block diagram of a forward pass of the neural network in accordance with one implementation of the subject matter described herein; and
[0011] Fig. 8 illustrates a block diagram of a backward pass of the neural network in accordance with one implementation of the subject matter described herein.
[0012] Throughout the drawings, the same or similar reference symbols refer to the same or similar elements.
DETAILED DESCRIPTION OF EMBODIMENTS
[0013] The subject matter described herein will now be discussed with reference to several example implementations. It would be appreciated that these implementations are discussed only for the purpose of enabling those skilled in the art to better understand and thus implement the subject matter described herein, rather than suggesting any limitations on the scope of the subject matter.
[0014] As used herein, the term "includes" and its variants are to be read as open terms that mean "includes, but is not limited to." The term "based on" is to be read as "based at least in part on." The term "one implementation" and "an implementation" are to be read as "at least one implementation." The term "another implementation" is to be read as "at least one other implementation." The terms "first," "second," and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below.
[0015] Recently, extensive studies have been focused on speeding up model training and inference using special-purpose processing hardware, such as Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). Among these methods, model quantization has been considered to be one of the most promising approaches, because it not only significantly accelerates the speed and increases the power-efficiency, but also achieves comparable accuracy. Model quantization is intended to quantize the model parameters (as well as activations and gradients) to low bit-width values, while model binarization further pushes the limit of the quantization by extremely quantizing the parameters to a binary value (a single bit, -1 or 1). As a result, during inference, memory footprint and memory accesses can be drastically reduced, and most arithmetic operations can be implemented with bit-wise operations, i.e., binary convolution kernels. However, the quantization solutions still need to be further improved so as to reduce memory footprint, lower computation complexity, and so on.
[0016] Basic principles and various example implementations of the subject matter will now be described with reference to the drawings. It would be appreciated that, for the sake of clarity, the implementations of the subject matter described herein will be described with reference mainly to a convolutional neural network. In this way, a convolutional layer is described as an example of a representative layer of the neural network. However, it would be appreciated that this is not intended to limit the scope of the subject matter described herein. The idea and principles described herein are suitable for any suitable neural network system currently known or to be developed in the future.
Example Environment
[0017] Fig. 1 illustrates a block diagram of a computing device 100 in which implementations of the subject matter described herein can be implemented. It would be appreciated that the computing device 100 shown in Fig. 1 is merely illustration but not limiting the function and scope of the implementations of the subject matter described herein in any way. As illustrated in Fig. 1, the computing device 100 may include a memory 102, a controller 104, and a special-purpose processing device 106.
[0018] In some implementations, the computing device 100 can be implemented as various user terminals or service terminals with computing capability. The service terminals may be servers, large-scale computer devices, and other devices provided by various service providers. The user terminals, for example, are any type of mobile terminals, fixed terminals, or portable terminals, including mobile phones, stations, units, devices, multimedia computers, multimedia tablets, Internet nodes, communicators, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, Personal Communication System (PCS) devices, personal navigation devices, Personal Digital Assistants (PDAs), audio/video players, digital camera/camcorders, positioning devices, television receivers, radio broadcast receivers, electronic book devices, game devices, or any combination thereof, including the accessories and peripherals of these devices or any combination thereof. It is also contemplated that the computing device 100 can support any type of interface to the user (such as "wearable" circuitry and the like).
[0019] The special-purpose processing device 106 may further include a memory unit 108 and a processing unit 110. For example, the special-purpose processing device 106 may be a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a processor or a Central Processing Unit (CPU) with a customized processing unit, or a Graphics Processing Unit (GPU). Therefore, the memory unit 108 may be referred to as a memory-on-chip and the memory 102 may be referred to as a memory-off-chip accordingly. In some implementations, the processing unit 110 can control the overall operations of the special-purpose processing device 106 and perform various computations.
[0020] The memory 102 may be implemented by various storage media, including but not limited to volatile and non-volatile medium, and removable and non-removable medium. The memory 102 can be a volatile memory (such as a register, cache, Random Access Memory (RAM)), a non-volatile memory (such as, a Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory)), or any combinations thereof. The memory 102 may be removable and non-removable medium, and may include a machine-readable medium, such as a memory, flash drive, magnetic disk or any other media that can be used to store information and/or data and can be accessed by the computing device 100.
[0021] The controller 104 can control the start and end of the computing process and may further provide the inputs required for the forward pass of the convolutional neural network. In addition, the controller 104 can also provide the weight data for the neural network. The controller 104 communicates with the special-purpose processing device 106 via a standard interface such as a PCIe bus. The controller 104 assigns the computing tasks to the processing unit 110 on the special-purpose processing device 106. The processing unit 110 begins the computing process after receiving the start signal from the controller 104. The controller 104 provides the inputs and weights to the processing unit 110 for computation. The memory unit 108 of the special-purpose processing device 106 may be used to store parameters, such as convolution kernel weights, while the memory 102 may store input and output feature maps and intermediate data generated during computation. The special-purpose processing device 106 completes the computation of the forward pass of the neural network and then returns the output result obtained from the last layer of the convolutional neural network to the controller 104. However, it would be appreciated that the above control process is merely exemplary. Those skilled in the art may change the control process after understanding the implementations of the subject matter described herein.
[0022] The computing device 100 or the special-purpose processing device 106 can perform the training of the neural networks in the implementations of the subject matter described herein. During the training of the neural network, the model parameters, also known as primal parameters, are defined as the stored weights and biases. The parameters are updated during each iteration. In the prior art, the parameters are stored in a high-resolution format. These parameters are quantized or binarized every time before the forward pass, and the associated gradient accumulation is still performed in the floating-point domain. Thus, special-purpose processing devices, such as FPGAs and ASICs, still need to implement expensive floating-point multiplication-accumulation to handle parameter updates, and even more expensive non-linear quantization methods.
[0023] In accordance with some implementations of the subject matter described herein, the limits of quantization are further pushed by representing the parameters in a fixed-point format, which can decrease the bit-width of the parameters, so as to dramatically reduce the total memory. For example, an 8-bit fixed-point number can reduce the total memory space to a quarter of that required by a 32-bit floating-point number. This makes it possible to store the parameters in the memory-on-chip of the special-purpose processing device rather than the memory-off-chip. In the case of a 45 nm CMOS process node, this means about 100 times better energy efficiency. Moreover, low-resolution fixed-point arithmetic operations in the special-purpose processing device are much faster and more energy efficient than floating-point operations. Furthermore, fixed-point operations generally reduce logic usage and power consumption dramatically, while enabling higher clock frequencies, shorter pipelines, and increased throughput.
Convolutional Neural Network
[0024] A convolutional neural network is a particular type of neural network, which usually includes a plurality of layers, with each layer including one or more neurons. Each neuron obtains input data from the input of the neural network or the previous layer, performs respective operations, and outputs the results to the next layer or the output of the neural network model. The input of the neural network may be, for example, images, e.g., RGB images with a particular number of pixels. In a classification problem, the output of the neural network is a score or a probability for each of the different classes. The last layer (usually the fully-connected layer) of the neural network may be provided with a loss function, which can be a cross-entropy loss function. During training of the neural network, it is generally required to minimize the loss function.
[0025] The structure of the convolutional neural network is specially designed for the situation where images are the input data. In this case, therefore, the convolutional neural network is highly efficient and the number of parameters required in the neural network is greatly reduced.
[0026] In the convolutional neural network, each layer is arranged in three dimensions: width, height, and depth. Each layer of the convolutional neural network converts the three-dimensional input data to three-dimensional activation data and outputs the converted three-dimensional activation data. The convolutional neural network includes various layers arranged in a sequence, and each layer sends its activation data to the next layer. The convolutional neural network mainly includes three types of layers: a convolutional layer, a pooling layer, and a fully-connected layer. By stacking these layers over each other, a complete convolutional neural network may be constructed.
[0027] Fig. 2 illustrates example architecture of a convolutional neural network (CNN) 200 in accordance with some implementations of the subject matter described herein. It should be understood that the structure and functions of the convolutional neural network 200 are described for illustration, and do not limit the scope of the subject matter described herein. The subject matter described herein can be implemented by different structures and/or functions.
[0028] As shown in Fig. 2, the CNN 200 includes an input layer 202, convolutional layers 204 and 208, pooling layers 206 and 210, and an output layer 212. Generally, the convolutional layers and the pooling layers are arranged alternately. For example, as shown in Fig. 2, the convolutional layer 204 is followed by the adjacent pooling layer 206 and the convolutional layer 208 is followed by the adjacent pooling layer 210. However, it would be appreciated that a convolutional layer may not be followed by a pooling layer. In some implementations, the CNN 200 only includes one of the pooling layers 206 and 210. In some implementations, there are even no pooling layers.
[0029] As described above, each of the input layer 202, the convolutional layers 204 and 208, the pooling layers 206 and 210, and the output layer 212 includes one or more planes, also known as feature maps or channels. The planes are arranged along the depth dimension and each plane may include two space dimensions, i.e., width and height, also known as space domain.
[0030] To help understand the ideas and principles of the subject matter described herein, the principle of the CNN 200 will now be described with reference to an example application of image classification. It would be appreciated that the CNN 200 can be easily extended to any other suitable applications. Moreover, the input layer 202 may be represented by the input images, such as 32*32 RGB images. In this case, the dimension of the input layer 202 is 32*32*3. In other words, the width and height of the image are 32 and there are three color channels.
[0031] The feature map of each of the convolutional layers 204 and 208 may be obtained by applying convolutional operations on the feature map of the previous layer. Through convolutional operations, each neuron in the feature map of a convolutional layer is only connected to a subset of the neurons of the previous layer. Therefore, applying convolutional operations on the convolutional layers indicates the presence of sparse connections between these two layers. After applying convolutional operations on the convolutional layers, an activation function may be applied to the result to determine the output of the convolutional layers.
[0032] For example, in the convolutional layer 204, each neuron is connected to a local area in the input layer 202 and computes the inner product of the local area and its weights. The convolutional layer 204 may compute the output of all neurons and, if 12 filters (also known as convolution kernels) are used, the obtained output data will have a dimension of [32*32*12]. In addition, activation operations may be performed on each output data in the convolutional layer 204, and the common activation functions include Sigmoid, tanh, ReLU, and so on.
[0033] The pooling layers 206 and 210 down sample the output of the previous layer in space dimension (width and height), so as to reduce data size in space dimension. The output layer 212 is usually a fully-connected layer, in which each neuron is connected to all neurons of the previous layer. The output layer 212 computes classification scores and converts data size to one-dimensional vectors, each element of the one-dimensional vectors corresponding to a respective category. For instance, regarding the convolutional network of the images in CIFAR-10 for classification, the last output layer has a dimension of 1 * 1 * 10 because the convolutional neural network will finally compress the images into one vector consisting of classification scores, where the vector is arranged along the direction of depth.
[0034] It can be seen that the convolutional neural network converts the images one by one from original pixel values to final classification score values. For example, when the convolutional layers and the fully-connected layer operate on the corresponding inputs, both the activation function and the learning parameters can be used, where parameters in the convolutional layers and the fully-connected layer can be optimized based on different optimization solutions. Examples of the optimization solutions include but not limited to stochastic gradient descent algorithm, adaptive momentum estimation (ADAM) method and the like. Therefore, the errors between the classification scores obtained by the convolutional neural network and labels of each image can be lowered as much as possible for the data in the training data set. [0035] Training of the neural network can be further implemented by a backward pass. In this method, the training set is input into the input layer of the neural network, e.g., input the training set into the input layer of the neural network in batches and iteratively update the parameters of the neural network by batch. Samples of each batch can be regarded as a mini-batch. After multiple iterations, all samples in the training set have been trained once, which is called as an epoch.
[0036] In each iteration, a plurality of inputs is grouped into a mini-batch, which is provided to the input layer. By the forward pass, the inputs are propagated layer by layer to the output layer of the neural network, so as to determine the output of the neural network, such as classification scores. The classification scores are then compared with the labels in the training set to compute prediction errors, for example by the loss function. When the output layer finds that the output is inconsistent with the correct label, the parameters of the last layer in the neural network may be adjusted first, and then the parameters of the second-to-last layer connected to the last layer may be adjusted. Accordingly, the layers are adjusted layer by layer in a backward direction. When the adjustment of all parameters in the neural network is completed, the same process is performed on the next mini-batch. In this way, the process is performed iteratively until a predefined termination condition is satisfied.
Binary Neural Network
[0037] The following description introduces a binary neural network (BNN) into which the implementations of the subject matter described herein are applied. In BNN, weights and activations can be binarized to significantly speed up the performance by using the bit convolution kernels. In some implementations, the floating-point value is converted into a single bit by the stochastic method. Although the stochastic binarization can get better performance, the computation complexity of the solution is higher since it requires the hardware resources to generate random bits when quantizing. In some implementations, a deterministic method is adopted to convert the floating-point value into a single bit and the deterministic solution is lower in computation complexity. For example, a simple sign function sign (·) is used to binarize the floating-point value, as shown in equation (1).
    w_b = sign(w) = +1, if w >= 0; -1, otherwise    (1)
[0038] As indicated in equation (1), when the weight w is greater than or equal to zero, it is converted to +1, and when the weight w is less than zero, it is converted to -1, such that the obtained value w_b is a one-bit binary number. Such binary conversion drastically reduces computation complexity and memory consumption in the forward pass. However, the derivative of the sign function is 0 almost everywhere, which means the gradients of the loss function c cannot be propagated in the backward pass. To address this problem, a "straight-through estimator" (STE) method is employed, as shown in equation (2):
    Forward:  r_o = sign(r_i)
    Backward: ∂c/∂r_i = (∂c/∂r_o) · 1_{|r_i| ≤ 1}    (2)
[0039] In equation (2), 1_{|r_i| ≤ 1} represents an indicator function: when the input r_i satisfies the condition that |r_i| ≤ 1, the value of the indicator function is 1; when the input r_i satisfies the condition that |r_i| > 1, the value of the indicator function is 0. Accordingly, the STE method preserves the gradient information and cancels the gradient information when r_i is too large. If the gradient information were not cancelled when r_i is too large, it could cause a significant drop in the performance of the model.
[0040] From another perspective, STE can also be regarded as applying a hard-tanh activation function HT to r_i, where HT is defined as:
    HT(r_i) = { +1,   r_i > 1
                r_i,  r_i ∈ [-1, 1]    (3)
                -1,   r_i < -1
[0041] Correspondingly, the derivative of HT is defined as:
    HT'(r_i) = { 0,  r_i > 1
                 1,  r_i ∈ [-1, 1]    (4)
                 0,  r_i < -1
[0042] It can be seen that the STEs defined in equations (4) and (2) are exactly the same. With equations (3) and (4), the neural network can binarize both activations and weights during the forward pass, while still preserving real-valued gradients so that stochastic gradient descent operates properly.
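For illustration only, the deterministic binarization of equation (1) and the straight-through estimator of equations (2) and (4) may be sketched in Python as follows. This is a minimal NumPy sketch; the function names binarize and ste_backward are illustrative and are not part of the described implementations.

    import numpy as np

    def binarize(w):
        # Deterministic binarization of equation (1): +1 where w >= 0, -1 elsewhere.
        return np.where(w >= 0, 1.0, -1.0)

    def ste_backward(grad_out, r_in):
        # Straight-through estimator of equations (2) and (4): pass the incoming
        # gradient through unchanged where |r_in| <= 1 and cancel it elsewhere.
        return grad_out * (np.abs(r_in) <= 1.0)

    w = np.array([0.7, -0.2, 1.8, -1.3])
    wb = binarize(w)                      # [ 1., -1.,  1., -1.]
    g = np.array([0.1, 0.4, -0.2, 0.3])   # gradient w.r.t. the binarized output
    gw = ste_backward(g, w)               # gradient components at |w| > 1 are cancelled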
Fixed-Point Format
[0043] In accordance with implementations of the subject matter described herein, weights and gradients can be stored in a fixed-point format, e.g., weights can be stored in the memory unit 108 of the special-purpose processing device 106 in a fixed-point format. The fixed-point format includes an l-bit signed integer mantissa and a global scaling factor (e.g., 2^(-n)) shared by the fixed-point values, as shown in equation (5):
    v = [m_1, m_2, ..., m_K] × 2^(-n)    (5)
where n and the mantissas m_1~m_K are integers.
[0044] It can be seen that the vector v includes K elements v_1~v_K, which share one scaling factor 2^(-n). Here the integer n actually indicates the radix point position of the l-bit fixed-point number. In other words, the scaling factor in fact refers to the position of the radix point. Because the scaling factor is usually fixed, i.e., a fixed radix point, this data format is called a fixed-point number. Lowering the scaling factor will reduce the range of the fixed-point format but increase the accuracy of the fixed-point format. The scaling factor is usually set to be a power of 2, since the multiplication can then be replaced by a bit shift to reduce the complexity of computation.
[0045] In some implementations, the following equation (6) may be used to convert data x (such as a floating-point number) into an l-bit fixed-point number with the scaling factor 2^(-n):
    FXP(x, l, n) = Clip(⌊x · 2^n + 0.5⌋ · 2^(-n), MIN, MAX)    (6)
where ⌊·⌋ means rounding down, and MIN and MAX respectively denote the minimum value and the maximum value of the l-bit fixed-point number with the scaling factor 2^(-n). In some implementations, to make the summing circuit and multiplication circuit simpler by taking full advantage of all 2^l representable values, MIN and MAX are defined as follows:
    MIN = -2^(l-1) · 2^(-n)
    MAX = (2^(l-1) - 1) · 2^(-n)    (7)
[0046] It is observed that equation (6) also defines the rounding behavior, i.e., indicated by the rounding-down operation ⌊·⌋. In addition, equation (6) also defines the saturating behavior represented by Clip. In other words, when ⌊x · 2^n + 0.5⌋ · 2^(-n) is greater than MAX, the value of the converted fixed-point number is MAX; when it is less than MIN, the value of the converted fixed-point number is MIN.
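For illustration only, the conversion of equations (6) and (7) may be sketched as follows, assuming the rounding and saturating behavior reconstructed above. The function name fxp is illustrative and not part of the described implementations.

    import numpy as np

    def fxp(x, l, n):
        # Convert x to an l-bit fixed-point value with scaling factor 2**-n,
        # following equations (6) and (7): round to the nearest multiple of
        # 2**-n, then saturate at the representable minimum and maximum.
        min_val = -(2 ** (l - 1)) * 2.0 ** (-n)
        max_val = (2 ** (l - 1) - 1) * 2.0 ** (-n)
        q = np.floor(x * 2.0 ** n + 0.5) * 2.0 ** (-n)
        return np.clip(q, min_val, max_val)

    # 8-bit mantissa with scaling factor 2**-4: range [-8, 7.9375], step 0.0625.
    x = np.array([0.33, -2.71, 100.0, -100.0])
    print(fxp(x, l=8, n=4))   # [ 0.3125 -2.6875  7.9375 -8.    ]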
[0047] In the following, the operations for converting data into fixed-point format may be implemented by equations (6) and (7), unless indicated otherwise. Of course, any other suitable conversion operations may also be adopted.
Quantization
[0048] It is known that magnitudes of weight, activation, and gradient will fluctuate during training, where the gradient fluctuation is most apparent. To match the fluctuations, different bit-widths and scaling factors are assigned to the parameters, activations, and gradients in different layers and the scaling factors of the parameters are updated accordingly during iteration. Moreover, different scaling factors can also be assigned to weights and biases among the parameters.
[0049] In some implementations of the subject matter described herein, the scaling factor may be updated based on the data range. Specifically, it may be determined, based on overflow of the data (e.g., overflow rate and/or overflow amount), whether to update the scaling factor and how to update the scaling factor. The method for updating the scaling factor will now be explained with reference to weights. However, it would be appreciated that the method can also be applied for other parameters.
[0050] In the case of the current scaling factor, it may be determined whether the overflow rate of the weights exceeds a predefined threshold. If the overflow rate exceeds the predefined threshold, the range of the fixed-point number is too small and the scaling factor should be increased accordingly. For example, the scaling factor may be multiplied by the cardinal number (e.g., 2); that is, the radix point may be shifted right by one bit. If the overflow rate does not exceed the predefined threshold and, even after the weights are multiplied by 2, the overflow rate is still below the predefined threshold, the range of the fixed-point number is too large. Therefore, the scaling factor may be reduced, for example, by dividing the scaling factor by the cardinal number (e.g., 2); that is, the radix point may be shifted left by one bit. A code sketch of this overflow-based update is given below.
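The following minimal sketch illustrates the overflow-based update of the exponent n of the scaling factor 2^(-n) described in paragraph [0050]. The overflow threshold value and the probe that doubles the data before re-checking are assumptions about the exact bookkeeping; the representable range follows equation (7).

    import numpy as np

    def overflow_rate(x, l, n):
        # Fraction of elements outside the representable range of an l-bit
        # fixed-point number with scaling factor 2**-n (equation (7)).
        min_val = -(2 ** (l - 1)) * 2.0 ** (-n)
        max_val = (2 ** (l - 1) - 1) * 2.0 ** (-n)
        return float(np.mean((x < min_val) | (x > max_val)))

    def update_scale(x, l, n, threshold=0.01):
        # Adjust the exponent n of the scaling factor 2**-n based on overflow.
        if overflow_rate(x, l, n) > threshold:
            return n - 1   # enlarge the range: scaling factor 2**-n -> 2**-(n-1)
        if overflow_rate(2.0 * x, l, n) <= threshold:
            return n + 1   # range is unnecessarily large: shrink it for accuracy
        return n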
[0051] Compared with binary weights and activations, gradients usually require higher accuracy. Therefore, quantization of the gradients should be carefully reviewed. Because a linear quantization approach does not converge well, a non-linear quantization function is conventionally employed instead to quantize the gradients. However, these non-linear quantization solutions will inevitably increase the computation overhead, which is undesirable. Hence, in accordance with implementations of the subject matter described herein, a linear quantization solution is adopted to lower the computation complexity. As described above, if the linear quantization function is simply used in the training of the neural network, it will introduce too strong a regularization and impede convergence of the neural network model. However, in the case of using a solution for updating adaptive scaling factors, the linear quantization approach may be employed without causing non-convergence or a dramatic decrease in model performance.
Forward Pass
[0052] Fig. 3 illustrates an internal architecture for a forward pass of a convolutional layer 300 of the convolutional neural network in accordance with an implementation of the subject matter described herein. The convolutional layer 300 may be a k-th layer of the neural network. For example, the convolutional layer 300 may be a convolutional layer 204 or 208 in the convolutional neural network as shown in Fig. 2. In Fig. 3, legend 10 represents binary numbers and legend 20 represents fixed-point numbers. It would be appreciated that, although Fig. 3 illustrates a plurality of modules or sub-layers, in specific implementations one or more sub-layers may be omitted or modified for different purposes.
[0053] As shown in Fig. 3, the parameters of the convolutional layer 300 include weights 302 and biases 304, respectively denoted as W_k^fxp and b_k^fxp, i.e., the weights and biases of the k-th layer. In some implementations, the parameters of the convolutional layer 300 may be represented and stored in a fixed-point format instead of a floating-point format. The parameters in the fixed-point format may be stored in the memory unit 108 of the special-purpose processing device 106 and may be read from the memory unit 108 during operation.
[0054] In the forward pass, the weights 302 in the fixed-point format are converted by a binary sub-layer 308 to binary weights 310, which may be represented by W_k^b. For example, the binary sub-layer 308 may convert the fixed-point weights 302 into the binary weights 310 by a sign function, as shown in equation (1). Moreover, the convolutional layer 300 further receives an input 306, which may be represented by X_k. For example, when the convolutional layer 300 is an input layer of the neural network (i.e., k=1), the input 306 can be the input images of the neural network. In this case, the input 306 can be regarded as an 8-bit integer vector (0-255). In another case, when the convolutional layer 300 is a hidden layer or an output layer of the neural network, for example, the input 306 may be an output of the previous layer, which may be a binary vector (+1 or -1). In both cases, the convolutional operation only includes integer multiplication and accumulation and may be computed by bit convolution kernels. In some implementations, if the convolutional layer 300 is the first layer, it may be processed according to equation (8),
    x · w^b = Σ_n x^n · w_b^n    (8)
where x represents the input 306 in an 8-bit fixed-point format, w^b represents the binary weight vector, and x^n represents the mantissa of the n-th element of vector x.
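For illustration only, the point of paragraph [0054] — that the first-layer convolution reduces to integer multiply-accumulate followed by a single application of the shared scaling factor — may be sketched as follows. The function name and the exact handling of the shared scale are assumptions, not the precise form of equation (8).

    import numpy as np

    def first_layer_dot(mantissas, w_binary, n_scale):
        # Dot product of an 8-bit fixed-point input (integer mantissas sharing
        # the scaling factor 2**-n_scale) with a {+1, -1} weight vector. The
        # accumulation itself uses only integer multiply-add.
        acc = int(np.dot(mantissas.astype(np.int32), w_binary))
        return acc * 2.0 ** (-n_scale)

    m = np.array([120, -37, 64, 5], dtype=np.int8)   # 8-bit mantissas
    w = np.array([1, -1, 1, -1], dtype=np.int32)     # binarized weights
    print(first_layer_dot(m, w, n_scale=4))          # (120+37+64-5) * 2**-4 = 13.5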
[0055] A normalization sub-layer 316 represents integer batch normalization (IBN) sublayer, which normalizes input tensor within a mini-batch with mean and variance. Different from conventional batch normalization performed in floating-point domain, all intermediate results involved in the sub-layer 316 are either 32-bit integers or low resolution fixed-point values. Since integer is a special fixed-point number, the IBN sub-layer 316 only includes corresponding fixed-point operations. Subsequently, the quantization sublayer 318 converts the output of the IBN sub-layer 316 to a predefined fixed-point format.
[0056] Specifically, for the IBN sub-layer 316, the input may be a fixed-point input in a mini-batch, s = {s_1, ..., s_N}, including N elements. To obtain the normalized output s' = {s'_1, ..., s'_N}, the sum sum1 = Σ_i s_i and the sum of squares sum2 = Σ_i s_i^2 of all inputs can be determined. Then, the mean value mean = Round(sum1 / N) and the variance var = Round(sum2 / N) - mean^2 of the input are computed based on sum1 and sum2, wherein Round(·) means rounding to the nearest 32-bit integer. Then, the normalized output s'_i = (s_i - mean) / Round(√var) is determined based on the mean and the variance. The normalized output can be converted to a predefined fixed-point format via the sub-layer 318.
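For illustration only, the IBN computation of paragraph [0056] may be sketched as follows. The placement of the rounding operations follows the reconstruction above and should be treated as an assumption, and integer (floor) division stands in for the rounded division of the normalized output.

    import numpy as np

    def integer_batch_norm(s):
        # Normalize a mini-batch of integer inputs using only integer
        # intermediates: mean and variance are rounded to integers and the
        # output is (s - mean) // round(sqrt(var)).
        s = s.astype(np.int64)
        count = s.size
        sum1 = int(np.sum(s))
        sum2 = int(np.sum(s * s))
        mean = int(round(sum1 / count))
        var = int(round(sum2 / count)) - mean * mean        # assumed rounding order
        std = max(int(np.sqrt(max(var, 0)) + 0.5), 1)       # avoid division by zero
        return (s - mean) // std

    batch = np.array([40, 52, 47, 61, 55, 45], dtype=np.int32)
    print(integer_batch_norm(batch))   # small integers centred around zero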
[0057] For the output of the IBN sub-layer 316, the method for updating scaling factors described in the Quantization section above can be used to update the scaling factors. For example, it may be first determined whether the overflow rate of the IBN output exceeds the predefined threshold. If the overflow rate is greater than the predefined threshold, the range of the IBN output is extended, that is, increasing the scaling factor or right shifting the radix point in fixed-point format when the cardinal number is 2. This will not be repeated because it is substantially the same as the method for updating scaling factors described with reference to quantization.
[0058] In some implementations, a summing sub-layer 320 adds the output of the IBN sub-layer 316 with the bias 304 to provide an output S_k. The bias 304 may be read from the memory unit 108 of the special-purpose processing device 106. The activation sub-layer 322 represents an activation function, which is usually implemented by a non-linear activation function, e.g., the hard-tanh function HT. The output of the activation sub-layer 322 is converted via the quantization sub-layer 324 to an output 326 in a fixed-point format, which is denoted by X_{k+1}, to be provided to the next layer (the (k+1)-th layer) of the neural network. Moreover, the last layer of the neural network may not include the activation sub-layer 322 and the quantization sub-layer 324, i.e., the loss function layer is computed in the floating-point domain.
[0059] In some implementations, a pooling layer is located after the convolutional layer 300. For example, as shown in Fig. 2, each of the convolutional layers 204 and 208 is followed by a pooling layer (the pooling layers 206 and 210, respectively) in the convolutional neural network 200. In this case, the pooling layer may be incorporated into the convolutional layer 300 to further reduce computation complexity. For example, the pooling layer 206 is incorporated into the convolutional layer 204 in the convolutional neural network 200. As shown in Fig. 3, the pooling sub-layer 314 indicated by the dotted line may be incorporated into the convolutional layer 300 and arranged between the convolutional sub-layer 312 and the IBN sub-layer 316.
[0060] The above description introduces the forward pass with reference to a convolutional layer 300. It would be appreciated that the forward pass of the entire neural network may be stacked by a plurality of similar processes. For example, the output of the k-th layer is provided to the k+1 layer to serve as an input of the k+1 layer; and the process continues layer by layer. In the convolutional neural network 200 of Fig. 2, the output of the convolutional layer 204 is determined from the architecture of the convolutional layer 300 (without the sub-layer 314). If the pooling layer 206 is incorporated into the convolutional layer 204, the output of the pooling layer 206 may be determined by the architecture of the convolutional layer 300 (including the sub-layer 314). Then, the output is provided to the convolutional layer 208 and the classification category is provided at the output layer 212.
Backward Pass
[0061] Fig. 4 illustrates an internal architecture for a backward pass of a convolutional layer 400 of the convolutional neural network in accordance with an implementation of the subject matter described herein. The backward pass process is shown in Fig. 4 from right to left. In Fig. 4, legend 30 represents floating-point number and legend 20 represents fixed-point number. It would be appreciated that, although the forward pass and backward pass process of the convolutional layer is respectively indicated by the signs 300 and 400, the convolutional layers 300 and 400 may refer to the same layer in the neural network. For example, the convolutional layers 300 and 400 may be the architecture for implementing the forward pass and backward pass of the convolutional layer 204 or 208 in the convolutional neural network 200. It would be further appreciated that, although Fig. 4 illustrates a plurality of modules or sub-layers, each sub-layer can be omitted or modified in specific implementations for different purposes in view of different situations.
[0062] As shown in Fig. 4, during the backward pass, the convolutional layer 400 receives a backward input 426 from a next layer of the neural network, e.g., if the convolutional layer 400 is the k-th layer, the convolutional layer 400 receives a backward input 426 from the (k+1)-th layer. The backward input 426 may be a gradient of the loss function with respect to the forward output 326 of the convolutional layer 300. The gradient may be in a floating-point format and may be represented as g_{X_{k+1}}.
[0063] The backward input 426 is converted to a fixed-point value 430 (denoted by g_{X_{k+1}}^fxp) by the quantization sub-layer 424. The activation sub-layer 422 computes its output based on the fixed-point value 430, i.e., the gradient of the loss function with respect to the input s_k of the activation sub-layer 322, denoted by g_{s_k}.
[0064] It would be appreciated that most of the sub-layers in Fig. 4 correspond to the sub-layers shown in Fig. 3. For example, the activation sub-layer 322 in Fig. 3 corresponds to the activation sub-layer 422 in Fig. 4, which serves as a backward gradient operation for the activation sub-layer 322. If the input of the activation sub-layer 322 is x and the output thereof is y, the backward input of the corresponding activation sub-layer 422 is a gradient of the loss function with respect to the output y and the backward output is a gradient of the loss function with respect to the input x. In Fig. 3, if the hard-tanh function serves as the activation function, the operations performed by the activation sub-layer 322 are shown in equation (3). Accordingly, the operations performed by the activation sub-layer 422 are shown in equation (4). Therefore, in the context of the subject matter described herein, the names for the two types of sub-layers are not distinguished from each other.
[0065] The backward output of the activation sub-layer 422 is provided to the summing sub-layer 420, which corresponds to the summing sub-layer 320, and the gradients of the loss function with respect to the two inputs of the summing sub-layer 320 may be determined. Because one input of the sub-layer 320 is the bias, the gradient of the loss function with respect to the bias may be determined and provided to the quantization sub-layer 428. Subsequently, the gradient is converted to a fixed-point gradient by the quantization sub-layer 428 for updating the bias 404 (represented by b_k^fxp). The fixed-point format has a specific scaling factor, which may be updated in accordance with the method for updating scaling factors as described in the Quantization section above.
[0066] Another backward output of the summing sub-layer 420 is propagated to the IBN sub-layer 418. In forward pass, a fixed-point format is used to compute the IBN sub-layer 418. However, if the same method is applied in the backward pass and the backward propagation of IBN is restricted in fixed-point representation, non-negligible accuracy degradation will occur. In some implementations, therefore, the IBN sub-layer 418 is returned to the floating-point domain for operations, so as to provide an intermediate gradient output. As shown in Fig. 4, the intermediate gradient output is a gradient of the loss function with respect to the convolution of the input and parameters. Hence, an additional quantization sub-layer 416 is utilized after the IBN sub-layer 418 for converting the floating-point format into a fixed-point format. The quantization sub-layer 416 converts the intermediate gradient output to a fixed-point format having a specific scaling factor, which may be updated according to the method for updating scaling factors as described in the Quantization section above.
[0067] The convolutional sub-layer 412 further propagates a gradient g_{W_k^b} of the loss function with respect to the weight W_k^b and a gradient g_{X_k^b} of the loss function with respect to the input X_k^b of the convolutional layer. Because the input X_k^b is either an 8-bit integer vector (for the first layer, i.e., k=1) or a binary vector (for other layers, i.e., k≠1) and the weight W_k^b is a binary vector, the convolutional sub-layer 412 only contains fixed-point multiplication and accumulation, thereby resulting in a very low computation complexity.
[0068] The backward output g_{X_k^b} of the convolutional sub-layer 412 provides a backward output 406 of the convolutional layer 400 to a previous layer. The backward output g_{W_k^b} of the convolutional sub-layer 412 is converted to a fixed-point format via the quantization sub-layer 408 to update the weight 402 (represented by W_k^fxp). The fixed-point format has a specific scaling factor, which may be updated according to the method for updating scaling factors as described in the Quantization section above.
[0069] After determining the gradient of the loss function with respect to the parameters by the backward pass, the parameters may be updated. As described above, the parameters may be updated by various updating rules, e.g., stochastic gradient descent, Adaptive Momentum Estimation (ADAM), or the like. In some implementations, the updating rules are performed in the fixed-point domain to further reduce floating-point computation. It would be appreciated that, although reference is made to the ADAM optimization method, any other suitable methods currently known or to be developed in the future may also be implemented.
[0070] The ADAM method dynamically adjusts the learning rate for each parameter based on a first moment estimate and a second moment estimate of the gradient of the loss function with respect to each parameter. The fixed-point ADAM optimization method differs from the standard ADAM optimization method in that the fixed-point ADAM method operates entirely within the fixed-point domain. In other words, its intermediate variables (e.g., the first moment estimate and the second moment estimate) are represented by fixed-point numbers.
To be specific, one fixed-point ADAM learning rule is represented by the following equation
(9), which converts the standard ADAM updating rules to a fixed-point format.
    m_t = FXP(β_1 · m_{t-1} + (1 - β_1) · g_t)
    v_t = FXP(β_2 · v_{t-1} + (1 - β_2) · g_t^2)
    u_t = FXP(α · m_t / (√v_t + ε))    (9)
    θ_t = FXP(θ_{t-1} - u_t)
In equation (9), g_t^2 denotes the element-by-element square g_t ⊙ g_t. For the sake of simplicity, 1 - β_1 and 1 - β_2 are respectively fixed to be powers of two, and FXP(·) represents the function of equation (6). Default values may be used for β_1, β_2, and ε. The parameter θ_{t-1} represents the current fixed-point parameter value with a fixed-point format (l_1, n_1), and θ_t represents the updated fixed-point parameter value. The fixed-point format for the gradient g_t is (l_2, n_2), and α is the learning rate. It can be seen that the fixed-point ADAM method computes the updated parameters by calculating the intermediate variables m_t, v_t, and u_t, and only includes respective fixed-point operations.
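For illustration only, the structure of the fixed-point ADAM update of equation (9) may be sketched as follows, reusing the fxp() conversion sketched after equation (7). The hyper-parameter values are illustrative defaults rather than values specified by this description, and using a single fixed-point format fmt = (l, n) for all quantities is a simplification; the description assigns different formats to parameters and gradients.

    import numpy as np   # fxp() is the conversion helper sketched after equation (7)

    def fixed_point_adam_step(theta, m, v, g, fmt, lr=2.0 ** -8,
                              beta1=0.9, beta2=0.999, eps=2.0 ** -16):
        # One fixed-point ADAM step following the structure of equation (9):
        # every intermediate quantity is re-quantized with the fxp() conversion.
        l, n = fmt
        m = fxp(beta1 * m + (1 - beta1) * g, l, n)        # first moment estimate
        v = fxp(beta2 * v + (1 - beta2) * g * g, l, n)    # second moment estimate
        u = fxp(lr * m / (np.sqrt(v) + eps), l, n)        # per-parameter step
        theta = fxp(theta - u, l, n)                      # updated parameter
        return theta, m, v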
[0071] By the fixed-point ADAM method, the updated weight W_k^fxp and bias b_k^fxp can be computed. As described above, these parameters can be stored in the memory unit 108 of the special-purpose processing device 106 in a fixed-point format. In addition, the scaling factors for the fixed-point format of the parameters may also be updated as described above. The scaling factors may be updated according to the method for updating scaling factors as described in the Quantization section above.
[0072] Additionally, if the pooling layer is incorporated into the convolutional layer 300 to serve as its pooling sub-layer 314 in the forward pass, a corresponding pooling layer should be incorporated into the convolutional layer 400 to serve as its pooling sub-layer 414 in the backward pass.
[0073] It can be seen that in the architecture shown in Figs. 3 and 4, at most two portions are implemented by floating-point numbers. The first portion is the loss function and the second portion is a backward pass of the gradient in the IBN sub-layer 418. Therefore, the floating-point computations are avoided as much as possible to lower computation complexity and reduce memory space.
[0074] Additionally, in the architecture shown in Figs. 3 and 4, the quantization sub-layer may be implemented by a linear quantization method, and an adaptive updating method for scaling factors of the fixed-point parameters corresponding to the quantization sub-layer may be used to ensure that no significant drop will occur in accuracy. The linear quantization method can greatly lower computation complexity, which can further facilitate the deployment of the convolutional neural network on the special-purpose processing device.
[0075] The backward pass process has been introduced above with reference to the convolutional layer 400. It would be appreciated that the backward pass of the entire neural network can be stacked from a plurality of similar processes. For example, the backward output of the (k+1)-th layer is provided to the k-th layer to serve as a backward input of the k-th layer; and the parameters of each layer are updated layer by layer. In the convolutional neural network 200 of Fig. 2, if the convolutional layer 204 and the pooling layer 206 are combined for implementation, the backward output of the convolutional layer 204 can be determined by the architecture of the convolutional layer 400 (including the sub-layer 414). Then, the backward output is provided to the input layer 202, to finally finish updating all parameters of the neural network 200, thereby completing an iteration of a mini-batch. Iteratively completing the iterations of all mini-batches in the training set may be referred to as finishing a full iteration of the data set, which is also known as an epoch. After a plurality of epochs, if the training result satisfies a predefined threshold condition, the training is complete. For example, the threshold condition can be a predefined number of epochs or a predefined accuracy.
[0076] Additionally, it would be appreciated that it is not necessary to apply the adaptive updating method in each iteration. For example, the adaptive updating method may be performed once after a plurality of iterations. Moreover, the frequency for applying the adaptive updating method may vary for different quantities. For example, the adaptive updating method may be applied more frequently for the gradients, because the gradients tend to fluctuate more extensively, as sketched below.
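For illustration only, the idea of paragraph [0076] — running the adaptive updating method at different frequencies for different quantities — may be sketched as follows, reusing the update_scale() helper sketched in the Quantization section. The period constants are assumptions, not values specified by this description.

    GRAD_SCALE_PERIOD = 1       # gradients fluctuate most: re-fit every iteration (assumed)
    WEIGHT_SCALE_PERIOD = 100   # weights change slowly: re-fit less often (assumed)

    def maybe_update_scales(step, weights, grads, w_fmt, g_fmt):
        # w_fmt and g_fmt are (bit-width, exponent) pairs for the scaling factor 2**-n.
        (lw, nw), (lg, ng) = w_fmt, g_fmt
        if step % GRAD_SCALE_PERIOD == 0:
            ng = update_scale(grads, lg, ng)
        if step % WEIGHT_SCALE_PERIOD == 0:
            nw = update_scale(weights, lw, nw)
        return (lw, nw), (lg, ng)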
Model Training
[0077] Fig. 5 illustrates a flowchart of a method 500 for training a convolutional neural network in accordance with implementations of the subject matter described herein. The method 500 may be performed on the special-purpose processing device 106 as shown in Fig. 1. As described above, the special-purpose processing device 106 may be an FPGA or an ASIC, for example.
[0078] At 502, an input to a convolutional layer of the neural network is received. As described above, the input may be received from the previous layer, or may be an input image for the neural network. The input may correspond to samples of a mini-batch in the training set.
[0079] At 504, parameters of the convolutional layer are read from a memory unit 108 of the special-purpose processing device 106, where the parameters are stored in the memory unit 108 of the special-purpose processing device 106 in a first fixed-point format and have a predefined bit-width. The parameters may represent either the weight parameters or the bias parameters of the convolutional layer, or may represent both. Generally, the bit-width of the first fixed-point format is smaller than that of a floating-point number, to reduce the memory space of the memory unit 108.
[0080] At 506, the output of the convolutional layer is computed by fixed-point operations based on the input of the convolutional layer and the read parameters. In some implementations, the convolutional operations may be performed on the input and the parameters of the convolutional layer to obtain an intermediate output, which is normalized to obtain a normalized output. The normalization only includes respective fixed-point operations. For example, the normalization may be implemented by the IBN layer 316 as shown in Fig. 3.
[0081] In some implementations, in order to reduce the bit-width of the first fixed-point format while maintaining the model accuracy, the scaling factors of the parameters above are adaptively updated. For example, a backward input to the convolutional layer is received at the output of the convolutional layer, where the backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer. Based on the backward input, the gradient of the loss function of the neural network with respect to parameters of the convolutional layer is computed. The parameters in the first fixed-point format may be updated based on the gradient of the loss function of the neural network with respect to parameters. The scaling factors of the first fixed-point format may be updated based on the updated parameter range. For example, the fixed-point format of the parameters may be updated by the method described above with reference to quantization.
[0082] The updated parameters may be stored in the memory unit 108 of the special-purpose processing device 106 to be read at the next iteration. In addition, it is not necessary to update the parameter format in each iteration. Instead, the fixed-point format of the parameters may be updated at a certain frequency. In some implementations, updating the parameters only includes respective fixed-point operations, which may be implemented by a fixed-point ADAM optimization method, for example.
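The fixed-point ADAM method itself is not reproduced here; as a heavily simplified stand-in, the sketch below shows the general pattern of a parameter update that stays in fixed point: the stored integer weights and an already-quantized gradient are combined with integer arithmetic, a power-of-two learning rate is applied as an arithmetic right shift, and the result is saturated to the predefined bit-width. The names and the SGD-style update rule are illustrative assumptions, not the fixed-point ADAM method mentioned above.

```python
import numpy as np

def sgd_step_fixed_point(w_q, g_q, lr_shift=6, bit_width=8):
    """One simplified fixed-point update step (SGD-like stand-in for ADAM).

    w_q:      int8 weights, real value ~= w_q * w_scale
    g_q:      int32 gradient already quantized to the same scale as the weights
    lr_shift: learning rate of 2**-lr_shift, applied as an arithmetic right shift
    """
    q_max = 2 ** (bit_width - 1) - 1
    q_min = -q_max - 1
    step = g_q.astype(np.int32) >> lr_shift               # lr * gradient, integer only
    w_new = w_q.astype(np.int32) - step
    return np.clip(w_new, q_min, q_max).astype(np.int8)   # saturate to the bit-width

w_q = np.array([40, -75, 100], dtype=np.int8)
g_q = np.array([640, -1280, 320], dtype=np.int32)
print(sgd_step_fixed_point(w_q, g_q))  # [ 30 -55  95]
```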
[0083] In some implementations, the gradient of the loss function with respect to the parameters may first be converted to a second fixed-point format for updating the parameters in the first fixed-point format. The first fixed-point format may be identical to or different from the second fixed-point format. The conversion can be carried out by a linear quantization method. In other words, the gradient of the loss function of the neural network with respect to the parameters may be converted to the second fixed-point format by a linear quantization method. Then, the parameters in the first fixed-point format may be updated based on the gradient in the second fixed-point format. In some implementations, the scaling factors of the second fixed-point format may be updated based on the range of the gradient of the loss function with respect to the parameters. As described above, the linear quantization method has low computation complexity, and the performance will not be substantially degraded because the scaling factor updating method is employed in the implementations of the subject matter described herein.
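As an illustration of the linear quantization step, the sketch below maps a floating-point gradient tensor onto signed integers of a given bit-width with a single scaling factor derived from the tensor's range. The helper name linear_quantize is an assumption for this example; in the implementations described herein the scaling factor is updated only at a certain frequency rather than recomputed on every call as done here.

```python
import numpy as np

def linear_quantize(x, bit_width=12):
    """Linearly quantize a tensor to signed integers of the given bit-width.

    Returns (q, scale) such that x ~= q * scale, with q in [-(2**(b-1)-1), 2**(b-1)-1].
    """
    q_max = 2 ** (bit_width - 1) - 1
    peak = np.max(np.abs(x))
    scale = peak / q_max if peak > 0 else 1.0
    q = np.clip(np.round(x / scale), -q_max, q_max).astype(np.int32)
    return q, scale

grad = np.array([0.031, -0.002, 0.017])
q, scale = linear_quantize(grad, bit_width=12)
print(q, scale)   # q = [2047, -132, 1123], scale ~= 1.51e-05
print(q * scale)  # close to the original gradient values
```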
[0084] In some implementations, computing the output of the convolutional layer further comprises: converting the normalized output to a normalized output in a third fixed-point format, where the scaling factors of the third fixed-point format may be updated based on the range of the normalized output in the third fixed-point format. As shown in Fig. 3, the output of the IBN sub-layer 316 may be provided to the quantization layer 318, which can convert the normalized output of the IBN sub-layer 316 to a normalized output in the third fixed-point format. The scaling factors of the third fixed-point format can be updated depending on various factors. For example, the update may be configured to be carried out after a given number of iterations, using the method described in the Quantization section above.
[0085] In some implementations, the method further comprises: receiving a backward input to the convolutional layer at the output of the convolutional layer, where the backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer. Then, an intermediate backward output is obtained by backward gradient operations corresponding to the normalization. In other words, the gradient of the loss function with respect to the convolution above is computed based on the backward input. For example, the backward gradient operations of the IBN gradient sub-layer 416 shown in Fig. 4 correspond to the normalization of the IBN sub-layer 316, and can be performed to obtain the intermediate backward output. Subsequently, the intermediate backward output is converted to a fourth fixed-point format, and the scaling factors of the fourth fixed-point format can be updated based on the range of the intermediate backward output. For example, the scaling factors of the fourth fixed-point format may be updated according to the updating method described above with reference to quantization.
[0086] It would be appreciated that, although the method 500 describes one convolutional layer, the training process of the entire neural network may be stacked by the process of method 500 as described above with reference to Figs. 3 and 4.
Another Example Implementation of the Special-Purpose Processing Device
[0087] Fig. 1 illustrates an example implementation of the special-purpose processing device 106. In the example of Fig. 1, the special-purpose processing device 106 includes a memory unit 108 for storing parameters of the neural network, and a processing unit 110 for reading the stored parameters from the memory unit 108 and using the parameters to process the input.
[0088] Fig. 6 illustrates a block diagram of a further example implementation of the special-purpose processing device 106. As described above, the special-purpose processing device 106 may be an FPGA or ASIC, for example.
[0089] In this example, the special-purpose processing device 106 includes a memory module 602 configured to store parameters of the convolutional layer of the neural network in a first fixed-point format, where the parameters in the first fixed-point format have a predefined bit-width. It would be appreciated that the memory module 602 is similar to the memory unit 108 of Fig. 1 in terms of functionality, and both of them may be implemented using the same or different techniques or processes. Generally, the bit-width of the first fixed-point format is smaller than that of floating-point numbers, to reduce the memory space of the memory module 602.
[0090] The special-purpose processing device 106 further includes an interface module 604 configured to receive an input to the convolutional layer. In some implementations, the interface module 604 may be used for processing various inputs and outputs between various layers of the neural network. The special-purpose processing device 106 further includes a data access module 606 configured to read parameters of the convolutional layer from the memory module 602. In some implementations, the data access module 606 may interact with the memory module 602 to process the access to the parameters of the neural network. The special-purpose processing device 106 may further include a computing module 608 configured to compute, based on the input of the convolutional layer and the read parameters, the output of the convolutional layer by a fixed-point operation.
[0091] In some implementations, the interface module 604 is further configured to receive a backward input to the convolutional layer at the output of the convolutional layer, where the backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer. In addition, the computing module 608 is further configured to compute a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer based on the backward input; and update parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters, where the scaling factors of the first fixed-point format can be updated based on the range of the updated parameters.
[0092] In some implementations, updating parameters only includes respective fixed-point operations.
[0093] In some implementations, the computing module 608 is further configured to convert the gradient of the loss function of the neural network with respect to the parameters to a second fixed-point format by a linear quantization method, where the scaling factors of the second fixed-point format can be updated based on the range of the gradient of the loss function with respect to the parameters; and update the parameters based on the gradient in the second fixed-point format.
[0094] In some implementations, the computing module 608 is further configured to normalize a convolution of the input of the convolutional layer and the parameters to obtain a normalized output, where the normalization only includes respective fixed-point operations.
[0095] In some implementations, the computing module 608 is further configured to convert the normalized output to a normalized output in a third fixed-point format, where the scaling factors of the third fixed-point format can be updated based on the range of the normalized output in the third fixed-point format.
[0096] In some implementations, the interface module 604 is further configured to obtain a backward input to the convolutional layer at the output of the convolutional layer, where the backward input is a gradient of the loss function of the neural network with respect to the output of the convolutional layer. Additionally, the computing module 608 may be configured to compute the gradient of the loss function with respect to the convolution based on the backward input; and convert the gradient of the loss function with respect to the convolution to a fourth fixed-point format, where the scaling factor of the fourth fixed-point format can be updated based on the range of the gradient of the loss function with respect to the convolution.
Testing and Performance
[0097] The following section introduces the important factors that affect the final prediction accuracy of the trained neural network model in accordance with implementations of the subject matter described herein. The factors comprise the batch normalization (BN) scheme, the bit-width of the primal parameters, and the bit-width of the gradients. The effects of these factors are evaluated in turn by applying them separately to the binary neural network (BNN). Finally, all these factors are combined to obtain a neural network model.
[0098] In the following tests, the CIFAR-10 data set is used, which is an image classification benchmark with 60K 32x32 RGB tiny images. It consists of 10 object classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each class has 5K training images and 1K test images. To evaluate the model fitting capability and training efficiency, three networks of different sizes are designed by stacking the basic structural modules of the neural network shown in Figs. 3 and 4: Model-S (Small), Model-M (Medium), and Model-L (Large). The overall network structure is illustrated in Figs. 7 and 8.
[0099] Fig. 7 illustrates a block diagram of a forward pass of the convolutional neural network 700 in accordance with an implementation of the subject matter described herein and Fig. 8 illustrates a block diagram of a backward pass of the convolutional neural network 800 in accordance with an implementation of the subject matter described herein.
[00100] In the convolutional neural networks 700 and 800, all convolution kernels are 3x3, and the number of output channels in the first convolutional layer is 32, 64, and 128 for the three models, respectively. Table 1 lists the number of parameters and the number of multiply-accumulate operations (MACs) in the three models. In Figs. 7 and 8, "x2 (4 or 8)" in layer C21 means that the number of output channels in layer C21 is two times (or four or eight times) the number in layers C11 and C12. Additionally, S represents same padding, V represents valid padding, MP indicates max pooling, C indicates a convolutional layer, and FC indicates a fully-connected layer. The specific structure of each layer in Figs. 7 and 8 is not shown, and can be inspected from Figs. 3 and 4. It is noted that the loss function layer is computed in the floating-point domain in both the forward pass and the backward pass.
Table 1 (presented as an image in the original publication)
[00101] In all of the experiments, 50K training images and a mini-batch size of 200 are used, giving 150 epochs and 37,500 iterations in total. Because each epoch is one pass over all samples in the training set and each iteration uses the samples of one mini-batch, each epoch comprises 250 iterations. Furthermore, in the experiments, the fixed-point ADAM optimization method or the standard ADAM optimization method is used with an initial learning rate of 2⁻⁶, and the learning rate is decreased by a factor of 2⁻⁴ every 50 epochs.
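The schedule above can be checked with a few lines of arithmetic (a sketch with illustrative variable names):

```python
train_images, batch_size, epochs = 50_000, 200, 150
iters_per_epoch = train_images // batch_size   # 250 iterations per epoch
total_iters = iters_per_epoch * epochs         # 37,500 iterations in total

def learning_rate(epoch, initial=2.0 ** -6, decay=2.0 ** -4, period=50):
    """Initial rate of 2**-6, multiplied by 2**-4 every 50 epochs."""
    return initial * decay ** (epoch // period)

print(iters_per_epoch, total_iters, learning_rate(0), learning_rate(100))
# 250 37500 0.015625 6.103515625e-05
```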
[00102] Now the effect of different normalization schemes on prediction accuracy is evaluated, including standard floating-point BN and IBN output of different bit-widths. Here the primal parameters and all gradients are kept in the floating-point format and the standard ADAM algorithm is used to optimize the network. Note that the scaling factor updating algorithm described above will be performed on the IBN output every 1,125 iterations (3% of total iterations), and the threshold of the scaling factor updating algorithm is set to be 0.01%.
[00103] The testing shows that the accuracy loss of the neural network remains quite stable as the bit-width of the IBN output is reduced, down to as low as 6 bits. If the bit-width of the IBN output decreases further, the accuracy suffers a sharp drop.
[00104] To evaluate the effect of the bit-width used to store the parameters, experiments are conducted with floating-point gradients. In this case, the standard ADAM algorithm is used to update the parameters, and the updated parameters are stored in a fixed-point format. The testing shows that 8-bit parameters are sufficient for maintaining performance, while a bit-width lower than 8 bits brings significant accuracy loss. Furthermore, the scaling factors are updated to keep the values within a normal range. By contrast, a static scaling factor imposes too strong a regularization on the model parameters, and the training fails to converge when the bit-width is lower than 8 bits.
[00105] Furthermore, the effect of the gradient bit-width is also evaluated. The gradients are more unstable than the parameters, which indicates that the scaling factors of the gradients should be updated more frequently. In some implementations, the update occurs every 375 iterations (1% of total iterations) and the fixed-point ADAM method is employed. In this testing, the primal parameters are kept as floating-point values. It can be seen from the testing that the prediction accuracy decreases very slowly as the bit-width of the gradients is reduced, but suffers a sharp drop when the bit-width of the gradients is lower than 12 bits, which is similar to the effect observed for the IBN output and the parameter bit-width. Therefore, a sharp drop occurs when the bit-width of the IBN output, the parameters, or the gradients falls below the corresponding threshold.
[00106] The test is then performed by combining the three effects, i.e., the neural network is implemented to involve substantially only fixed-point computations. In this way, the results in Table 2 are obtained.
Table 2 (presented as an image in the original publication)
[00107] Because the parameters are stored in an on-chip memory (e.g., the memory unit 108) of the special-purpose processing device 106, the relative storage is characterized by the product of the number of parameters and the bit-width of the primal weights. It can be seen from Table 2 that an accuracy comparable to that of larger bit-widths can be obtained when the bit-width of the primal weights is 12 and the bit-width of the gradients is also 12. As the weight bit-width decreases, the storage is substantially reduced. Therefore, the training solution for the neural network according to implementations of the subject matter described herein can lower the storage while maintaining computation accuracy.
[00108] As shown in Table 2, the method can achieve results comparable with state-of-the-art works (not shown) when the bit-width of each of the primal weights and the gradients is 12. However, compared with the prior art, the method dramatically reduces the storage and significantly improves system performance.
Example Implementations
[00109] Several example implementations of the subject matter described herein are listed below.
[00110] In accordance with implementations of the subject matter described herein, there is provided a special-purpose processing device. The special-purpose processing device comprises a memory unit configured to store parameters of a layer of a neural network in a first fixed-point format, the parameters in the first fixed-point format having a predefined bit-width; a processing unit coupled to the memory unit and configured to perform acts including: receiving an input to the layer; reading the parameters of the layer from the memory unit; and computing, based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
[00111] In some implementations, the layer of the neural network includes a convolutional layer.
[00112] In some implementations, the acts further include: receiving a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
[00113] In some implementations, updating the parameters only includes a respective fixed-point operation.
[00114] In some implementations, updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer comprises: converting the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
[00115] In some implementations, computing the output of the layer comprises: normalizing a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
[00116] In some implementations, computing the output of the convolutional layer further comprises: converting the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
[00117] In some implementations, the acts further include: obtaining a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function with respect to the convolution; and converting the gradient of the loss function with respect to a convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
[00118] In some implementations, the special-purpose processing device is a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a processor having a customized processing unit, or a graphics processing unit (GPU).
[00119] In accordance with implementations of the subject matter described herein, there is provided a method executed by a special-purpose processing device including a memory unit and a processing unit. The method comprises receiving an input to a layer of a neural network; reading parameters of the layer from the memory unit of the special-purpose processing device, the parameters being stored in the memory unit in a first fixed-point format and having a predefined bit-width; and computing, by the processing unit and based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
[00120] In some implementations, the layer of the neural network includes a convolutional layer.
[00121] In some implementations, the method further comprises: receiving a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
[00122] In some implementations, updating the parameters only includes a respective fixed-point operation.
[00123] In some implementations, updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer comprises: converting the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and updating the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
[00124] In some implementations, computing the output of the layer comprises: normalizing a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
[00125] In some implementations, computing the output of the convolutional layer further comprises: converting the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
[00126] In some implementations, the method further comprises: obtaining a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer; computing, based on the backward input, a gradient of the loss function with respect to the convolution; and converting the gradient of the loss function with respect to a convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
[00127] In some implementations, the special-purpose processing device is a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a processor having customized processing units or a graphics processing unit (GPU).
[00128] In accordance with implementations of the subject matter described herein, there is provided a special-purpose processing device. The special-purpose processing device comprises: a memory module configured to store parameters of a layer of a neural network in a first fixed-point format, the parameters in the first fixed-point format having a predefined bit-width; an interface module configured to receive an input to the layer; a data access module configured to read the parameters of the layer from the memory module; and a computing module configured to compute, based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
[00129] In some implementations, the layer of the neural network includes a convolutional layer.
[00130] In some implementations, the interface module is further configured to receive a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer; the computing module is further configured to: compute, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, and update the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
[00131] In some implementations, updating the parameters only includes a respective fixed-point operation.
[00132] In some implementations, the computing module is further configured to: convert the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and update the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
[00133] In some implementations, the computing module is further configured to: normalize a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
[00134] In some implementations, the computing module is further configured to: convert the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
[00135] In some implementations, the interface module is further configured to: obtain a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer; compute, based on the backward input, a gradient of the loss function with respect to the convolution; and convert the gradient of the loss function with respect to a convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
[00136] In some implementations, the special-purpose processing device is a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a processor having customized processing units or a graphics processing unit (GPU).
[00137] The functions described above can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, example types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
[00138] Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the subject matter described herein, but rather as descriptions of features that may be specific to particular implementations. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination.
[00139] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter specified in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A special-purpose processing device, comprising:
a memory unit configured to store parameters of a layer of a neural network in a first fixed-point format, the parameters in the first fixed-point format having a predefined bit- width;
a processing unit coupled to the memory unit and configured to perform acts including:
receiving an input to the layer;
reading the parameters of the layer from the memory unit; and
computing, based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
2. The special-purpose processing device of claim 1, wherein the layer includes a convolutional layer, and wherein the acts further include:
receiving a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer;
computing, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer; and
updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
3. The special-purpose processing device of claim 2, wherein updating the parameters only includes a respective fixed-point operation.
4. The special-purpose processing device of claim 2, wherein updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer comprises:
converting the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and
updating the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
5. The special-purpose processing device of claim 1, wherein the layer includes a convolutional layer and computing the output of the layer comprises:
normalizing a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
6. The special-purpose processing device of claim 5, wherein computing the output of the convolutional layer further comprises:
converting the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
7. The special-purpose processing device of claim 5, wherein the acts further include:
obtaining a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer;
computing, based on the backward input, a gradient of the loss function with respect to the convolution; and
converting the gradient of the loss function with respect to a convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
8. The special-purpose processing device of claim 1, wherein the special-purpose processing device is a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a processor having a customized processing unit, or a graphics processing unit (GPU).
9. A method executed by a special-purpose processing device including a memory unit and a processing unit, the method comprising:
receiving an input to a layer of a neural network;
reading parameters of the layer from the memory unit of the special-purpose processing device, the parameters being stored in the memory unit in a first fixed-point format and having a predefined bit-width; and
computing, by the processing unit and based on the input of the layer and the read parameters, an output of the layer through a fixed-point operation.
10. The method of claim 9, wherein the layer includes a convolutional layer, and the method further comprises: receiving a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of a loss function of the neural network with respect to the output of the convolutional layer;
computing, based on the backward input, a gradient of the loss function of the neural network with respect to the parameters of the convolutional layer; and
updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer, a scaling factor of the first fixed-point format being updatable based on a range of the updated parameters.
11. The method of claim 10, wherein updating the parameters only includes a respective fixed-point operation.
12. The method of claim 10, wherein updating the parameters in the first fixed-point format based on the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer comprises:
converting the gradient of the loss function of the neural network with respect to the parameters of the convolutional layer into a second fixed-point format by a linear quantization method, the scaling factor of the second fixed-point format being updatable based on a range of the gradient of the loss function with respect to the parameters of the convolutional layer; and
updating the parameters in the first fixed-point format based on the gradient in the second fixed-point format.
13. The method of claim 9, wherein the layer includes a convolutional layer and computing the output of the layer comprises:
normalizing a convolution of the input of the convolutional layer and the parameters in the first fixed-point format to obtain a normalized output, the normalizing only including a respective fixed-point operation.
14. The method of claim 13, wherein computing the output of the convolutional layer further comprises:
converting the normalized output into the normalized output in a third fixed-point format, a scaling factor of the third fixed-point format being updatable based on a range of the normalized output in the third fixed-point format.
15. The method of claim 13, further comprising:
obtaining a backward input to the convolutional layer at an output of the convolutional layer, the backward input being a gradient of the loss function of the neural network with respect to the output of the convolutional layer;
computing, based on the backward input, a gradient of the loss function with respect to the convolution; and
converting the gradient of the loss function with respect to a convolution into a fourth fixed-point format, a scaling factor of the fourth fixed-point format being updatable based on a range of the gradient of the loss function with respect to the convolution.
PCT/US2018/014303 2017-01-25 2018-01-19 Neural network based on fixed-point operations WO2018140294A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710061333.9A CN108345939B (en) 2017-01-25 2017-01-25 Neural network based on fixed-point operation
CN201710061333.9 2017-01-25

Publications (1)

Publication Number Publication Date
WO2018140294A1 true WO2018140294A1 (en) 2018-08-02

Family

ID=61569403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/014303 WO2018140294A1 (en) 2017-01-25 2018-01-19 Neural network based on fixed-point operations

Country Status (2)

Country Link
CN (1) CN108345939B (en)
WO (1) WO2018140294A1 (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165736A (en) * 2018-08-08 2019-01-08 北京字节跳动网络技术有限公司 Information processing method and device applied to convolutional neural networks
CN109284761A (en) * 2018-09-04 2019-01-29 苏州科达科技股份有限公司 A kind of image characteristic extracting method, device, equipment and readable storage medium storing program for executing
CN109800877A (en) * 2019-02-20 2019-05-24 腾讯科技(深圳)有限公司 Parameter regulation means, device and the equipment of neural network
CN110110852A (en) * 2019-05-15 2019-08-09 电科瑞达(成都)科技有限公司 A kind of method that deep learning network is transplanted to FPAG platform
US20190279072A1 (en) * 2018-03-09 2019-09-12 Canon Kabushiki Kaisha Method and apparatus for optimizing and applying multilayer neural network model, and storage medium
EP3617954A1 (en) * 2018-08-22 2020-03-04 INTEL Corporation Iterative normalization for machine learning applications
CN110929838A (en) * 2018-09-19 2020-03-27 杭州海康威视数字技术股份有限公司 Bit width localization method, device, terminal and storage medium in neural network
WO2020063715A1 (en) * 2018-09-26 2020-04-02 Huawei Technologies Co., Ltd. Method and system for training binary quantized weight and activation function for deep neural networks
CN110969217A (en) * 2018-09-28 2020-04-07 杭州海康威视数字技术股份有限公司 Method and device for processing image based on convolutional neural network
EP3640858A1 (en) * 2018-10-17 2020-04-22 Samsung Electronics Co., Ltd. Method and apparatus for quantizing parameters of neural network
US20200126185A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Artificial intelligence (ai) encoding device and operating method thereof and ai decoding device and operating method thereof
WO2020080827A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Ai encoding apparatus and operation method of the same, and ai decoding apparatus and operation method of the same
CN111144560A (en) * 2018-11-05 2020-05-12 杭州海康威视数字技术股份有限公司 Deep neural network operation method and device
CN111144564A (en) * 2019-12-25 2020-05-12 上海寒武纪信息科技有限公司 Device for training neural network and integrated circuit board card thereof
CN111353517A (en) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 License plate recognition method and device and electronic equipment
CN111368978A (en) * 2020-03-02 2020-07-03 开放智能机器(上海)有限公司 Precision improving method for offline quantization tool
EP3686808A1 (en) * 2019-01-23 2020-07-29 StradVision, Inc. Method and device for transforming cnn layers to optimize cnn parameter quantization to be used for mobile devices or compact networks with high precision via hardware optimization
CN111723901A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Training method and device of neural network model
CN111914986A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Method for determining binary convolution acceleration index and related equipment
CN112561028A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Method for training neural network model, and method and device for data processing
CN112686384A (en) * 2020-12-31 2021-04-20 南京大学 Bit-width-adaptive neural network quantization method and device
WO2021075735A1 (en) * 2019-10-15 2021-04-22 Lg Electronics Inc. Training a neural network using periodic sampling over model weights
CN112930543A (en) * 2018-10-10 2021-06-08 利普麦德股份有限公司 Neural network processing device, neural network processing method, and neural network processing program
EP3811619A4 (en) * 2018-10-19 2021-08-18 Samsung Electronics Co., Ltd. Ai encoding apparatus and operation method of the same, and ai decoding apparatus and operation method of the same
CN113468935A (en) * 2020-05-08 2021-10-01 上海齐感电子信息科技有限公司 Face recognition method
US11170473B2 (en) 2018-10-19 2021-11-09 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US11170534B2 (en) 2018-10-19 2021-11-09 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
JP2021179966A (en) * 2019-06-12 2021-11-18 シャンハイ カンブリコン インフォメーション テクノロジー カンパニー リミテッドShanghai Cambricon Information Technology Co., Ltd. Quantization parameter determination method for neural network, and related product
CN113673664A (en) * 2020-05-14 2021-11-19 杭州海康威视数字技术股份有限公司 Data overflow detection method, device, equipment and storage medium
US11190782B2 (en) 2018-10-19 2021-11-30 Samsung Electronics Co., Ltd. Methods and apparatuses for performing encoding and decoding on image
CN113780523A (en) * 2021-08-27 2021-12-10 深圳云天励飞技术股份有限公司 Image processing method, image processing device, terminal equipment and storage medium
WO2022009433A1 (en) * 2020-07-10 2022-01-13 富士通株式会社 Information processing device, information processing method, and information processing program
US11288770B2 (en) 2018-10-19 2022-03-29 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
CN114492779A (en) * 2022-02-16 2022-05-13 安谋科技(中国)有限公司 Method for operating neural network model, readable medium and electronic device
US11616988B2 (en) 2018-10-19 2023-03-28 Samsung Electronics Co., Ltd. Method and device for evaluating subjective quality of video
WO2023115814A1 (en) * 2021-12-22 2023-06-29 苏州浪潮智能科技有限公司 Fpga hardware architecture, data processing method therefor and storage medium
US11720998B2 (en) 2019-11-08 2023-08-08 Samsung Electronics Co., Ltd. Artificial intelligence (AI) encoding apparatus and operating method thereof and AI decoding apparatus and operating method thereof
CN110378470B (en) * 2019-07-19 2023-08-18 Oppo广东移动通信有限公司 Optimization method and device for neural network model and computer storage medium
US11995532B2 (en) * 2018-12-05 2024-05-28 Arm Limited Systems and devices for configuring neural network circuitry
US12112257B2 (en) 2019-08-27 2024-10-08 Anhui Cambricon Information Technology Co., Ltd. Data processing method, device, computer equipment and storage medium

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796244B (en) * 2018-08-01 2022-11-08 上海天数智芯半导体有限公司 Core computing unit processor for artificial intelligence device and accelerated processing method
KR20200026455A (en) * 2018-09-03 2020-03-11 삼성전자주식회사 Artificial neural network system and method of controlling fixed point in artificial neural network
US10331983B1 (en) * 2018-09-11 2019-06-25 Gyrfalcon Technology Inc. Artificial intelligence inference computing device
US20200117981A1 (en) * 2018-10-11 2020-04-16 International Business Machines Corporation Data representation for dynamic precision in neural network cores
US10387772B1 (en) * 2018-10-22 2019-08-20 Gyrfalcon Technology Inc. Ensemble learning based image classification systems
CN111126558B (en) * 2018-10-31 2024-04-02 嘉楠明芯(北京)科技有限公司 Convolutional neural network calculation acceleration method and device, equipment and medium
CN111191783B (en) * 2018-11-15 2024-04-05 嘉楠明芯(北京)科技有限公司 Self-adaptive quantization method and device, equipment and medium
FR3089329A1 (en) * 2018-11-29 2020-06-05 Stmicroelectronics (Rousset) Sas Method for analyzing a set of parameters of a neural network in order to obtain a technical improvement, for example a gain in memory.
CN109800859B (en) * 2018-12-25 2021-01-12 深圳云天励飞技术有限公司 Neural network batch normalization optimization method and device
CN109740733B (en) * 2018-12-27 2021-07-06 深圳云天励飞技术有限公司 Deep learning network model optimization method and device and related equipment
CN109697083B (en) * 2018-12-27 2021-07-06 深圳云天励飞技术有限公司 Fixed-point acceleration method and device for data, electronic equipment and storage medium
CN109670582B (en) * 2018-12-28 2021-05-07 四川那智科技有限公司 Design method of full-fixed-point neural network
CN109508784B (en) * 2018-12-28 2021-07-27 四川那智科技有限公司 Design method of neural network activation function
CN110222821B (en) * 2019-05-30 2022-03-25 浙江大学 Weight distribution-based convolutional neural network low bit width quantization method
CN112085187A (en) * 2019-06-12 2020-12-15 安徽寒武纪信息科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112308216B (en) * 2019-07-26 2024-06-18 杭州海康威视数字技术股份有限公司 Data block processing method, device and storage medium
JP7294017B2 (en) * 2019-09-13 2023-06-20 富士通株式会社 Information processing device, information processing method and information processing program
CN110705696B (en) * 2019-10-11 2022-06-28 阿波罗智能技术(北京)有限公司 Quantization and fixed-point fusion method and device for neural network
CN111027691B (en) * 2019-12-25 2023-01-17 上海寒武纪信息科技有限公司 Device, equipment and board card for neural network operation and training
JP2021111081A (en) * 2020-01-09 2021-08-02 富士通株式会社 Information processing unit, operation program for neural network and operation method for neural network
CN113255877A (en) * 2020-02-12 2021-08-13 阿里巴巴集团控股有限公司 Quantitative processing method, device and equipment of neural network model and storage medium
US11610128B2 (en) * 2020-03-31 2023-03-21 Amazon Technologies, Inc. Neural network training under memory restraint
CN113554159A (en) * 2020-04-23 2021-10-26 意法半导体(鲁塞)公司 Method and apparatus for implementing artificial neural networks in integrated circuits
CN111831354B (en) * 2020-07-09 2023-05-16 北京灵汐科技有限公司 Data precision configuration method, device, chip array, equipment and medium
CN111831356B (en) * 2020-07-09 2023-04-07 北京灵汐科技有限公司 Weight precision configuration method, device, equipment and storage medium
CN111831355B (en) * 2020-07-09 2023-05-16 北京灵汐科技有限公司 Weight precision configuration method, device, equipment and storage medium
WO2022007879A1 (en) 2020-07-09 2022-01-13 北京灵汐科技有限公司 Weight precision configuration method and apparatus, computer device, and storage medium
CN113255901B (en) * 2021-07-06 2021-10-08 上海齐感电子信息科技有限公司 Real-time quantization method and real-time quantization system
CN114444688A (en) * 2022-01-14 2022-05-06 百果园技术(新加坡)有限公司 Neural network quantization method, apparatus, device, storage medium, and program product
WO2024140951A1 (en) * 2022-12-28 2024-07-04 Douyin Vision Co., Ltd. A neural network based image and video compression method with integer operations
CN117992578B (en) * 2024-04-02 2024-07-02 淘宝(中国)软件有限公司 Method for processing data based on large language model, large language model and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102015007943A1 (en) * 2014-07-22 2016-01-28 Intel Corporation Mechanisms for a weight shift in folding neural networks
US20160328647A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Bit width selection for fixed point neural networks

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102200787B (en) * 2011-04-18 2013-04-17 重庆大学 Robot behaviour multi-level integrated learning method and robot behaviour multi-level integrated learning system
US20150269481A1 (en) * 2014-03-24 2015-09-24 Qualcomm Incorporated Differential encoding in neural networks
CN105488563A (en) * 2015-12-16 2016-04-13 重庆大学 Deep learning oriented sparse self-adaptive neural network, algorithm and implementation device
CN105760933A (en) * 2016-02-18 2016-07-13 清华大学 Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102015007943A1 (en) * 2014-07-22 2016-01-28 Intel Corporation Mechanisms for a weight shift in folding neural networks
US20160328647A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Bit width selection for fixed point neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHEN XI ET AL: "FxpNet: Training a deep convolutional neural network in fixed-point representation", 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), IEEE, 14 May 2017 (2017-05-14), pages 2494 - 2501, XP033112353, DOI: 10.1109/IJCNN.2017.7966159 *
JIANTAO QIU ET AL: "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network", PROCEEDINGS OF THE 2016 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, FPGA '16, 1 January 2016 (2016-01-01), New York, New York, USA, pages 26 - 35, XP055423746, ISBN: 978-1-4503-3856-1, DOI: 10.1145/2847263.2847265 *
PHILIPP GYSEL ET AL: "HARDWARE-ORIENTED APPROXIMATION OF CONVOLUTIONAL NEURAL NETWORKS", 11 April 2016 (2016-04-11), XP055398866, Retrieved from the Internet <URL:https://arxiv.org/pdf/1604.03168v1.pdf> [retrieved on 20170816] *
SUYOG GUPTA ET AL: "Deep Learning with Limited Numerical Precision", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 9 February 2015 (2015-02-09), XP080677454 *

Cited By (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11755880B2 (en) * 2018-03-09 2023-09-12 Canon Kabushiki Kaisha Method and apparatus for optimizing and applying multilayer neural network model, and storage medium
US20190279072A1 (en) * 2018-03-09 2019-09-12 Canon Kabushiki Kaisha Method and apparatus for optimizing and applying multilayer neural network model, and storage medium
CN109165736A (en) * 2018-08-08 2019-01-08 北京字节跳动网络技术有限公司 Information processing method and device applied to convolutional neural networks
CN109165736B (en) * 2018-08-08 2023-12-12 北京字节跳动网络技术有限公司 Information processing method and device applied to convolutional neural network
US11636319B2 (en) 2018-08-22 2023-04-25 Intel Corporation Iterative normalization for machine learning applications
EP3617954A1 (en) * 2018-08-22 2020-03-04 INTEL Corporation Iterative normalization for machine learning applications
CN109284761B (en) * 2018-09-04 2020-11-27 苏州科达科技股份有限公司 Image feature extraction method, device and equipment and readable storage medium
CN109284761A (en) * 2018-09-04 2019-01-29 苏州科达科技股份有限公司 A kind of image characteristic extracting method, device, equipment and readable storage medium storing program for executing
CN110929838B (en) * 2018-09-19 2023-09-26 杭州海康威视数字技术股份有限公司 Bit width localization method, device, terminal and storage medium in neural network
CN110929838A (en) * 2018-09-19 2020-03-27 杭州海康威视数字技术股份有限公司 Bit width localization method, device, terminal and storage medium in neural network
WO2020063715A1 (en) * 2018-09-26 2020-04-02 Huawei Technologies Co., Ltd. Method and system for training binary quantized weight and activation function for deep neural networks
CN110969217B (en) * 2018-09-28 2023-11-17 杭州海康威视数字技术股份有限公司 Method and device for image processing based on convolutional neural network
CN110969217A (en) * 2018-09-28 2020-04-07 杭州海康威视数字技术股份有限公司 Method and device for processing image based on convolutional neural network
CN112930543A (en) * 2018-10-10 2021-06-08 利普麦德股份有限公司 Neural network processing device, neural network processing method, and neural network processing program
US12026611B2 (en) 2018-10-17 2024-07-02 Samsung Electronics Co., Ltd. Method and apparatus for quantizing parameters of neural network
JP7117280B2 (en) 2018-10-17 2022-08-12 三星電子株式会社 Method and apparatus for quantizing parameters of neural network
CN111062475A (en) * 2018-10-17 2020-04-24 三星电子株式会社 Method and device for quantifying parameters of a neural network
JP2020064635A (en) * 2018-10-17 2020-04-23 三星電子株式会社Samsung Electronics Co.,Ltd. Method and device for quantizing parameter of neural network
EP3640858A1 (en) * 2018-10-17 2020-04-22 Samsung Electronics Co., Ltd. Method and apparatus for quantizing parameters of neural network
WO2020080827A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Ai encoding apparatus and operation method of the same, and ai decoding apparatus and operation method of the same
US11200702B2 (en) 2018-10-19 2021-12-14 Samsung Electronics Co., Ltd. AI encoding apparatus and operation method of the same, and AI decoding apparatus and operation method of the same
US11748847B2 (en) 2018-10-19 2023-09-05 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US11720997B2 (en) 2018-10-19 2023-08-08 Samsung Electronics Co.. Ltd. Artificial intelligence (AI) encoding device and operating method thereof and AI decoding device and operating method thereof
US11688038B2 (en) 2018-10-19 2023-06-27 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
US11663747B2 (en) 2018-10-19 2023-05-30 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
US11647210B2 (en) 2018-10-19 2023-05-09 Samsung Electronics Co., Ltd. Methods and apparatuses for performing encoding and decoding on image
EP3811619A4 (en) * 2018-10-19 2021-08-18 Samsung Electronics Co., Ltd. Ai encoding apparatus and operation method of the same, and ai decoding apparatus and operation method of the same
US11616988B2 (en) 2018-10-19 2023-03-28 Samsung Electronics Co., Ltd. Method and device for evaluating subjective quality of video
US11170473B2 (en) 2018-10-19 2021-11-09 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US11170472B2 (en) 2018-10-19 2021-11-09 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US11170534B2 (en) 2018-10-19 2021-11-09 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
US20210358083A1 (en) 2018-10-19 2021-11-18 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US20200126185A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Artificial intelligence (ai) encoding device and operating method thereof and ai decoding device and operating method thereof
US11288770B2 (en) 2018-10-19 2022-03-29 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
US11190782B2 (en) 2018-10-19 2021-11-30 Samsung Electronics Co., Ltd. Methods and apparatuses for performing encoding and decoding on image
CN111144560A (en) * 2018-11-05 2020-05-12 杭州海康威视数字技术股份有限公司 Deep neural network operation method and device
CN111144560B (en) * 2018-11-05 2024-02-02 杭州海康威视数字技术股份有限公司 Deep neural network operation method and device
US11995532B2 (en) * 2018-12-05 2024-05-28 Arm Limited Systems and devices for configuring neural network circuitry
CN111353517B (en) * 2018-12-24 2023-09-26 杭州海康威视数字技术股份有限公司 License plate recognition method and device and electronic equipment
CN111353517A (en) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 License plate recognition method and device and electronic equipment
EP3686808A1 (en) * 2019-01-23 2020-07-29 StradVision, Inc. Method and device for transforming cnn layers to optimize cnn parameter quantization to be used for mobile devices or compact networks with high precision via hardware optimization
CN109800877A (en) * 2019-02-20 2019-05-24 腾讯科技(深圳)有限公司 Parameter regulation means, device and the equipment of neural network
CN109800877B (en) * 2019-02-20 2022-12-30 腾讯科技(深圳)有限公司 Parameter adjustment method, device and equipment of neural network
CN111723901B (en) * 2019-03-19 2024-01-12 百度在线网络技术(北京)有限公司 Training method and device for neural network model
CN111723901A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Training method and device of neural network model
CN111914986A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Method for determining binary convolution acceleration index and related equipment
CN110110852A (en) * 2019-05-15 2019-08-09 电科瑞达(成都)科技有限公司 A kind of method that deep learning network is transplanted to FPAG platform
JP2021179966A (en) 2019-06-12 2021-11-18 Shanghai Cambricon Information Technology Co., Ltd. Quantization parameter determination method for neural network, and related product
US12093148B2 (en) 2019-06-12 2024-09-17 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
JP7167405B2 (en) 2019-06-12 2022-11-09 Cambricon (Xi'an) Integrated Circuit Co., Ltd. Determination of quantization parameters in neural networks and related products
CN110378470B (en) * 2019-07-19 2023-08-18 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Optimization method and device for neural network model and computer storage medium
US12112257B2 (en) 2019-08-27 2024-10-08 Anhui Cambricon Information Technology Co., Ltd. Data processing method, device, computer equipment and storage medium
CN112561028A (en) * 2019-09-25 2021-03-26 Huawei Technologies Co., Ltd. Method for training neural network model, and method and device for data processing
WO2021075735A1 (en) * 2019-10-15 2021-04-22 Lg Electronics Inc. Training a neural network using periodic sampling over model weights
US11922316B2 (en) 2019-10-15 2024-03-05 Lg Electronics Inc. Training a neural network using periodic sampling over model weights
US11720998B2 (en) 2019-11-08 2023-08-08 Samsung Electronics Co., Ltd. Artificial intelligence (AI) encoding apparatus and operating method thereof and AI decoding apparatus and operating method thereof
CN111144564A (en) * 2019-12-25 2020-05-12 Shanghai Cambricon Information Technology Co., Ltd. Device for training neural network and integrated circuit board card thereof
CN111368978B (en) * 2020-03-02 2023-03-24 Open Intelligent Machines (Shanghai) Co., Ltd. Precision improving method for offline quantization tool
CN111368978A (en) * 2020-03-02 2020-07-03 Open Intelligent Machines (Shanghai) Co., Ltd. Precision improving method for offline quantization tool
CN113468935A (en) * 2020-05-08 2021-10-01 Shanghai Qigan Electronic Information Technology Co., Ltd. Face recognition method
CN113468935B (en) * 2020-05-08 2024-04-02 Shanghai Qigan Electronic Information Technology Co., Ltd. Face recognition method
CN113673664B (en) * 2020-05-14 2023-09-12 Hangzhou Hikvision Digital Technology Co., Ltd. Data overflow detection method, device, equipment and storage medium
CN113673664A (en) * 2020-05-14 2021-11-19 Hangzhou Hikvision Digital Technology Co., Ltd. Data overflow detection method, device, equipment and storage medium
WO2022009449A1 (en) * 2020-07-10 2022-01-13 Fujitsu Limited Information processing device, information processing method, and information processing program
WO2022009433A1 (en) * 2020-07-10 2022-01-13 Fujitsu Limited Information processing device, information processing method, and information processing program
CN112686384A (en) * 2020-12-31 2021-04-20 Nanjing University Bit-width-adaptive neural network quantization method and device
CN113780523A (en) * 2021-08-27 2021-12-10 Shenzhen Intellifusion Technologies Co., Ltd. Image processing method, image processing device, terminal equipment and storage medium
CN113780523B (en) * 2021-08-27 2024-03-29 Shenzhen Intellifusion Technologies Co., Ltd. Image processing method, device, terminal equipment and storage medium
WO2023115814A1 (en) * 2021-12-22 2023-06-29 Suzhou Inspur Intelligent Technology Co., Ltd. FPGA hardware architecture, data processing method therefor and storage medium
CN114492779A (en) * 2022-02-16 2022-05-13 Arm Technology (China) Co., Ltd. Method for operating neural network model, readable medium and electronic device

Also Published As

Publication number Publication date
CN108345939B (en) 2022-05-24
CN108345939A (en) 2018-07-31

Similar Documents

Publication Publication Date Title
WO2018140294A1 (en) Neural network based on fixed-point operations
US11568258B2 (en) Operation method
Zhou et al. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients
CN112074806B (en) System, method and computer storage medium for block floating point computing
US10929744B2 (en) Fixed-point training method for deep neural networks based on dynamic fixed-point conversion scheme
US10678508B2 (en) Accelerated quantized multiply-and-add operations
US10096134B2 (en) Data compaction and memory bandwidth reduction for sparse neural networks
CN112288086B (en) Neural network training method and device and computer equipment
CN113424202A (en) Adjusting activation compression for neural network training
EP3816873A1 (en) Neural network circuit device, neural network processing method, and neural network execution program
CN107944545B (en) Computing method and computing device applied to neural network
EP3915056A1 (en) Neural network activation compression with non-uniform mantissas
CN111026544B (en) Node classification method and device for graph network model and terminal equipment
CN112673383A (en) Data representation of dynamic precision in neural network cores
CN113826122A (en) Training of artificial neural networks
US11704556B2 (en) Optimization methods for quantization of neural network models
CN113994347A (en) System and method for asymmetric scale factor support for negative and positive values
Choi et al. Retrain-less weight quantization for multiplier-less convolutional neural networks
CN114450891A (en) Design and training of binary neurons and binary neural networks using error correction codes
US20240104342A1 (en) Methods, systems, and media for low-bit neural networks using bit shift operations
CN111126557A (en) Neural network quantization method, neural network quantization application device and computing equipment
Scanlan Low power & mobile hardware accelerators for deep convolutional neural networks
US20220405576A1 (en) Multi-layer neural network system and method
WO2020177863A1 (en) Training of algorithms
CN117063183A (en) Efficient compression of activation functions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18709181
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18709181
    Country of ref document: EP
    Kind code of ref document: A1