
WO2020133492A1 - Neural network compression method and apparatus - Google Patents

Neural network compression method and apparatus

Info

Publication number: WO2020133492A1
Authority: WO (WIPO PCT)
Prior art keywords: zero, weights, training, weight, group
Application number: PCT/CN2018/125812
Other languages: English (en), Chinese (zh)
Inventors: 朱佳峰, 刘刚毅, 卢惠莉, 高伟, 芮祥麟, 杨鋆源, 夏军
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date: the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.
Application filed by 华为技术有限公司
Priority to PCT/CN2018/125812 (published as WO2020133492A1)
Priority to CN201880099983.5A (published as CN113168554B)
Publication of WO2020133492A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • This application relates to the field of neural networks, and in particular to a neural network compression method and device.
  • Deep learning technology is developing rapidly in the industry, and many industries are applying it in their respective fields. However, the deep learning model, that is, the neural network model, is usually over-parameterized and clearly redundant, which leads to wasted computation and storage.
  • To address this, the industry has proposed a variety of compression methods, such as various model sparsification methods. Through pruning, quantization, and similar techniques, these methods set the weights with weak expressive power in the model's weight matrix to zero, so as to simplify the model's computation and storage.
  • However, the value of each weight in a deep learning model is learned automatically from the training set, and existing methods sparsify randomly during training; the weights cannot be sparsified in a targeted way. Subsequent processing devices can therefore only rely on a randomly sparsified deep learning model for data processing, which cannot adapt well to the processing device's capability and cannot achieve a better processing effect.
  • The embodiments of the present application provide a neural network compression method and device to solve the problem that the prior art cannot adapt well to a processing device's capability and thus cannot achieve a better processing effect.
  • In a first aspect, the present application provides a neural network compression method: the sparse unit length is determined according to the processing capability information of the processing device; then, when performing the current training on the neural network model, the j-th group of weights obtained after the previous training is adjusted according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training. The sparse unit length is the data length of one operation when the processing device performs a matrix operation; the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped according to the sparse unit length.
  • In this way, when performing neural network compression, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by that length can be processed according to the device's capability. The neural network model can thus be adapted to the capabilities of different processing devices, so that subsequent processing devices achieve better processing results.
  • In an optional implementation, the sparse unit length is determined according to the processing capability information of the processing device as follows: determine the length of a register in the processing device, or the maximum data length that the instruction set of the processing device processes at a time, and use that register length or maximum data length as the sparse unit length. In this way, the sparse unit length can be determined accurately to match the processing capability of the processing device.
  • In another optional implementation, the neural network compression device may instead determine the bit width of a computing unit in the processing device (which may be, but is not limited to, a GPU, an NPU, etc.) and use that bit width as the sparse unit length. This also allows the sparse unit length to be determined accurately to match the processing capability of the processing device.
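  • As a concrete illustration of this step, the sketch below derives a sparse unit length from processing-capability information. It is a minimal sketch under assumptions: the capability fields (register_bits, simd_bits, alu_bits), the 32-bit weight width, and all names are illustrative and not part of the patent text.

```python
# Minimal sketch: pick the sparse unit length from device capability information.
# All field names and the float32 weight width are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ProcessingCapability:
    register_bits: int  # register length, in bits
    simd_bits: int      # maximum data length one instruction processes at a time
    alu_bits: int       # bit width of the computing unit (e.g. a GPU/NPU ALU)

def sparse_unit_length(cap: ProcessingCapability, weight_bits: int = 32) -> int:
    """Number of weights the device can handle in one matrix-operation step."""
    # Use the maximum data length supported by the register/instruction set/ALU.
    max_bits = max(cap.register_bits, cap.simd_bits, cap.alu_bits)
    return max_bits // weight_bits

# Example: 128-bit registers and float32 weights give groups of 4 weights.
print(sparse_unit_length(ProcessingCapability(128, 128, 128)))  # -> 4
```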
  • In an optional implementation, all weights of the initial neural network model are trimmed before the first training of the neural network. Trimming the neural network first can save some processing in the subsequent training process and improve calculation speed.
  • In an optional implementation, adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training may specifically include five situations. For example, in one of them, when the j-th group of weights referenced in the previous training are not all zero, and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is not less than the set proportion threshold, the j-th group of weights obtained after the previous training is kept unchanged. The five situations are described in detail below.
  • In this way, the weights obtained after the previous training can be adjusted according to the actual situation, so that the zero values in the resulting neural network model are distributed more regularly, with as many zeros as possible lying contiguously within a group of weights. When the neural network model is subsequently used for data processing, this reduces the time spent accessing data and improves calculation speed.
  • In an optional implementation, the zero-setting weight threshold may be determined based on the initial weight threshold; for example, it may be a set multiple of the initial weight threshold, where the set multiple is greater than 1. In this way, subsequent judgments can better match the value range of the current weights.
  • In an optional implementation, whether the j-th group of weights referenced in the previous training are all zero may be determined as follows: check whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero. When the flag is zero, the j-th group of weights referenced in the previous training are all zero; when the flag is non-zero, they are not all zero.
  • In an optional implementation, after the j-th group of weights obtained after the previous training is set entirely to zero, or after all of its non-zero values are set to zero, the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure is also updated to zero; or, after the j-th group of weights obtained after the previous training is kept unchanged, that flag is updated to a non-zero value.
  • In this way, the flags in the zero-setting flag data structure can be updated in real time, so that during weight adjustment it can be judged accurately whether the j-th group of weights referenced in the previous training are all zero.
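  • As an illustration, the sketch below shows one possible form of the zero-setting flag data structure: one flag per group of sparse-unit-length consecutive weights, with 0 meaning the group is all zero, as in FIG. 4. The array layout and helper names are illustrative assumptions.

```python
# Minimal sketch: one zero-setting flag per group of `unit` consecutive weights
# (flag 0 = group is all zero, flag 1 = group contains a non-zero weight).
import numpy as np

def build_zero_flags(weights: np.ndarray, unit: int) -> np.ndarray:
    flat = weights.ravel()
    groups = np.array_split(flat, range(unit, flat.size, unit))
    return np.array([1 if g.any() else 0 for g in groups], dtype=np.uint8)

w = np.array([0, 0, 0, 0, 0.3, 0, 0, 0.1], dtype=np.float32)
print(build_zero_flags(w, unit=4))  # -> [0 1]: group 0 all zero, group 1 not
```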
  • In a second aspect, the present application provides a data processing method: obtain the weights of the target neural network model, and perform the following processing based on those weights. In the p-th pass, determine whether the q-th group of weights are all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. Here, the target neural network model is the final neural network model obtained by training the neural network model after its weights have been grouped according to the sparse unit length; the sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of weight groups obtained after the weights of the target neural network model are grouped according to the sparse unit length.
  • Because the final neural network model is obtained by training after the weights of the neural network model have been grouped, subsequently applying the final neural network model for data processing can exploit the characteristics of matrix operations to greatly reduce the amount of data access and calculation, thereby increasing operation speed.
  • In an optional implementation, a specific method for judging whether the q-th group of weights are all zero may be: obtain the zero-setting flag data structure corresponding to the weights of the target neural network model, and judge whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero. Specifically, when the flag corresponding to the q-th group of weights is zero, the q-th group of weights are all zero; when it is non-zero, the q-th group of weights are not all zero.
  • In an optional implementation, when the q-th group of weights are all zero, the data processing device generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed, as follows: when the matrix operation type is matrix multiplication, the data processing device directly obtains zero as the first operation result; when the matrix operation type is matrix addition, the data processing device takes the matrix data to be processed as the first operation result. This reduces the amount of data access and calculation, which increases operation speed.
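  • The sketch below illustrates this all-zero shortcut for the two operation types named above, using elementwise stand-ins for matrix multiplication and addition; the function and parameter names are illustrative assumptions.

```python
# Minimal sketch of the all-zero shortcut for one weight group.
import numpy as np

def group_result(weights: np.ndarray, data: np.ndarray, op: str) -> np.ndarray:
    if not weights.any():               # the q-th group of weights is all zero
        if op == "mul":
            return np.zeros_like(data)  # first result: zero, nothing is loaded
        if op == "add":
            return data.copy()          # first result: the data to be processed
    # not all zero: compute the second result normally
    return weights * data if op == "mul" else weights + data
```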
  • In a third aspect, the present application further provides a neural network compression device, which has the function of implementing the method of the first aspect described above.
  • The function may be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • In a possible design, the structure of the neural network compression device may include a determination unit, a weight adjustment unit, and a training unit; these units may perform the corresponding functions in the method examples of the first aspect. For details, see the detailed description in those method examples, which is not repeated here.
  • the structure of the neural network compression device may include a processor and a memory, and the processor is configured to perform the method mentioned in the first aspect above.
  • the memory is coupled to the processor, and stores necessary program instructions and data of the neural network compression device.
  • In a fourth aspect, the present application further provides a data processing device having the function of implementing the method of the second aspect.
  • The function may be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • In a possible design, the structure of the data processing device may include an acquisition unit and a processing unit; these units may perform the corresponding functions in the method examples of the second aspect. For details, see the detailed description in those method examples, which is not repeated here.
  • the structure of the data processing apparatus may include a processor and a memory, and the processor is configured to perform the method mentioned in the second aspect above.
  • the memory is coupled to the processor, and stores necessary program instructions and data of the data processing device.
  • In a fifth aspect, the present application also provides a computer storage medium that stores computer-executable instructions which, when invoked by a computer, cause the computer to execute any of the methods mentioned in the first aspect or the second aspect.
  • In a sixth aspect, the present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the methods mentioned in the first aspect or the second aspect.
  • In a seventh aspect, the present application further provides a chip, coupled to a memory, for reading and executing program instructions stored in the memory to implement any of the methods mentioned in the first aspect or the second aspect.
  • FIG. 1 is a schematic diagram of a neural network provided by an embodiment of this application.
  • FIG. 2 is a structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a neural network compression method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of a zero-setting flag data structure and a weight matrix provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a weight adjustment provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a data processing method provided by an embodiment of this application.
  • FIG. 7 is an example diagram of a data processing process provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a neural network compression device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present application.
  • FIG. 10 is a structural diagram of a neural network compression device provided by an embodiment of the present application.
  • FIG. 11 is a structural diagram of a data processing apparatus according to an embodiment of the present application.
  • The embodiments of the present application provide a neural network compression method and device to solve the problem that the prior art cannot adapt well to a processing device's capability and cannot achieve a better processing effect.
  • The method and the device described in this application are based on the same inventive concept. Because the principles by which the method and the device solve the problem are similar, the implementations of the device and the method may refer to each other, and repeated descriptions are omitted.
  • a neural network consists of a large number of nodes (or neurons) connected to each other.
  • The neural network consists of an input layer, a hidden layer, and an output layer, as shown in FIG. 1.
  • The input layer corresponds to the input data of the neural network, and the output layer corresponds to the output data of the neural network.
  • The hidden layer is composed of many connected nodes between the input layer and the output layer, and is used to perform arithmetic processing on the input data.
  • The hidden layer may consist of one or more layers. The number of hidden layers and the number of nodes in the neural network are directly related to the complexity of the problem the neural network actually solves, as well as to the numbers of nodes in the input layer and the output layer.
  • In the embodiments of the present application, the device that performs neural network compression may be referred to as a neural network compression device. The neural network compression device may be, but is not limited to, a terminal device such as a personal computer (PC), a server, a cloud service platform, etc.
  • The platform on which a neural network model is deployed may be referred to as a data processing device. The data processing device may be, but is not limited to, a terminal device such as a mobile phone, a tablet computer, or a PC, or a server, etc.
  • FIG. 2 shows a possible terminal device to which the neural network compression method or the data processing method provided by the embodiments of the present application is applicable. Referring to FIG. 2, the terminal device includes: a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, a power supply 260, and other components. Those skilled in the art can understand that the terminal device provided in the embodiments of the present application may include more or fewer components than shown, may combine certain components, or may use different component arrangements.
  • the communication module 230 may be connected to other devices through a wireless connection or a physical connection to implement data transmission and reception of terminal devices.
  • Specifically, the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, etc.; this is not limited in the embodiments of the present application.
  • the memory 220 can be used to store program instructions and data.
  • the processor 210 executes program instructions stored in the memory 220 to execute various functional applications and data processing of the terminal device.
  • Among these program instructions are instructions that enable the processor 210 to execute the neural network compression method or the data processing method provided by the following embodiments of the present application.
  • the memory 220 may mainly include a program storage area and a data storage area.
  • the storage program area can store the operating system, various application programs, and program instructions;
  • the storage data area can store various data such as neural networks.
  • Optionally, the memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input unit 240 may be used to receive information such as data or operation instructions input by the user.
  • the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
  • the display unit 250 can realize human-computer interaction, and is used to display information input by the user and information provided to the user through the user interface.
  • the display unit 250 may include a display panel 251.
  • Optionally, the display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • Specifically, the touch panel may cover the display panel 251; when the touch panel detects a touch event on or near it, the event is transmitted to the processor 210 to determine the type of the touch event and perform the corresponding operation.
  • the processor 210 is a control center of a computer device, and uses various interfaces and lines to connect the above components.
  • the processor 210 may execute the program instructions stored in the memory 220 and call the data stored in the memory 220 to complete various functions of the computer device and implement the neural network compression provided by the embodiments of the present application Method or data processing method.
  • Optionally, the processor 210 may include one or more processing units. Specifically, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 210.
  • the processing unit may compress the neural network or process the data.
  • the processor 210 may be a central processing unit (CPU), a graphics processor (Graphics Processing Unit, GPU), or a combination of CPU and GPU.
  • The processor 210 may also be a network processing unit (NPU), a tensor processing unit (TPU), or another artificial intelligence (AI) chip that supports neural network processing.
  • the processor 210 may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing device (DSP), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The terminal device also includes a power supply 260 (such as a battery) for powering the various components.
  • the power supply 260 may be logically connected to the processor 210 through a power management system, so as to realize functions such as charging and discharging the terminal device through the power management system.
  • the terminal device may further include components such as a camera, a sensor, and an audio collector, which are not repeated here.
  • the foregoing terminal device is only an example of a device to which the neural network compression method or data processing method provided in the embodiments of the present application is applicable. It should be understood that the neural network compression method or data processing method provided in the embodiments of the present application may also be applied to other devices than the above terminal devices, which is not limited in this application.
  • Referring to FIG. 3, a neural network compression method provided by an embodiment of the present application can be applied to the terminal device shown in FIG. 2 or to other devices (such as a server).
  • The following takes a neural network compression device as the execution subject as an example to illustrate the neural network compression method provided by the present application.
  • the specific flow of the method may include:
  • Step 301: The neural network compression device determines the sparse unit length according to the processing capability information of the processing device, where the sparse unit length is the data length of one operation when the processing device performs a matrix operation.
  • The processing device is the device that processes the data to be processed using the neural network model finally obtained by the neural network compression device. It should be noted that the processing device may be the data processing device involved in this application.
  • In general, the training of the neural network model is directed at one processing device, so the processing capability information of the processing device can be pre-configured in the neural network compression device; after obtaining the capability information for the processing device, the neural network compression device directly performs the subsequent process according to that information.
  • the capability information of the processing device may be indicated by the capability of the processing device to process data.
  • The capability information of the processing device may be understood as the capability information of a processor or computing chip included in the processing device, where the processor or computing chip may be, but is not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a network processing unit (NPU), etc.
  • the processing device may also be a processor or a computing chip directly.
  • Specifically, the capability information of the processing device may be embodied as the data length of one operation when the processing device performs a matrix operation. Based on this, the neural network compression device determines the sparse unit length according to the processing capability information of the processing device. In one specific method, the neural network compression device determines the length of a register in the processing device, or the maximum data length that the instruction set of the processing device processes at a time, and uses that register length or maximum data length as the sparse unit length.
  • In another implementation, the neural network compression device may instead determine the bit width of a computing unit in the processing device (which may be a GPU, an NPU, or the like) and use that bit width as the sparse unit length. In yet another implementation, the neural network compression device may determine the maximum data length that can be supported by one or a combination of the registers, caches, instruction set, and computing units in the processing device, and use that maximum supported data length as the sparse unit length.
  • In this way, the neural network model can be trained specifically for different hardware devices, better matching the processing capabilities of those devices and achieving better results.
  • Step 302: When performing the current training on the neural network model, the neural network compression device adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training; the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped according to the sparse unit length.
  • Each time the neural network compression device performs training, it takes consecutive groups of weights of the sparse unit length and performs the training process on them. It can be understood that the neural network compression device groups the weights according to the sparse unit length.
  • Before training, the weights of the neural network model are obtained first. The neural network compression device may directly obtain the specific weight data, or it may obtain the model file of the neural network model and parse the model file to obtain the weight data.
  • Optionally, before the first training of the neural network, the neural network compression device may trim the weights of the initial neural network model according to the initial weight threshold of the initial neural network model.
  • A specific method for the neural network compression device to trim the weights of the initial neural network model may be: the neural network compression device separately obtains the weights of each layer of the initial neural network model, and then trims the weights of each layer according to that layer's initial weight threshold until the weights of all layers have been trimmed. This process may be called sparsification.
  • The above process can use a variety of commonly used matrix sparsification methods, such as the pruning method mentioned in the paper "Learning both Weights and Connections for Efficient Neural Networks", the quantization method mentioned in the paper "Ternary Weights", or other methods; this is not specifically limited in this application.
  • Specifically, the neural network compression device sets to zero the weights in each layer that are less than that layer's initial weight threshold, and keeps unchanged the weights that are not less than that threshold.
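  • The sketch below illustrates this per-layer trimming (sparsification) step. It is a minimal sketch: the per-layer threshold mapping, the layer names, and the use of absolute values are illustrative assumptions.

```python
# Minimal sketch: zero every weight below its layer's initial weight threshold.
import numpy as np

def trim_layer(weights: np.ndarray, threshold: float) -> np.ndarray:
    trimmed = weights.copy()
    trimmed[np.abs(trimmed) < threshold] = 0.0  # weights below threshold -> 0
    return trimmed                              # others kept unchanged

layers = {"fc1": np.random.randn(4, 4), "fc2": np.random.randn(4, 2)}
thresholds = {"fc1": 0.5, "fc2": 0.3}           # initial weight threshold per layer
initial_model = {name: trim_layer(w, thresholds[name]) for name, w in layers.items()}
```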
  • It should be noted that, before obtaining the weights of each layer of the initial neural network model, the neural network compression device needs to train the neural network to obtain the weights of the neural network and thus the initial neural network model. Training the neural network to obtain its weights may specifically be: through data input and neural network model construction, the structure of the neural network and the weights in the neural network are obtained.
  • the neural network may be trained through commonly used deep learning frameworks, such as TensorFlow, Caffe, MXNet, PyTorch, and so on.
  • Specifically, the neural network compression device adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, which may specifically include the following five cases (the set proportion threshold may be 30%, etc., or another value, which is not limited in this application):
  • Case 1: The j-th group of weights referenced in the previous training are all zero, and the j-th group of weights obtained after the previous training are all less than the zero-setting weight threshold; the neural network compression device sets the whole j-th group of weights obtained after the previous training to zero.
  • Case 2: The j-th group of weights referenced in the previous training are all zero, and the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold; the neural network compression device keeps the j-th group of weights obtained after the previous training unchanged.
  • Case 3: The j-th group of weights referenced in the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values are all less than the zero-setting weight threshold; the neural network compression device sets all the non-zero values in the j-th group of weights obtained after the previous training to zero.
  • Case 4: The j-th group of weights referenced in the previous training are not all zero, the proportion of non-zero values is less than the set proportion threshold, and the non-zero values are not all less than the zero-setting weight threshold; the neural network compression device keeps the j-th group of weights obtained after the previous training unchanged.
  • Case 5: The j-th group of weights referenced in the previous training are not all zero, and the proportion of non-zero values is not less than the set proportion threshold; the neural network compression device keeps the j-th group of weights obtained after the previous training unchanged.
  • Through the above adjustment, the distribution of zero values in the weight matrix of the final neural network model can be made more regular; for example, consecutive zero values are concentrated within a group of weights as much as possible, so that when the neural network model is subsequently applied to data processing, the regular distribution of zero values greatly reduces the number of memory accesses and the amount of calculation, which in turn increases calculation speed.
  • The j-th group of weights referenced in the previous training can be understood as the j-th group of weights that needed to be trained last time; the j-th group of weights obtained by adjusting the j-th group of weights obtained after the previous training is the group that needs to be trained this time, that is, the group referenced in the current training.
  • The j-th group of weights referenced in the first training may be the j-th group of weights of the initial neural network model.
  • The zero-setting weight threshold may be determined based on the initial weight threshold. Optionally, the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1; for example, when the initial weight threshold is 1, the zero-setting weight threshold may be 1.05.
  • During the training process, the neural network compression device maintains a zero-setting flag data structure in which each zero-setting flag corresponds to a group of weights (each group of weights may also be called a weight matrix).
  • The zero-setting flag data structure and the weight matrix may be represented as shown in the schematic diagram in FIG. 4: each run of sparse-unit-length consecutive weights in the weight matrix corresponds to 1 bit in the zero-setting flag data structure. In FIG. 4, the sparse unit length is 4, so every 4 consecutive weights correspond to one zero-setting flag.
  • When judging whether the j-th group of weights referenced in the previous training are all zero, the specific method may be: the neural network compression device determines whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, the j-th group of weights referenced in the previous training are all zero; when the flag is a non-zero value, they are not all zero. Taking FIG. 4 as an example, the first zero-setting flag in the data structure is 0, which means that the corresponding group of weights, namely the first 4 weights in the first row of the weight matrix (that is, the first group of weights, or the first weight matrix), are all 0.
  • In an optional implementation, after the neural network compression device sets the whole j-th group of weights obtained after the previous training to zero, or sets all of its non-zero values to zero, it updates the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero. Similarly, in an optional implementation, after keeping the j-th group of weights obtained after the previous training unchanged, the neural network compression device updates the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to a non-zero value (that is, 1).
  • In this way, the zero-setting flags in the zero-setting flag data structure can be updated in real time, so that the weights can be adjusted more accurately during the training process, and the subsequent processing device can accurately rely on the flags when performing data processing based on the neural network model.
  • The above five cases actually form a cyclic process: the neural network compression device first judges whether the j-th group of weights referenced in the previous training are all zero, and then performs the subsequent process according to the judgment result and the five cases above. New weights are thereby obtained for all groups of the neural network model, and the neural network compression device then trains on these new weights.
  • a schematic diagram of a specific weight adjustment process may be shown in FIG. 5.
  • It should be noted that when the weights of the neural network model are grouped according to the sparse unit length, there may be multiple cases. In one case, all weights of the neural network model are grouped uniformly; during grouping, the number of weights remaining in the last group may be less than the sparse unit length, and even so, that group is processed in the same way as the other groups (whose size equals the sparse unit length). In another case, the weight matrix composed of the weights of the neural network model is grouped row by row (or column by column); when each row (or column) is grouped according to the sparse unit length, the number of weights in the last group of each row (or column) may likewise be less than the sparse unit length, and for the same reason, that last group is processed in the same way as the other groups (whose size equals the sparse unit length).
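  • The sketch below walks one group through the five cases described above and refreshes its zero-setting flag. It is a minimal sketch: the parameter names, the flag convention (0 = all zero), and the use of absolute values are illustrative assumptions rather than the patent's exact procedure.

```python
# Minimal sketch: adjust the j-th weight group between training rounds
# according to the five cases, then refresh its zero-setting flag.
import numpy as np

def adjust_group(prev_ref: np.ndarray, trained: np.ndarray,
                 zero_threshold: float, ratio_threshold: float):
    out = trained.copy()
    if not prev_ref.any():                        # referenced group was all zero
        if np.all(np.abs(out) < zero_threshold):
            out[:] = 0.0                          # case 1: zero the whole group
        # case 2: otherwise keep the trained group unchanged
    else:
        nz = out != 0
        if nz.sum() / out.size < ratio_threshold:
            if np.all(np.abs(out[nz]) < zero_threshold):
                out[nz] = 0.0                     # case 3: zero the non-zero values
            # case 4: otherwise keep unchanged
        # case 5: non-zero proportion >= threshold, keep unchanged
    flag = 1 if out.any() else 0                  # refresh the zero-setting flag
    return out, flag
```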
  • Step 303: The neural network compression device performs the current training on the neural network model according to the obtained groups of weights referenced in the current training.
  • Through step 302, the weights of all groups referenced in the current training can be obtained, so that step 303 can be performed. The method by which the neural network compression device performs step 303 may follow a commonly used neural network training method, which is not described in detail in this application.
  • Through the above neural network compression method, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by sparse unit length are processed during the training process. In this way, according to the capabilities of different processing devices, the neural network model can be adapted to those devices, so that subsequent processing devices achieve a better processing effect.
  • the final neural network model obtained through the embodiment shown in FIG. 3 may be applied to a data processing device, so that the data processing device performs data processing based on the finally obtained neural network model.
  • an embodiment of the present application also provides a data processing method, which is implemented based on the final neural network model obtained in the embodiment shown in FIG. 3.
  • The data processing method provided by the present application is explained below by taking a data processing device as the execution subject.
  • the specific flow of the method may include the following steps:
  • Step 601: The data processing device obtains the weights of the target neural network model, where the target neural network model is the final neural network model obtained by training the neural network model after its weights have been grouped according to the sparse unit length; the sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation.
  • It should be noted that the processing device here is the data processing device itself; for the specific method of determining the sparse unit length based on the processing capability information of the processing device, reference may be made to the related method in the embodiment shown in FIG. 3, which is not repeated here.
  • Step 602: Perform the following processing based on the weights of the target neural network model. In the p-th pass, determine whether the q-th group of weights are all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups obtained after the weights of the target neural network model are grouped according to the sparse unit length; p takes any positive integer from 1 to f.
  • Specifically, when the data processing device judges whether the q-th group of weights are all zero, it first obtains the zero-setting flag data structure corresponding to the weights of the target neural network model, and then judges whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero. When the flag corresponding to the q-th group of weights is zero, the data processing device determines that the q-th group of weights are all zero; when the flag is not zero, the data processing device determines that the q-th group of weights are not all zero. For example, as shown in FIG. 4, when the zero-setting flag corresponding to the q-th group of weights is the first zero-setting flag, since that flag is 0, the q-th group of weights are determined to be all zero.
  • It can be understood that, because the target neural network model is adapted to the data processing device, information about the target neural network model (such as the zero-setting flag data structure) has been pre-configured in the data processing device.
  • In an optional implementation, when the q-th group of weights are all zero, the data processing device generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed, as follows: when the matrix operation type is matrix multiplication, the data processing device directly obtains zero as the first operation result; when the matrix operation type is matrix addition, the data processing device takes the matrix data to be processed as the first operation result.
  • In an optional implementation, when the q-th group of weights are not all zero, the data processing device generates the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. A specific method is: the data processing device loads the q-th group of weights and the matrix data to be processed into registers, and then performs the corresponding matrix operation on them according to the matrix operation type to generate the second operation result.
  • After all groups have been processed, the final processing result can be generated. It should be noted that the above processing is a cyclic process: it is performed for each group of weights until the weights of all groups have been traversed.
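  • The sketch below illustrates this cyclic per-group processing for the matrix-multiplication case, accumulating a dot product group by group. The accumulation form and all names are illustrative assumptions; the flag convention matches FIG. 4 (0 = group all zero).

```python
# Minimal sketch of step 602 for matrix multiplication: traverse the weight
# groups, skipping every group whose zero-setting flag is 0.
import numpy as np

def process(weights: np.ndarray, flags: np.ndarray, data: np.ndarray,
            unit: int) -> float:
    acc = 0.0
    for q, flag in enumerate(flags):      # the p-th pass handles the q-th group
        if flag == 0:
            continue                      # all zero: the first result is zero,
                                          # so nothing is loaded or multiplied
        lo, hi = q * unit, min((q + 1) * unit, weights.size)
        # "load into registers" and compute the second result
        acc += float(weights[lo:hi] @ data[lo:hi])
    return acc

w = np.array([0, 0, 0, 0, 0.3, 0, 0, 0.1], dtype=np.float32)
x = np.arange(8, dtype=np.float32)
print(process(w, np.array([0, 1]), x, unit=4))  # only the second group is computed
```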
  • a specific data processing process may be shown in the schematic diagram in FIG. 7.
  • In summary, through the above data processing method, the weights of the neural network model are grouped and the final neural network model is trained after grouping; subsequently applying the final neural network model for data processing can therefore greatly reduce the amount of data access and calculation, thereby improving operation speed.
  • the embodiments of the present application further provide a neural network compression device, which is used to implement the neural network compression method provided in the embodiment shown in FIG. 3.
  • the neural network compression device 800 includes a determination unit 801, a weight adjustment unit 802, and a training unit 803, where:
  • the determining unit 801 is used to determine the sparse unit length according to the processing capability information of the processing device, and the sparse unit length is the data length of one operation when the processing device performs matrix operation;
  • The weight adjustment unit 802 is configured to, when the current training is performed on the neural network model, adjust the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training; the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped according to the sparse unit length.
  • The training unit 803 is configured to perform the current training on the neural network model according to the groups of weights referenced in the current training that are obtained by the weight adjustment unit 802.
  • In an optional implementation, when determining the sparse unit length, the determining unit 801 determines the length of a register in the processing device, or the maximum data length that the instruction set of the processing device processes at a time, and uses that register length or maximum data length as the sparse unit length.
  • In an optional implementation, the neural network compression device may further include a weight trimming unit, which is configured to trim the weights of the initial neural network model according to the initial weight threshold of the initial neural network model before the training unit performs the first training on the neural network.
  • In an optional implementation, when adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, the weight adjustment unit 802 may distinguish the following situations:
  • when the j-th group of weights referenced in the previous training are all zero, and the j-th group of weights obtained after the previous training are all less than the zero-setting weight threshold, set the whole j-th group of weights obtained after the previous training to zero; or
  • when the j-th group of weights referenced in the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in the j-th group of weights obtained after the previous training are all less than the zero-setting weight threshold, set all the non-zero values in the j-th group of weights obtained after the previous training to zero; or
  • when the j-th group of weights referenced in the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keep the j-th group of weights obtained after the previous training unchanged; or
  • when the j-th group of weights referenced in the previous training are not all zero, and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is not less than the set proportion threshold, keep the j-th group of weights obtained after the previous training unchanged.
  • In an optional implementation, when judging whether the j-th group of weights referenced in the previous training are all zero, the weight adjustment unit 802 is specifically configured to determine whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, it determines that the j-th group of weights referenced in the previous training are all zero; when the flag is a non-zero value, it determines that they are not all zero.
  • In an optional implementation, the weight adjustment unit 802 is further configured to update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero after setting the whole j-th group of weights obtained after the previous training to zero, or after setting all of its non-zero values to zero; or the weight adjustment unit 802 is further configured to update that flag to a non-zero value after keeping the j-th group of weights obtained after the previous training unchanged.
  • Using the neural network compression device provided by this embodiment of the present application, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by sparse unit length are processed during the training process. In this way, the neural network model can be adapted to the capabilities of different processing devices, so that subsequent processing devices achieve a better processing effect.
  • the embodiments of the present application further provide a data processing apparatus, which is used to implement the data processing method provided in the embodiment shown in FIG. 6.
  • the data processing apparatus 900 includes an acquiring unit 901 and a processing unit 902, where:
  • the obtaining unit 901 is used to obtain the weight of the target neural network model.
  • The target neural network model is the final neural network model obtained by training the neural network model after its weights have been grouped according to the sparse unit length.
  • The processing unit 902 is configured to perform the following processing based on the weights of the target neural network model: in the p-th pass, determine whether the q-th group of weights are all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups obtained after the weights of the target neural network model are grouped according to the sparse unit length; p takes any positive integer from 1 to f.
  • In an optional implementation, when judging whether the q-th group of weights are all zero, the processing unit 902 is specifically configured to: obtain the zero-setting flag data structure corresponding to the weights of the target neural network model, and judge whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero.
  • Using the data processing apparatus provided by this embodiment of the present application, because the final neural network model is obtained by training after the weights of the neural network model have been grouped, applying the final neural network model for data processing can exploit the characteristics of matrix operations to greatly reduce the amount of data access and calculation, thereby improving operation speed.
  • the division of the units in the embodiments of the present application is schematic, and is only a division of logical functions, and there may be other division manners in actual implementation.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the embodiments of the present application further provide a neural network compression device, which is used to implement the neural network compression method shown in FIG. 3.
  • the neural network compression device 1000 includes: a processor 1001 and a memory 1002, where:
  • the processor 1001 may be a CPU, GPU, or a combination of CPU and GPU.
  • the processor 1001 may also be an AI chip that supports neural network processing such as NPU, TPU, and so on.
  • the processor 1001 may further include a hardware chip.
  • the above hardware chip may be ASIC, PLD, DSP or a combination thereof.
  • The above PLD may be a CPLD, an FPGA, GAL, or any combination thereof. It should be noted that the processor 1001 is not limited to the cases enumerated above; the processor 1001 may be any processing device capable of implementing the neural network compression method shown in FIG. 3.
  • the processor 1001 and the memory 1002 are connected to each other.
  • the processor 1001 and the memory 1002 are connected to each other through a bus 1003;
  • Optionally, the bus 1003 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or one type of bus.
  • When the processor 1001 is used to implement the neural network compression method provided by the embodiments of the present application, it performs the following operations: determine the sparse unit length according to the processing capability information of the processing device; when performing the current training on the neural network model, adjust the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training, where the number of weights included in the j-th group is the sparse unit length, j takes any positive integer from 1 to m, and m is the total number of weight groups obtained after all weights of the neural network model are grouped according to the sparse unit length; and perform the current training on the neural network model accordingly.
  • The processor 1001 may also perform other operations; for details, reference may be made to the specific descriptions of step 301, step 302, and step 303 in the embodiment shown in FIG. 3, which are not repeated here.
  • the memory 1002 is used to store programs and data.
  • the program may include program code, and the program code includes instructions for computer operation.
  • the memory 1002 may include random access memory (random access memory, RAM), or may also include non-volatile memory (non-volatile memory), for example, at least one disk memory.
  • the processor 1001 executes the program stored in the memory 1002 to realize the above-mentioned functions, thereby implementing the neural network compression method shown in FIG. 3.
  • It should be noted that when the neural network compression device shown in FIG. 10 is applied to a terminal device, it may be embodied as the terminal device shown in FIG. 2.
  • In this case, the processor 1001 may be the same as the processor 210 shown in FIG. 2, and the memory 1002 may be the same as the memory 220 shown in FIG. 2.
  • Based on the above embodiments, an embodiment of the present application further provides a data processing apparatus, which is used to implement the data processing method shown in FIG. 6.
  • the data processing device 1100 includes a processor 1101 and a memory 1102, where:
  • the processor 1101 may be a CPU, GPU, or a combination of CPU and GPU.
  • the processor 1101 may also be an AI chip that supports neural network processing such as NPU, TPU, and so on.
  • the processor 1101 may further include a hardware chip.
  • the above hardware chip may be ASIC, PLD, DSP or a combination thereof.
  • The above PLD may be a CPLD, an FPGA, GAL, or any combination thereof. It should be noted that the processor 1101 is not limited to the cases above; the processor 1101 may be any processing device capable of implementing neural network inference operations.
  • the processor 1101 and the memory 1102 are connected to each other.
  • the processor 1101 and the memory 1102 are connected to each other through a bus 1103;
  • Optionally, the bus 1103 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only one thick line is used in FIG. 11, but this does not mean that there is only one bus or one type of bus.
  • processor 1101 When the processor 1101 is used to implement the data processing method provided by the embodiment of the present application, it may perform the following operations:
  • a target neural network model is obtained, where the target neural network model is the final neural network model obtained by grouping the weights of the neural network model according to the sparse unit length and training the grouped neural network model; the sparse unit length is determined according to processing capability information of the processing device, and is the data length of one operation when the processing device performs a matrix operation;
  • the following processing is performed based on the weights of the target neural network model: in the pth processing, it is determined whether the weights in the qth group are all zero; if so, a first operation result is generated and saved according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, a second operation result is generated and saved according to the qth group of weights, the matrix data to be processed, and the matrix operation type;
  • the number of weights included in the qth group of weights is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of weight groups obtained after grouping the weights of the target neural network model according to the sparse unit length; p takes any positive integer from 1 to f.
  • the processor 1101 may also perform other operations. For details, reference may be made to the specific descriptions of step 601 and step 602 in the embodiment shown in FIG. 6 above, and details are not described herein again. The zero-group skip above is illustrated in the sketch that follows.
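  • Again for illustration only, the following minimal sketch shows the zero-group skip of the pth processing step in Python with NumPy, under the assumptions that the matrix operation type is an element-wise multiply and that the first operation result for a multiply is simply a zero block; none of these names or choices come from the disclosed embodiments.

```python
import numpy as np

def process(groups, data_blocks):
    # For each group q (q = 1..f over f groups), skip the real computation
    # when the group's weights are all zero and save a cheap first result.
    results = []
    for q in range(len(groups)):
        if not np.any(groups[q]):
            # first operation result: for a multiply it follows from the
            # operation type alone (a zero block), so no arithmetic is done
            results.append(np.zeros_like(data_blocks[q]))
        else:
            # second operation result: the ordinary computation from the
            # qth group of weights and the matrix data to be processed
            results.append(groups[q] * data_blocks[q])
    return results  # the saved results, one per processing step

# usage sketch with assumed shapes: f groups of sparse_unit_length weights,
# matched by f blocks of matrix data to be processed
sparse_unit_length, f = 8, 4
groups = np.random.randn(f, sparse_unit_length)
groups[1] = 0.0                          # an all-zero group to be skipped
data_blocks = np.random.randn(f, sparse_unit_length)
print(process(groups, data_blocks)[1])   # zero block, produced without multiply
```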
  • the memory 1102 is used to store programs and data.
  • the program may include program code, and the program code includes instructions for computer operation.
  • the memory 1102 may include a random access memory (RAM), and may also include a non-volatile memory, for example, at least one disk memory.
  • the processor 1101 executes the program stored in the memory 1102 to realize the above functions, thereby implementing the data processing method shown in FIG. 6.
  • when the data processing apparatus shown in FIG. 11 is applied to a terminal device, the data processing apparatus may be embodied as the terminal device shown in FIG. 2.
  • in this case, the processor 1101 may be the same as the processor 210 shown in FIG. 2, and the memory 1102 may be the same as the memory 220 shown in FIG. 2.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can guide a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A neural network compression method and apparatus, used to solve the problem in the prior art that the capability of a processing device cannot be well adapted to and a better processing effect cannot be achieved. The method comprises the steps of: determining a sparse unit length according to processing capability information of a processing device; when performing the current training on a neural network model, adjusting, according to a jth group of weights referenced in the previous training, the jth group of weights obtained after the previous training, to obtain a jth group of weights referenced in the current training; and performing the current training on the neural network model according to the obtained groups of weights referenced in the current training. The sparse unit length is the data length of one operation when the processing device performs matrix operations, the number of weights included in the jth group of weights is the sparse unit length, j is any positive integer from 1 to m, and m is the total number of weight groups obtained after grouping all the weights of the neural network model according to the sparse unit length.
PCT/CN2018/125812 2018-12-29 2018-12-29 Neural network compression method and apparatus WO2020133492A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/125812 WO2020133492A1 (fr) 2018-12-29 2018-12-29 Neural network compression method and apparatus
CN201880099983.5A CN113168554B (zh) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125812 WO2020133492A1 (fr) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Publications (1)

Publication Number Publication Date
WO2020133492A1 (fr)

Family

ID=71127997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125812 WO2020133492A1 (fr) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Country Status (2)

Country Link
CN (1) CN113168554B (fr)
WO (1) WO2020133492A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383666B (zh) * 2023-05-23 2024-04-19 Chongqing University Power data prediction method and apparatus, and electronic device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8700552B2 (en) * 2011-11-28 2014-04-15 Microsoft Corporation Exploiting sparseness in training deep neural networks
WO2018107414A1 (fr) * 2016-12-15 2018-06-21 Shanghai Cambricon Information Technology Co., Ltd. Apparatus, device and method for compressing/decompressing a neural network model
CN107909147A (zh) * 2017-11-16 2018-04-13 Shenzhen Harzone Technology Co., Ltd. Data processing method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229967A (zh) * 2016-08-22 2017-10-03 Beijing Deephi Intelligent Technology Co., Ltd. Hardware accelerator and method for implementing a sparsified GRU neural network based on an FPGA
CN107239825A (zh) * 2016-08-22 2017-10-10 Beijing Deephi Intelligent Technology Co., Ltd. Deep neural network compression method considering load balancing
CN107239824A (zh) * 2016-12-05 2017-10-10 Beijing Deephi Intelligent Technology Co., Ltd. Apparatus and method for implementing a sparse convolutional neural network accelerator
CN107688850A (zh) * 2017-08-08 2018-02-13 Beijing Deephi Technology Co., Ltd. Deep neural network compression method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112014A (zh) * 2021-03-04 2021-07-13 Lenovo (Beijing) Co., Ltd. Data processing method, device, and storage medium
EP4191478A1 (fr) * 2021-12-02 2023-06-07 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for compressing a neural network model
US11861498B2 2021-12-02 2024-01-02 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for compressing neural network model
CN114580630A (zh) * 2022-03-01 2022-06-03 Xiamen University Neural network model training method and graph classification method for AI chip design
CN114580630B (zh) 2022-03-01 2024-05-31 Xiamen University Neural network model training method and graph classification method for AI chip design

Also Published As

Publication number Publication date
CN113168554B (zh) 2023-11-28
CN113168554A (zh) 2021-07-23

Similar Documents

Publication Publication Date Title
WO2022077587A1 (fr) Procédé et appareil de prédiction de données et dispositif terminal
US20230252327A1 (en) Neural architecture search for convolutional neural networks
WO2020133492A1 (fr) Procédé et appareil de compression de réseau neuronal
US20190087713A1 (en) Compression of sparse deep convolutional network weights
CN109886422A (zh) 模型配置方法、装置、电子设备及可读取存储介质
US9600762B2 (en) Defining dynamics of multiple neurons
CN110399487B (zh) 一种文本分类方法、装置、电子设备及存储介质
US20210312295A1 (en) Information processing method, information processing device, and information processing program
US20220335293A1 (en) Method of optimizing neural network model that is pre-trained, method of providing a graphical user interface related to optimizing neural network model, and neural network model processing system performing the same
KR20220009682A (ko) 분산 기계 학습 방법 및 시스템
EP3685266A1 (fr) Commande d'état de puissance d'un dispositif mobile
CN112269875B (zh) 文本分类方法、装置、电子设备及存储介质
CN112700006A (zh) 网络架构搜索方法、装置、电子设备及介质
US20150278683A1 (en) Plastic synapse management
CN113742069A (zh) 基于人工智能的容量预测方法、装置及存储介质
CN114492998A (zh) 能源类大数据处理方法、装置、计算机设备和存储介质
JP2023064695A (ja) ディープ・ニューラル・ネットワークにおけるニアメモリ疎行列計算
WO2020133364A1 (fr) Procédé et appareil de compression de réseau neuronal
CN112766462A (zh) 数据处理方法、装置及计算机可读存储介质
US20230059976A1 (en) Deep neural network (dnn) accelerator facilitating quantized inference
EP4283522A1 (fr) Circuit de réseau neuronal impulsionnel et procédé de calcul basé sur un réseau neuronal impulsionnel
US20240020510A1 (en) System and method for execution of inference models across multiple data processing systems
US20230351165A1 (en) Method for operating neural network
US20230325665A1 (en) Sparsity-based reduction of gate switching in deep neural network accelerators
CN117909714A (zh) 充电功率预测模型训练方法、装置和计算机设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18945108

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18945108

Country of ref document: EP

Kind code of ref document: A1