
CN108510083A - Neural network model compression method and device - Google Patents

Neural network model compression method and device

Info

Publication number
CN108510083A
Authority
CN
China
Prior art keywords: network model, neural network, feature vector, similarity, compressed
Prior art date
Legal status
Granted
Application number
CN201810274146.3A
Other languages
Chinese (zh)
Other versions
CN108510083B (en)
Inventor
孙源良
王亚松
刘萌
樊雨茂
Current Assignee
Guoxin Youe Data Co Ltd
Original Assignee
Guoxin Youe Data Co Ltd
Priority date
Filing date
Publication date
Application filed by Guoxin Youe Data Co Ltd
Priority to CN201810274146.3A
Publication of CN108510083A
Application granted
Publication of CN108510083B
Status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a neural network model compression method and device, wherein the method includes: inputting training data into a neural network model to be compressed and into a target neural network model; and training the target neural network model based on the feature vectors and classification results that the neural network model to be compressed extracts from the training data, to obtain a compressed neural network model; wherein the number of parameters of the target neural network model is smaller than the number of parameters of the neural network model to be compressed. In the embodiments of the present invention, the feature vectors and classification results extracted from the training data by the neural network model to be compressed guide the training of the target neural network model, and the compressed neural network model finally obtained classifies the same training data identically to the neural network model to be compressed, so no loss of accuracy is caused during model compression. The size of the model can therefore be compressed while accuracy is preserved, meeting the dual requirements on accuracy and model size.

Description

Neural network model compression method and device
Technical field
The present invention relates to the field of machine learning, and in particular to a neural network model compression method and device.
Background art
The rapid development of neural networks in fields such as image, speech, and text processing has driven a series of intellectual products to market. To let neural networks learn the features of training data better and improve model performance, the number of parameters used to represent a neural network model has grown rapidly and the number of layers keeps increasing. Deep neural network models therefore suffer from having numerous parameters and heavy computation during both training and application. As a result, most neural-network-based products depend on server-side computing power and rely heavily on good running and network environments, which restricts the application range of neural network models; for example, embedded applications cannot be realized. To enable embedded applications of a neural network model, its volume must be compressed below a certain range.
Current model compression methods generally include the following. First, pruning: after a large model is fully trained, parameters with very small weights are removed from the network model, and the model is then trained further. Second, weight sharing, which reduces the number of parameters by letting multiple connections share weights. Third, quantization: in general, the parameters of a neural network model are all represented as 32-bit floating-point numbers, but such high precision need not actually be retained; quantization represents the precision originally expressed by 32 bits with, for example, the integers 0-255, sacrificing precision to reduce the space occupied by each weight. Fourth, neural network binarization, which represents the parameters of the network model as binary numbers to reduce the model size.
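To make the quantization idea concrete, here is a minimal sketch (illustrative only; the patent does not prescribe any particular scheme) that maps 32-bit floating-point weights onto the integers 0-255 and back:

```python
import numpy as np

def quantize_uint8(weights: np.ndarray):
    """Affine quantization: map float32 weights onto the integers 0-255."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:          # all weights identical; any scale works
        scale = 1.0
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize_uint8(codes: np.ndarray, scale: float, w_min: float) -> np.ndarray:
    """Recover approximate float32 weights from the 8-bit codes."""
    return codes.astype(np.float32) * scale + w_min

w = np.random.randn(1000).astype(np.float32)
codes, scale, w_min = quantize_uint8(w)
w_hat = dequantize_uint8(codes, scale, w_min)
print(np.abs(w - w_hat).max())   # reconstruction error bounded by ~scale / 2
```

Each weight then occupies one byte instead of four, which is exactly the precision-for-space trade the passage describes.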
However, all of the above methods perform compression directly on the model to be compressed, and they do so at the cost of the model's accuracy, which often fails to meet the accuracy requirements in use.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a neural network model compression method and device that can compress the size of a model while ensuring the accuracy of the neural network model.
In a first aspect, an embodiment of the present invention provides a neural network model compression method, including:
inputting training data into a neural network model to be compressed and into a target neural network model;
training the target neural network model based on the feature vectors and classification results that the neural network model to be compressed extracts from the training data, to obtain a compressed neural network model;
wherein the number of parameters of the target neural network model is smaller than the number of parameters of the neural network model to be compressed.
In a second aspect, an embodiment of the present invention further provides a neural network model compression device, including:
an input module, configured to input training data into a neural network model to be compressed and into a target neural network model;
a training module, configured to train the target neural network model based on the feature vectors and classification results that the neural network model to be compressed extracts from the training data, to obtain a compressed neural network model;
wherein the number of parameters of the target neural network model is smaller than the number of parameters of the neural network model to be compressed.
With the neural network model compression method and device provided by the embodiments of the present application, when a neural network model to be compressed is compressed, a target neural network whose number of parameters is smaller than that of the neural network model to be compressed is constructed in advance. The training data is then input into both the neural network model to be compressed and the target neural network model, and the feature vectors and classification results that the neural network model to be compressed extracts from the training data guide the training of the target neural network model, yielding a compressed neural network model. The compressed neural network model finally obtained classifies the same training data identically to the neural network model to be compressed, so no loss of accuracy is caused during model compression; the size of the model can thus be compressed while accuracy is guaranteed, meeting the dual requirements on accuracy and model size.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should not be regarded as limiting its scope; from these drawings, those of ordinary skill in the art can obtain other related drawings without creative effort.
Fig. 1 shows a flowchart of the neural network model compression method provided by Embodiment 1 of the present application;
Fig. 2 shows a flowchart of a specific method, provided by Embodiment 2 of the present application, for training the target neural network model based on the classification results of the neural network model to be compressed on the training data;
Fig. 3 shows a schematic diagram of the model compression process provided by Embodiment 2 of the present application;
Fig. 4 shows a flowchart of the first comparison operation provided by Embodiment 3 of the present application;
Fig. 5 shows a flowchart of a specific method, provided by Embodiment 4 of the present application, for performing similarity matching on the first feature vector and the second feature vector and training the target neural network for the current round according to the similarity matching result;
Fig. 6 shows a flowchart of the similarity determination operation provided by Embodiment 4 of the present application;
Fig. 7 shows a flowchart of another specific method, provided by Embodiment 5 of the present application, for performing similarity matching on the first feature vector and the second feature vector and training the target neural network for the current round according to the similarity matching result;
Fig. 8 shows a flowchart of the similarity determination operation provided by Embodiment 5 of the present application;
Fig. 9 shows a flowchart of the neural network model compression method provided by Embodiment 6 of the present application;
Fig. 10 shows a structural schematic diagram of the neural network model compression device provided by Embodiment 7 of the present application;
Fig. 11 shows a structural schematic diagram of a computer device provided by Embodiment 8 of the present application.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated herein and in the drawings, can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To facilitate understanding of this embodiment, the neural network model compression method disclosed in the embodiments of the present invention is first described in detail; the method can be used to compress the size of various neural network models.
The neural network model compression method provided by Embodiment 1 of the present application, shown in Fig. 1, includes:
S101: inputting training data into a neural network model to be compressed and into a target neural network model.
In specific implementation, the neural network model to be compressed is a neural network model of larger volume that has already been trained with the training data; it may be a single neural network or a neural network model composed of an ensemble of multiple neural networks. Compared with the target neural network model, it has more parameters; the parameters here may include the number of feature extraction layers of the neural network and/or the parameters within each feature extraction layer.
Therefore, to compress the neural network model to be compressed, the training data needs to be input into it so that it learns the features of the training data. This trains the neural network model to be compressed, and the trained model is taken as the neural network model that needs to be compressed.
The target neural network model is a pre-constructed neural network model with fewer parameters than the neural network model to be compressed, for example fewer feature extraction layers, a simpler network structure, or feature extraction layers with fewer parameters.
It should be noted here that if the neural network model to be compressed was trained with an unsupervised training method, the training data is unlabeled; if it was trained with a supervised training method, the training data is labeled; and if it was trained with a transfer learning method, the training data may be either unlabeled or labeled.
S102: training the target neural network model based on the feature vectors and classification results that the neural network model to be compressed extracts from the training data, to obtain the compressed neural network model.
In specific implementation, the training data is input into both the neural network model to be compressed and the target neural network model. The classification results of the neural network model to be compressed on the training data guide the training of the target neural network model, and during that training the target model's classification results on the training data are made as close as possible to those of the neural network model to be compressed.
With the neural network model compression method provided by the embodiments of the present application, when a neural network model to be compressed is compressed, a target neural network whose number of parameters is smaller than that of the neural network model to be compressed is constructed in advance. The training data is then input into both the neural network model to be compressed and the target neural network model, and the feature vectors and classification results extracted from the training data by the neural network model to be compressed guide the training of the target neural network model, yielding a compressed neural network model. Compression is thus achieved by training the model that becomes the compressed model rather than by cutting down the model to be compressed, and the compressed neural network model finally obtained classifies the same training data identically to the neural network model to be compressed. No loss of accuracy is caused during model compression, so the size of the model can be compressed while accuracy is guaranteed, meeting the dual requirements on accuracy and model size.
Specifically, the neural network model to be compressed generally includes: a neural network to be compressed and a classifier to be compressed. The target neural network model generally includes: a target neural network and a target classifier. The compressed neural network model obtained by training includes: a compressed neural network and a compressed classifier.
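For concreteness, the following sketch (all architectures and sizes are illustrative assumptions; the patent fixes none of them) instantiates the four components named above, with the target network deliberately smaller than the network to be compressed:

```python
import torch
import torch.nn as nn

# Network to be compressed ("teacher"): more layers, more parameters.
teacher_net = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
)
teacher_clf = nn.Linear(128, 10)   # classifier to be compressed

# Target network ("student"): fewer layers and parameters, with the same
# feature and output dimensions so the two models can be compared directly.
student_net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)
student_clf = nn.Linear(128, 10)   # target classifier

x = torch.randn(32, 784)                   # a batch of training data
f1 = teacher_net(x)                        # first feature vector
f2 = student_net(x)                        # second feature vector
p1, p2 = teacher_clf(f1), student_clf(f2)  # first/second classification results
```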
As shown in Fig. 2, Embodiment 2 of the present application further provides a specific method for training the target neural network model based on the classification results of the neural network model to be compressed on the training data, including:
S201: extracting a first feature vector from the input training data using the neural network to be compressed, and extracting a second feature vector from the input training data using the target neural network.
S202: performing similarity matching on the first feature vector and the second feature vector, and training the target neural network for the current round according to the result of the similarity matching;
S203: inputting the first feature vector into the classifier to be compressed to obtain a first classification result;
inputting the second feature vector into the target classifier to obtain a second classification result;
S204: training the target neural network and the target classifier for the current round according to the comparison result of the first classification result and the second classification result;
S205: obtaining the compressed neural network model by training the target neural network and the target classifier for multiple rounds.
In specific implementation, referring to the model compression process shown in Fig. 3, two functional modules are introduced in this embodiment for convenience of description: a similarity matching module and a comparison module. The similarity matching module performs similarity matching on the first feature vector and the second feature vector; the comparison module compares the first classification result with the second classification result.
The training data is input into the neural network model to be compressed and the target neural network model. After the training data is input into the neural network model to be compressed, two processes are executed: first, the neural network to be compressed performs feature extraction on the training data to obtain the first feature vector of the training data; the first feature vector is then passed to the classifier to be compressed, which, based on the first feature vector, classifies the training data it characterizes and obtains the first classification result.
Similarly, after the training data is input into the target neural network model, two processes are also executed: first, the target neural network performs feature extraction on the training data to obtain the second feature vector of the training data; the second feature vector is then passed to the target classifier, which, based on the second feature vector, classifies the training data it characterizes and obtains the second classification result.
The process of compressing the neural network model to be compressed is actually realized by letting the neural network model to be compressed guide the training of the target neural network model, so that the compressed neural network model obtained by training classifies the same training data consistently with the neural network model to be compressed. That is, when the neural network to be compressed and the compressed neural network extract features from the same training data, the resulting feature vectors should be as similar as possible; at the same time, when the classifier to be compressed and the compressed classifier classify the training data characterized by those close feature vectors, their classification results should be consistent. Training the target neural network model therefore means training both the target neural network and the target classifier.
During training, the parameters of the target neural network are influenced by the result of the similarity matching between the first feature vector and the second feature vector and are adjusted according to that result. Because the parameters of the target neural network differ from those of the neural network to be compressed, it is very difficult for the second feature vector to become identical to the first; the goal is therefore to make the second feature vector that the target neural network extracts from the training data approach the first feature vector as closely as possible. Meanwhile, the parameters of the target neural network are also influenced by the second classification result that the target classifier produces for the training data characterized by the second feature vector: when the second classification result is inconsistent with the first, the parameters of the target neural network are adjusted so that the second classification result obtained by the target classifier becomes consistent with the first classification result.
The parameters of the target classifier are influenced by the comparison result of the first and second classification results: when the first and second classification results are inconsistent, the parameters of the target classifier are adjusted so that the second classification result becomes consistent with the first.
Accordingly, after the training data is input into the neural network model to be compressed and the target neural network model, the first feature vector is extracted from the training data by the neural network to be compressed, and the second feature vector is extracted from the training data by the target neural network. The first and second feature vectors of the same training data are then passed to the similarity matching module, which performs similarity matching on them, and the target neural network is trained for the current round according to the matching result. Meanwhile, the first feature vector is input into the classifier to be compressed to obtain the first classification result, and the second feature vector is input into the target classifier to obtain the second classification result; the two classification results are passed to the comparison module, which compares them, and the target neural network and the target classifier are trained for the current round according to the comparison result.
The compressed neural network model is obtained by training the target neural network and the target classifier for multiple rounds.
It should be noted here that one round of training refers to training the target neural network model with the same training data until both the second feature vector obtained by the target neural network's feature extraction on the training data and the second classification result obtained by classification satisfy preset conditions; multiple rounds of training refers to training the target neural network with multiple pieces of training data, each piece of training data corresponding to one round of training.
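Combining the two training signals, a single round might look like the sketch below, which reuses the modules from the previous sketch; the concrete losses (cosine distance for feature matching, KL divergence for matching classification results) are assumed stand-ins for the patent's unspecified feedback information:

```python
import torch.nn.functional as F

opt = torch.optim.Adam(
    list(student_net.parameters()) + list(student_clf.parameters()), lr=1e-3)

def train_round(x: torch.Tensor, max_steps: int = 100, feat_tol: float = 0.05):
    """One training round on the same batch x (S201-S204)."""
    with torch.no_grad():                       # the trained model to be
        f1 = teacher_net(x)                     # compressed stays fixed here
        p1 = F.softmax(teacher_clf(f1), dim=1)  # first classification result
    for _ in range(max_steps):
        f2 = student_net(x)                     # second feature vector
        logits2 = student_clf(f2)               # second classification result
        feat_loss = 1.0 - F.cosine_similarity(f1, f2, dim=1).mean()
        cls_loss = F.kl_div(F.log_softmax(logits2, dim=1), p1,
                            reduction="batchmean")
        opt.zero_grad()
        (feat_loss + cls_loss).backward()
        opt.step()
        # The round ends once features are close and predictions agree.
        if feat_loss.item() < feat_tol and \
           bool((logits2.argmax(1) == p1.argmax(1)).all()):
            break
```

Calling `train_round` once per piece of training data then corresponds to the multiple rounds of S205.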
Specifically, Embodiment 3 of the present application further provides a specific method for training the target neural network and the target classifier for the current round according to the comparison result of the first and second classification results, including: executing the following first comparison operation until the classification loss of the target neural network model falls within a preset loss range, thereby completing the current round of training of the target neural network and the target classifier.
As shown in Fig. 4, the first comparison operation includes:
S401: comparing whether the first classification result and the second classification result are consistent; if yes, jumping to S402; if no, jumping to S403.
S402: completing the current round of training of the target neural network and the target classifier; this flow ends.
S403: generating first feedback information, and adjusting the parameters of the target neural network and the target classifier based on the first feedback information;
S404: determining a new second classification result for the training data using the target neural network and the target classifier with the adjusted parameters, and executing S401 again.
In specific implementation, to ensure the accuracy of the compressed neural network model obtained after multiple rounds of training, the compressed neural network model and the neural network model to be compressed must classify the same training data consistently. The comparison module therefore compares the first classification result with the second classification result. When the comparison result is inconsistent, first feedback information is generated; based on it, the parameters of the target neural network and the target classifier are adjusted. The target neural network and the target classifier with the adjusted parameters are then used to determine a new second classification result for the training data, and the above first comparison operation is performed on the first classification result and the new second classification result. This process is repeated until the first and second classification results are consistent.
In addition, as shown in Fig. 5, Embodiment 4 of the present application further provides a specific method for performing similarity matching on the first feature vector and the second feature vector and training the target neural network for the current round according to the result of the similarity matching, including:
S501: clustering the first feature vectors and the second feature vectors separately;
S502: generating a first adjacency matrix according to the clustering result of the first feature vectors, and generating a second adjacency matrix according to the clustering result of the second feature vectors;
S503: training the parameters of the target network for the current round according to the similarity between the first adjacency matrix and the second adjacency matrix.
In specific implementation, the first feature vectors can be regarded as points mapped into a high-dimensional space. The points are clustered according to the distances between them, with points whose pairwise distance is within a preset threshold assigned to the same class; then, according to the clustering result, a first adjacency matrix over the points is formed.
In the first adjacency matrix, if two points belong to the same class in the clustering, the entry between them is 1; if two points do not belong to the same class, the entry between them is 0.
For example, suppose there are 5 pieces of training data, the corresponding first feature vectors are numbered 1, 2, 3, 4, 5, and the clustering result for the first feature vectors is {1, 3}, {2}, {4, 5}. The adjacency matrix formed is then:

    1 0 1 0 0
    0 1 0 0 0
    1 0 1 0 0
    0 0 0 1 1
    0 0 0 1 1

A second adjacency matrix is formed in the same way from the clustering result of the second feature vectors, so the details are not repeated.
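As a sketch of the adjacency-matrix construction (the clustering algorithm itself is not specified by the patent; any method that assigns cluster labels, such as k-means, would do):

```python
import numpy as np

def adjacency_from_labels(labels: np.ndarray) -> np.ndarray:
    """A[i, j] = 1 if points i and j fall in the same cluster, else 0."""
    return (labels[:, None] == labels[None, :]).astype(np.float32)

# The 5-sample example above: clusters {1, 3}, {2}, {4, 5}.
labels1 = np.array([0, 1, 0, 2, 2])
a1 = adjacency_from_labels(labels1)
print(a1)   # reproduces the matrix shown above
# The second adjacency matrix a2 is built the same way from cluster
# labels of the second feature vectors.
```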
Embodiment 4 of the present application further provides a method for training the parameters of the target network for the current round according to the similarity between the first adjacency matrix and the second adjacency matrix. The method includes: executing the following similarity determination operation until the similarity measure between the first adjacency matrix and the second adjacency matrix is less than a preset first similarity threshold, thereby completing the current round of training of the target neural network.
As shown in Fig. 6, the similarity determination operation includes:
S601: comparing whether the similarity measure between the first adjacency matrix and the second adjacency matrix is less than the preset first similarity threshold; if yes, executing S602; if no, executing S603.
Here, in specific implementation, when calculating the similarity between the currently available first and second adjacency matrices, the trace of the first adjacency matrix and the trace of the second adjacency matrix are computed: the closer the two traces, the higher the similarity between the two matrices. When evaluating the distance between the traces, the difference between the trace of the first adjacency matrix and the trace of the second adjacency matrix can be used as the similarity measure between the two matrices; that is, the larger the absolute value of the difference between the two traces, the lower the similarity of the first and second adjacency matrices.
S602: completing the current round of training of the target neural network; this flow ends.
S603: generating first feedback information, and adjusting the parameters of the target neural network based on the first feedback information;
S604: extracting a new second feature vector for the training data using the target neural network with the adjusted parameters, clustering the new second feature vectors to generate a new second adjacency matrix, and executing S601 again.
In specific implementation, the higher the similarity between the first adjacency matrix and the second adjacency matrix, the more similar the classification of the first feature vectors characterized by the first adjacency matrix is to the classification of the second feature vectors characterized by the second adjacency matrix. The parameters of the target neural network are therefore adjusted according to the similarity between the first and second adjacency matrices, so that the second feature vectors the target neural network extracts from the training data become ever closer to the first feature vectors extracted from the training data by the neural network to be compressed.
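Continuing the numpy sketch above, the trace-based measure used in S601 could read as follows (a literal rendering of the stated rule, not an optimized one):

```python
def trace_similarity(a1: np.ndarray, a2: np.ndarray) -> float:
    """Measure as stated: |tr(A1) - tr(A2)|; smaller means more similar."""
    return float(abs(np.trace(a1) - np.trace(a2)))

# S601: the current round of training ends once
# trace_similarity(a1, a2) < first_similarity_threshold.
```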
In addition, as shown in Fig. 7, Embodiment 5 of the present application provides another specific method for performing similarity matching on the first feature vector and the second feature vector and training the target neural network for the current round according to the result of the similarity matching. The method includes:
S701: performing a dimension reduction operation on the first feature vector and the second feature vector separately, obtaining a first dimension-reduced feature vector from the first feature vector and a second dimension-reduced feature vector from the second feature vector.
In specific implementation, the dimension reduction operation on the first and second feature vectors can be implemented by re-encoding them, for example by passing the first and second feature vectors through a fully connected layer that captures their features again, yielding the first dimension-reduced feature vector and the second dimension-reduced feature vector.
S702: calculating the similarity of the first dimension-reduced feature vector and the second dimension-reduced feature vector.
Here, when calculating the similarity between the first dimension-reduced feature vector and the second dimension-reduced feature vector, the difference between the two vectors can be computed and used as the similarity result; or element-wise subtraction can be performed directly between the first and second dimension-reduced feature vectors and the result of the subtraction used as the similarity result; or the two dimension-reduced feature vectors can each be regarded as a point projected into the corresponding space, and the difference between the point distributions calculated. For example, projecting the first and second dimension-reduced feature vectors into the corresponding space gives points S(X1, Y1, Z1) and M(X2, Y2, Z2), and the distance between the two points, L = (X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2, is used as their similarity: the smaller the distance, the greater the similarity.
S703: training the parameters of the target network for the current round according to the similarity of the first dimension-reduced feature vector and the second dimension-reduced feature vector.
Here, training the parameters of the target network for the current round according to this similarity actually means ensuring that the similarity distance between the first and second dimension-reduced feature vectors stays within a preset second similarity threshold. Specifically, the following similarity determination operation can be executed until the similarity measure of the first and second dimension-reduced feature vectors is less than the preset second similarity threshold, completing the current round of training of the target neural network.
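As a sketch of S701-S702, reusing the torch modules from the earlier sketches (the fully connected re-encoding layer and its sizes are assumptions made for illustration):

```python
reduce_fc = nn.Linear(128, 16)    # re-encoding layer that reduces dimension

def reduced_distance(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """Squared distance between the dimension-reduced vectors, following
    L = (X1 - X2)^2 + (Y1 - Y2)^2 + ...; smaller distance = higher similarity."""
    r1 = reduce_fc(f1)            # first dimension-reduced feature vector
    r2 = reduce_fc(f2)            # second dimension-reduced feature vector
    return ((r1 - r2) ** 2).sum(dim=1).mean()
```

Whether the two feature vectors pass through one shared re-encoding layer or two separate ones is not fixed by the patent; a shared layer is the simpler assumption.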
As shown in Fig. 8, the similarity determination operation includes:
S801: comparing whether the similarity measure between the first dimension-reduced feature vector and the second dimension-reduced feature vector is less than the preset second similarity threshold; if yes, executing S802; if no, executing S803.
Here, the method for calculating the similarity between the first and second dimension-reduced feature vectors can be found in the description of S702 above and is not repeated here.
S802: completing the current round of training of the target neural network; this flow ends.
S803: generating second feedback information, and adjusting the parameters of the target neural network based on the second feedback information.
S804: extracting a new second feature vector for the training data using the target neural network with the adjusted parameters, performing the dimension reduction operation on the new second feature vector to generate a new second dimension-reduced feature vector, and executing S801 again.
Specifically, ensuring that the first feature vector and the second feature vector are as close as possible requires making the similarity distance between them smaller than a certain threshold, namely making the similarity measure between the first and second dimension-reduced feature vectors less than the preset second similarity threshold. When the measure between the two is not less than the preset second similarity threshold, second feedback information is generated accordingly, and the parameters of the target neural network are adjusted based on it, so that when the target neural network next extracts a second feature vector for the training data, the vector changes in the direction that increases the similarity between the first and second dimension-reduced feature vectors. The target neural network with the adjusted parameters is then used to extract a new second feature vector for the training data, the dimension reduction operation is performed on it again to generate a new second dimension-reduced feature vector, and the similarity calculation operation is executed again, until the similarity measure between the first and second dimension-reduced feature vectors is less than the preset second similarity threshold.
With the compressed neural network model obtained in Embodiment 1 of the present application, the accuracy of the compressed neural network model can be kept consistent with that of the neural network model to be compressed. For a model to be compressed obtained with unsupervised learning or transfer learning, however, if the neural network model to be compressed misclassifies certain training data, the compressed neural network model will, to some extent, also misclassify that training data. Embodiment 6 of the present application therefore provides another neural network model compression method that can further improve the accuracy of the compressed neural network model.
As shown in Fig. 9, the neural network model compression method provided by Embodiment 6 of the present application further includes, before the similarity matching is performed on the first feature vector and the second feature vector:
S901: performing a noise addition operation on the first feature vector.
In specific implementation, adding noise to the first feature vector increases the generalization ability of the compressed neural network model obtained by training; generalization ability refers to the adaptability of a machine learning algorithm to fresh samples. When performing the noise addition operation on the first feature vector, noise of multiple different degrees, or noise of multiple different kinds, can be added to it. Each addition of noise produces one noise-added first feature vector, and each noise-added first feature vector is offset to some degree from the original first feature vector, so a single piece of training data yields first feature vectors with a variety of offsets. This also enriches the amount of first-feature-vector data: with that amount held constant, less input training data is needed, allowing the data to fit better. In addition, since the classification of certain training data by the neural network model to be compressed is not necessarily accurate, perturbing the first feature vector may make the noise-added first feature vectors closer to reality, providing better guidance for the training of the target neural network model.
When adding noise to the first feature vector, a noise vector with the same dimension as the first feature vector is generally constructed, and the noise is added by summing the first feature vector and the noise vector at corresponding positions.
The noise vector with the same dimension as the first feature vector can be constructed directly or indirectly. Direct construction means directly generating a noise vector with the same dimension as the first feature vector; for example, when the dimension of the first feature vector is 1 × 1000, the constructed noise vector is also of dimension 1 × 1000. Indirect construction means generating a noise vector of lower dimension than the first feature vector and then zero-padding it to the same dimension; for example, when the dimension of the first feature vector is 1 × 1000, an intermediate noise vector of dimension 1 × 500 is constructed, and zeros are filled at arbitrary positions of the intermediate noise vector, finally forming a noise vector of dimension 1 × 1000.
Since noise of different degrees or of different kinds can be added to the first feature vector multiple times: noise of different degrees can be obtained by changing the parameters of the noise generation algorithm or, with the indirect construction method, by filling zeros at different positions; noise of different kinds can be obtained by changing the noise generation algorithm.
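A sketch of both construction routes, continuing the torch sketches above (Gaussian noise and the specific sizes are illustrative assumptions; the patent fixes neither the noise distribution nor its degree):

```python
def direct_noise(feat: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    """Directly construct a noise vector of the same dimension and add it."""
    return feat + std * torch.randn_like(feat)

def indirect_noise(feat: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    """Construct a lower-dimensional noise vector, zero-pad it at random
    positions up to the full dimension, then add it element-wise."""
    n, d = feat.shape
    low_dim = d // 2                     # e.g. 1 x 500 for a 1 x 1000 vector
    noise = torch.zeros(n, d)
    idx = torch.randperm(d)[:low_dim]    # positions that carry noise
    noise[:, idx] = std * torch.randn(n, low_dim)
    return feat + noise

noisy_f1 = direct_noise(f1)   # one noise-added first feature vector
```

Varying `std` changes the degree of the noise; varying the zero-filled positions or swapping the noise generator changes its kind, matching the two variation mechanisms described above.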
S902: performing similarity matching on the noise-added first feature vector and the second feature vector.
The method for performing similarity matching on the noise-added first feature vector and the second feature vector is the same as that for the first feature vector without added noise; see the descriptions above, which are not repeated here.
In addition, in this embodiment, because noise is added to the first feature vector, the classification result obtained when the classifier to be compressed classifies the noise-added first feature vector may differ from that of the original first feature vector; if this is not corrected, the accuracy of the finally obtained compressed neural network model will be affected.
Therefore, in this embodiment of the present application, while the noise-added first feature vector and the second feature vector undergo similarity matching and the target neural network is trained based on the similarity matching result, the following second comparison operation is also executed, until the first classification result is consistent with the label of the training data, completing the current round of training of the neural network to be compressed and the classifier to be compressed.
The second comparison operation includes:
comparing the first classification result with the label of the training data;
when the comparison result is inconsistent, generating third feedback information, and adjusting the parameters of the neural network to be compressed and the classifier to be compressed based on the third feedback information;
determining a new first classification result for the training data using the neural network to be compressed and the classifier to be compressed with the adjusted parameters, and executing the second comparison operation again.
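A sketch of this corrective step, fine-tuning the network and classifier to be compressed against the labels (the cross-entropy loss standing in for the feedback information is an assumption):

```python
teacher_opt = torch.optim.Adam(
    list(teacher_net.parameters()) + list(teacher_clf.parameters()), lr=1e-4)

def second_compare_step(x: torch.Tensor, labels: torch.Tensor) -> bool:
    """One pass of the second comparison operation; returns True once the
    first classification result matches the training-data labels."""
    logits1 = teacher_clf(teacher_net(x))      # first classification result
    if bool((logits1.argmax(1) == labels).all()):
        return True                            # consistent: round complete
    loss = F.cross_entropy(logits1, labels)    # drives the parameter adjustment
    teacher_opt.zero_grad()
    loss.backward()
    teacher_opt.step()
    return False
```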
Through the above steps, fine-tuning of the neural network model to be compressed can be realized, so that both the neural network model to be compressed and the compressed neural network model obtained by training have better generalization ability and higher accuracy.
Based on the same inventive concept, an embodiment of the present invention further provides a neural network model compression device corresponding to the neural network model compression method. Since the device in the embodiment of the present invention solves the problem on a principle similar to the above neural network model compression method, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
The neural network model compression device provided by Embodiment 7 of the present invention, shown in Fig. 10, specifically includes:
an input module 11, configured to input training data into a neural network model to be compressed and into a target neural network model;
a first training module 12, configured to train the target neural network model based on the feature vectors and classification results that the neural network model to be compressed extracts from the training data, to obtain a compressed neural network model;
wherein the number of parameters of the target neural network model is smaller than the number of parameters of the neural network model to be compressed.
With the neural network model compression device provided by the embodiments of the present application, when a neural network model to be compressed is compressed, a target neural network whose number of parameters is smaller than that of the neural network model to be compressed is constructed in advance. The training data is then input into both the neural network model to be compressed and the target neural network model, and the feature vectors and classification results extracted from the training data by the neural network model to be compressed guide the training of the target neural network model, yielding a compressed neural network model. Compression is thus achieved by training the model that becomes the compressed model rather than by cutting down the model to be compressed, and the compressed neural network model finally obtained classifies the same training data identically to the neural network model to be compressed. No loss of accuracy is caused during model compression, so the size of the model can be compressed while accuracy is guaranteed, meeting the dual requirements on accuracy and model size.
Optionally, the device further includes: a second training module 13, configured to, before the training data is input into the neural network model to be compressed and the target neural network model, input the training data into the neural network model to be compressed and train the neural network model to be compressed, obtaining the trained neural network model to be compressed.
Optionally, the neural network model to be compressed includes: a neural network to be compressed and a classifier to be compressed; the target neural network model includes: a target neural network and a target classifier;
the first training module 12 is specifically configured to: extract a first feature vector from the input training data using the neural network to be compressed, and extract a second feature vector from the input training data using the target neural network;
perform similarity matching on the first feature vector and the second feature vector, and train the target neural network for the current round according to the result of the similarity matching; and
input the first feature vector into the classifier to be compressed to obtain a first classification result;
input the second feature vector into the target classifier to obtain a second classification result;
train the target neural network and the target classifier for the current round according to the comparison result of the first classification result and the second classification result;
obtain the compressed neural network model by training the target neural network and the target classifier for multiple rounds.
Optionally, the first training module 12 is specifically configured to execute the following first comparison operation until the classification loss of the target neural network model falls within a preset loss range, completing the current round of training of the target neural network and the target classifier;
the first comparison operation includes:
comparing the first classification result with the second classification result;
when the comparison result is inconsistent, generating first feedback information, and adjusting the parameters of the target neural network and the target classifier based on the first feedback information;
determining a new second classification result for the training data using the target neural network and the target classifier with the adjusted parameters, and executing the first comparison operation again.
Optionally, the first training module 12 is further configured to: before the similarity matching is performed on the first feature vector and the second feature vector, perform a noise addition operation on the first feature vector; and perform similarity matching on the noise-added first feature vector and the second feature vector.
Optionally, the first training module 12 is specifically configured to perform similarity matching on the first feature vector and the second feature vector and train the target neural network for the current round according to the result of the similarity matching through the following steps: clustering the first feature vectors and the second feature vectors separately;
generating a first adjacency matrix according to the clustering result of the first feature vectors;
generating a second adjacency matrix according to the clustering result of the second feature vectors;
training the parameters of the target network for the current round according to the similarity between the first adjacency matrix and the second adjacency matrix.
Optionally, the first training module 12 is specifically configured to execute the following similarity determination operation until the similarity measure between the first adjacency matrix and the second adjacency matrix is less than a preset first similarity threshold, completing the current round of training of the target neural network;
the similarity determination operation includes:
calculating the similarity measure between the currently available first adjacency matrix and second adjacency matrix;
when the measure is not less than the preset first similarity threshold, generating first feedback information, and adjusting the parameters of the target neural network based on the first feedback information;
extracting a new second feature vector for the training data using the target neural network with the adjusted parameters;
clustering the new second feature vectors, generating a new second adjacency matrix, and executing the similarity calculation operation again.
Optionally, the first training module 12 is specifically configured to perform similarity matching on the first feature vector and the second feature vector and train the target neural network for the current round according to the result of the similarity matching through the following steps:
performing a dimension reduction operation on the first feature vector and the second feature vector separately, obtaining a first dimension-reduced feature vector from the first feature vector and a second dimension-reduced feature vector from the second feature vector;
calculating the similarity of the first dimension-reduced feature vector and the second dimension-reduced feature vector;
training the parameters of the target network for the current round according to the similarity of the first dimension-reduced feature vector and the second dimension-reduced feature vector.
Optionally, the first training module 12 is specifically configured to execute the following similarity determination operation until the similarity measure of the first dimension-reduced feature vector and the second dimension-reduced feature vector is less than a preset second similarity threshold, completing the current round of training of the target neural network;
the similarity determination operation includes:
calculating the similarity measure between the currently available first dimension-reduced feature vector and second dimension-reduced feature vector;
when the measure is not less than the preset second similarity threshold, generating second feedback information, and adjusting the parameters of the target neural network based on the second feedback information;
extracting a new second feature vector for the training data using the target neural network with the adjusted parameters;
performing the dimension reduction operation on the new second feature vector, generating a new second dimension-reduced feature vector, and executing the similarity calculation operation again.
Corresponding to the neural network model compression method in Fig. 1, Embodiment 8 of the present invention further provides a computer device. As shown in Fig. 11, the device includes a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and runnable on the processor 2000, wherein the processor 2000, when executing the computer program, implements the steps of the above neural network model compression method.
Specifically, the memory 1000 and the processor 2000 can be a general-purpose memory and processor, which are not specifically limited here. When the processor 2000 runs the computer program stored in the memory 1000, the above neural network model compression method can be executed. This solves the problem that existing model compression methods compress a model at the cost of its accuracy and cannot meet accuracy requirements in use, and thereby achieves the effect of compressing the size of a model while ensuring the accuracy of the neural network model.
Corresponding to the neural network model compression method in Fig. 1, Embodiment 9 of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the above neural network model compression method are executed.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above neural network model compression method can be executed. This solves the problem that existing model compression methods compress a model at the cost of its accuracy and cannot meet accuracy requirements in use, and thereby achieves the effect of compressing the size of a model while ensuring the accuracy of the neural network model.
The computer program product of the neural network model compression method and device provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods in the foregoing method embodiments. For the specific implementation, refer to the method embodiments; it is not repeated here.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and device described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
If the functions are realized in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that those familiar with the technical field can easily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A neural network model compression method, characterized in that the method comprises:
inputting training data into a to-be-compressed neural network model and a target neural network model;
training the target neural network model based on feature vectors and classification results extracted from the training data by the to-be-compressed neural network model, to obtain a compressed neural network model;
wherein the number of parameters of the target neural network model is smaller than the number of parameters of the to-be-compressed neural network model.
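For illustration, a minimal PyTorch sketch of the training scheme of claim 1 follows. The layer sizes, the KL-divergence loss, the optimizer, and the random stand-in batch are all assumptions made for the sketch; the claim itself fixes only that the target model has fewer parameters and is trained from the to-be-compressed model's outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# To-be-compressed model (larger) and target model (fewer parameters).
teacher = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
teacher.eval()  # the to-be-compressed model only guides training; it is not updated

x = torch.randn(32, 784)  # stand-in batch of training data
for _ in range(100):
    with torch.no_grad():
        t_logits = teacher(x)  # classification results of the to-be-compressed model
    s_logits = student(x)
    # Train the target model so its classification results match the guide's.
    loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1), reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the smaller model is optimized to reproduce the guiding model's outputs rather than retrained from labels alone, the sketch mirrors the stated goal of compressing model size without a designed-in precision loss.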
2. The method according to claim 1, characterized in that, before the training data is input into the to-be-compressed neural network model and the target neural network model, the method further comprises:
inputting the training data into the to-be-compressed neural network model and training the to-be-compressed neural network model, to obtain a trained to-be-compressed neural network model.
3. The method according to claim 1, characterized in that the to-be-compressed neural network model comprises a to-be-compressed neural network and a to-be-compressed classifier, and the target neural network model comprises a target neural network and a target classifier;
the training of the target neural network model based on the feature vectors and classification results extracted from the training data by the to-be-compressed neural network model, to obtain the compressed neural network model, specifically comprises:
extracting a first feature vector from the input training data using the to-be-compressed neural network, and extracting a second feature vector from the input training data using the target neural network;
performing similarity matching on the first feature vector and the second feature vector, and performing a current round of training on the target neural network according to the result of the similarity matching; and
inputting the first feature vector into the to-be-compressed classifier to obtain a first classification result;
inputting the second feature vector into the target classifier to obtain a second classification result;
performing a current round of training on the target neural network and the target classifier according to a comparison result of the first classification result and the second classification result;
obtaining the compressed neural network model by performing multiple rounds of training on the target neural network and the target classifier.
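As a hedged sketch of claim 3, the two models can be split into feature networks and classifiers, with the target side trained on a feature-similarity term plus agreement of the two classification results. Cosine similarity and KL divergence are assumed choices here; the claim does not name a particular similarity measure or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# To-be-compressed neural network and classifier.
teacher_net = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 128))
teacher_clf = nn.Linear(128, 10)
# Target neural network and classifier (fewer parameters overall).
student_net = nn.Linear(784, 128)
student_clf = nn.Linear(128, 10)

opt = torch.optim.Adam(list(student_net.parameters()) +
                       list(student_clf.parameters()), lr=1e-3)
x = torch.randn(32, 784)  # stand-in training batch

with torch.no_grad():
    f1 = teacher_net(x)   # first feature vector
    p1 = teacher_clf(f1)  # first classification result
f2 = student_net(x)       # second feature vector
p2 = student_clf(f2)      # second classification result

# Similarity matching of the two feature vectors.
match_loss = 1 - F.cosine_similarity(f1, f2, dim=1).mean()
# Comparison of the two classification results.
cls_loss = F.kl_div(F.log_softmax(p2, dim=1),
                    F.softmax(p1, dim=1), reduction="batchmean")

opt.zero_grad()
(match_loss + cls_loss).backward()
opt.step()
```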
4. The method according to claim 3, characterized in that the following first comparison operation is performed until the classification loss of the target neural network model falls within a preset loss range, thereby completing the current round of training of the target neural network and the target classifier;
the first comparison operation comprises:
comparing the first classification result with the second classification result;
when the comparison result is inconsistent, generating first feedback information, and adjusting the parameters of the target neural network and the target classifier based on the first feedback information;
determining a new second classification result for the training data using the target neural network and the target classifier with the adjusted parameters, and performing the first comparison operation again.
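Read this way, the first comparison operation of claim 4 is a loop that stops once the classification loss enters the preset range. The sketch below deliberately continues the claim-3 code above (reusing student_net, student_clf, p1, x, and opt); the 0.05 bound and the step cap are assumptions.

```python
loss_range = 0.05      # assumed "preset loss range"
for _ in range(1000):  # step cap so the sketch always terminates
    p2 = student_clf(student_net(x))  # new second classification result
    cls_loss = F.kl_div(F.log_softmax(p2, dim=1),
                        F.softmax(p1, dim=1), reduction="batchmean")
    if cls_loss.item() < loss_range:
        break  # results consistent enough; current round complete
    opt.zero_grad()
    cls_loss.backward()  # gradients play the role of the first feedback information
    opt.step()           # parameter adjustment of target network and classifier
```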
5. The method according to claim 3, characterized in that, before the similarity matching is performed on the first feature vector and the second feature vector, the method further comprises:
performing a noise addition operation on the first feature vector;
the performing of the similarity matching on the first feature vector and the second feature vector specifically comprises:
performing similarity matching on the noise-added first feature vector and the second feature vector.
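Continuing the same sketch, claim 5's noise addition could look as follows; Gaussian noise and the 0.1 scale are assumptions, since the claim does not fix the noise type.

```python
# Add noise to the first feature vector before similarity matching.
noisy_f1 = f1 + 0.1 * torch.randn_like(f1)
match_loss = 1 - F.cosine_similarity(noisy_f1, f2, dim=1).mean()
```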
6. The method according to any one of claims 3 to 5, characterized in that the performing of the similarity matching on the first feature vector and the second feature vector, and the performing of the current round of training on the target neural network according to the result of the similarity matching, specifically comprise:
clustering the first feature vector and the second feature vector separately;
generating a first adjacency matrix according to the clustering result of the first feature vector;
generating a second adjacency matrix according to the clustering result of the second feature vector;
performing the current round of training on the parameters of the target neural network according to the similarity between the first adjacency matrix and the second adjacency matrix.
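Claims 6 and 7 route the similarity matching through clustering: each feature set yields a same-cluster adjacency matrix, and the two matrices are compared. A self-contained NumPy/scikit-learn sketch follows; k-means, k=4, the random stand-in features, and the normalized-overlap similarity are all assumptions, as the claims leave the clustering algorithm and matrix-similarity measure open.

```python
import numpy as np
from sklearn.cluster import KMeans

def adjacency_from_clusters(features, k=4):
    """A[i, j] = 1 when samples i and j fall in the same cluster."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    return (labels[:, None] == labels[None, :]).astype(np.float32)

rng = np.random.default_rng(0)
f1 = rng.normal(size=(32, 128))   # stand-in first feature vectors
f2 = rng.normal(size=(32, 128))   # stand-in second feature vectors

A1 = adjacency_from_clusters(f1)  # first adjacency matrix
A2 = adjacency_from_clusters(f2)  # second adjacency matrix
# One possible matrix similarity: normalized overlap of adjacency entries.
similarity = (A1 * A2).sum() / np.sqrt((A1 ** 2).sum() * (A2 ** 2).sum())
```

The loop of claim 7 would then wrap this computation in the same adjust-and-retry pattern shown above for claim 4.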
7. The method according to claim 6, characterized in that the following similarity determination operation is performed until the similarity between the first adjacency matrix and the second adjacency matrix is less than a preset first similarity threshold, thereby completing the current round of training of the target neural network;
the similarity determination operation comprises:
calculating the similarity between the currently obtained first adjacency matrix and the currently obtained second adjacency matrix;
when the similarity is not less than the preset first similarity threshold, generating first feedback information, and adjusting the parameters of the target neural network based on the first feedback information;
extracting a new second feature vector from the training data using the target neural network with the adjusted parameters;
clustering the new second feature vector to generate a new second adjacency matrix, and performing the similarity determination operation again.
8. The method according to any one of claims 3 to 5, characterized in that the performing of the similarity matching on the first feature vector and the second feature vector, and the performing of the current round of training on the target neural network according to the result of the similarity matching, specifically comprise:
performing a dimensionality reduction operation on the first feature vector and the second feature vector separately, to obtain a first reduced-dimension feature vector of the first feature vector and a second reduced-dimension feature vector of the second feature vector;
calculating the similarity between the first reduced-dimension feature vector and the second reduced-dimension feature vector;
performing the current round of training on the parameters of the target neural network according to the similarity between the first reduced-dimension feature vector and the second reduced-dimension feature vector.
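Claims 8 and 9 instead reduce both feature sets in dimension and score similarity on the reduced vectors. In the sketch below, PCA and mean cosine similarity are assumed stand-ins; the claims leave the dimensionality-reduction method and similarity measure open.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
f1 = rng.normal(size=(32, 128))  # stand-in first feature vectors
f2 = rng.normal(size=(32, 128))  # stand-in second feature vectors

# Fit one projection on both sets so the reduced spaces are comparable.
pca = PCA(n_components=16).fit(np.vstack([f1, f2]))
f1_low = pca.transform(f1)  # first reduced-dimension feature vectors
f2_low = pca.transform(f2)  # second reduced-dimension feature vectors

# Mean cosine similarity between paired reduced-dimension vectors.
cos = (f1_low * f2_low).sum(axis=1) / (
    np.linalg.norm(f1_low, axis=1) * np.linalg.norm(f2_low, axis=1))
similarity = cos.mean()
```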
9. The method according to claim 8, characterized in that the following similarity determination operation is performed until the similarity between the first reduced-dimension feature vector and the second reduced-dimension feature vector is less than a preset second similarity threshold, thereby completing the current round of training of the target neural network;
the similarity determination operation comprises:
calculating the similarity between the currently obtained first reduced-dimension feature vector and the currently obtained second reduced-dimension feature vector;
when the similarity is not less than the preset second similarity threshold, generating second feedback information, and adjusting the parameters of the target neural network based on the second feedback information;
extracting a new second feature vector from the training data using the target neural network with the adjusted parameters;
performing the dimensionality reduction operation on the new second feature vector to generate a new second reduced-dimension feature vector, and performing the similarity determination operation again.
10. A neural network model compression device, characterized in that the device comprises:
an input module, configured to input training data into a to-be-compressed neural network model and a target neural network model;
a training module, configured to train the target neural network model based on feature vectors and classification results extracted from the training data by the to-be-compressed neural network model, to obtain a compressed neural network model;
wherein the number of parameters of the target neural network model is smaller than the number of parameters of the to-be-compressed neural network model.
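Claim 10 restates the method as a device built from an input module and a training module; the class layout below is purely illustrative, with the loss function and optimizer left as caller-supplied assumptions.

```python
import torch
import torch.nn as nn

class NeuralNetworkCompressionDevice:
    """Illustrative module layout for the device of claim 10."""

    def __init__(self, to_be_compressed: nn.Module, target: nn.Module):
        self.to_be_compressed = to_be_compressed
        self.target = target

    def input_module(self, training_data):
        # Feed the training data to both models.
        return self.to_be_compressed(training_data), self.target(training_data)

    def training_module(self, training_data, optimizer, loss_fn):
        # Train the target model against the to-be-compressed model's outputs.
        with torch.no_grad():
            guide = self.to_be_compressed(training_data)
        loss = loss_fn(self.target(training_data), guide)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss
```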
CN201810274146.3A 2018-03-29 2018-03-29 Neural network model compression method and device Active CN108510083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810274146.3A CN108510083B (en) 2018-03-29 2018-03-29 Neural network model compression method and device

Publications (2)

Publication Number Publication Date
CN108510083A true CN108510083A (en) 2018-09-07
CN108510083B CN108510083B (en) 2021-05-14

Family

ID=63379557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810274146.3A Active CN108510083B (en) 2018-03-29 2018-03-29 Neural network model compression method and device

Country Status (1)

Country Link
CN (1) CN108510083B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337711A1 (en) * 2011-03-29 2017-11-23 Lyrical Labs Video Compression Technology, LLC Video processing and encoding
US20150178620A1 (en) * 2011-07-07 2015-06-25 Toyota Motor Europe Nv/Sa Artificial memory system and method for use with a computational machine for interacting with dynamic behaviours
CN104661037A (en) * 2013-11-19 2015-05-27 中国科学院深圳先进技术研究院 Tampering detection method and system for compressed image quantization table
WO2015089148A2 (en) * 2013-12-13 2015-06-18 Amazon Technologies, Inc. Reducing dynamic range of low-rank decomposition matrices
CN104331738A (en) * 2014-10-21 2015-02-04 西安电子科技大学 Network reconfiguration algorithm based on game theory and genetic algorithm
EP3168781A1 (en) * 2015-11-16 2017-05-17 Samsung Electronics Co., Ltd. Method and apparatus for recognizing object, and method and apparatus for training recognition model
US20170249536A1 (en) * 2016-02-29 2017-08-31 Christopher J. Hillar Self-Organizing Discrete Recurrent Network Digital Image Codec
CN106096670A * 2016-06-17 2016-11-09 北京市商汤科技开发有限公司 Cascaded convolutional neural network training and image detection method, apparatus and system
CN106251347A * 2016-07-27 2016-12-21 广东工业大学 Subway foreign object detection method, apparatus, device and subway shield door system
CN106503799A * 2016-10-11 2017-03-15 天津大学 Multi-scale-network-based deep learning model and its application in brain state monitoring
CN106778684A * 2017-01-12 2017-05-31 易视腾科技股份有限公司 Deep neural network training method and face recognition method
CN106845381A * 2017-01-16 2017-06-13 西北工业大学 Spatial-spectral joint hyperspectral image classification method based on a dual-channel convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAN-HAO LUO et al.: "ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression", Computer Vision Foundation *
WANG Zhengtao: "Research on Compression and Optimization of Deep Neural Networks", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200019840A1 (en) * 2018-07-13 2020-01-16 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for sequential event prediction with noise-contrastive estimation for marked temporal point process
US12014267B2 (en) * 2018-07-13 2024-06-18 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for sequential event prediction with noise-contrastive estimation for marked temporal point process
CN110929839B (en) * 2018-09-20 2024-04-16 深圳市商汤科技有限公司 Method and device for training neural network, electronic equipment and computer storage medium
CN110929839A (en) * 2018-09-20 2020-03-27 深圳市商汤科技有限公司 Method and apparatus for training neural network, electronic device, and computer storage medium
CN110163236B (en) * 2018-10-15 2023-08-29 腾讯科技(深圳)有限公司 Model training method and device, storage medium and electronic device
CN110163236A (en) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 The training method and device of model, storage medium, electronic device
CN111242273B (en) * 2018-11-29 2024-04-12 华为终端有限公司 Neural network model training method and electronic equipment
WO2020108368A1 (en) * 2018-11-29 2020-06-04 华为技术有限公司 Neural network model training method and electronic device
CN111242273A (en) * 2018-11-29 2020-06-05 华为终端有限公司 Neural network model training method and electronic equipment
CN110008880A (en) * 2019-03-27 2019-07-12 深圳前海微众银行股份有限公司 A kind of model compression method and device
CN110008880B (en) * 2019-03-27 2023-09-29 深圳前海微众银行股份有限公司 Model compression method and device
CN112020724A (en) * 2019-04-01 2020-12-01 谷歌有限责任公司 Learning compressible features
US12033077B2 (en) 2019-04-01 2024-07-09 Google Llc Learning compressible features
EP3935578A4 (en) * 2019-05-16 2022-06-01 Samsung Electronics Co., Ltd. Neural network model apparatus and compressing method of neural network model
US11657284B2 (en) 2019-05-16 2023-05-23 Samsung Electronics Co., Ltd. Neural network model apparatus and compressing method of neural network model
WO2020231049A1 (en) 2019-05-16 2020-11-19 Samsung Electronics Co., Ltd. Neural network model apparatus and compressing method of neural network model
CN110211121B (en) * 2019-06-10 2021-07-16 北京百度网讯科技有限公司 Method and device for pushing model
CN110211121A (en) * 2019-06-10 2019-09-06 北京百度网讯科技有限公司 Method and apparatus for pushing model
WO2022027937A1 (en) * 2020-08-06 2022-02-10 苏州浪潮智能科技有限公司 Neural network compression method, apparatus and device, and storage medium
US12045729B2 (en) 2020-08-06 2024-07-23 Inspur Suzhou Intelligent Technology Co., Ltd. Neural network compression method, apparatus and device, and storage medium
WO2022062828A1 (en) * 2020-09-23 2022-03-31 深圳云天励飞技术股份有限公司 Image model training method, image processing method, chip, device and medium
CN112288032B (en) * 2020-11-18 2022-01-14 上海依图网络科技有限公司 Method and device for quantitative model training based on generation of confrontation network
CN112288032A (en) * 2020-11-18 2021-01-29 上海依图网络科技有限公司 Method and device for quantitative model training based on generation of confrontation network
CN113505774B (en) * 2021-07-14 2023-11-10 众淼创新科技(青岛)股份有限公司 Policy identification model size compression method
CN113505774A (en) * 2021-07-14 2021-10-15 青岛全掌柜科技有限公司 Novel policy identification model size compression method
CN115526266B (en) * 2022-10-18 2023-08-29 支付宝(杭州)信息技术有限公司 Model Training Method and Device, Service Prediction Method and Device
CN115526266A (en) * 2022-10-18 2022-12-27 支付宝(杭州)信息技术有限公司 Model training method and device, and business prediction method and device
CN118015343A (en) * 2024-01-18 2024-05-10 中移信息系统集成有限公司 Image filtering method and device and electronic equipment

Also Published As

Publication number Publication date
CN108510083B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN108510083A (en) A kind of neural network model compression method and device
Qin et al. Forward and backward information retention for accurate binary neural networks
WO2021164625A1 (en) Method of training an image classification model
Feng et al. Evolutionary fuzzy particle swarm optimization vector quantization learning scheme in image compression
Lazebnik et al. Supervised learning of quantizer codebooks by information loss minimization
CN108108751B (en) Scene recognition method based on convolution multi-feature and deep random forest
CN113707235A (en) Method, device and equipment for predicting properties of small drug molecules based on self-supervision learning
US20200257970A1 (en) Data processing apparatus by learning of neural network, data processing method by learning of neural network, and recording medium recording the data processing method
WO2022179533A1 (en) Quantum convolution operator
Carreira-Perpinán et al. Model compression as constrained optimization, with application to neural nets. Part II: Quantization
CN107251059A (en) Sparse reasoning module for deep learning
CN110728295B (en) Semi-supervised landform classification model training and landform graph construction method
CN102331992A (en) Distributed decision tree training
CN110969086A (en) Handwritten image recognition method based on multi-scale CNN (CNN) features and quantum flora optimization KELM
CN112115967A (en) Image increment learning method based on data protection
CN108647571A Video action classification model training method and device, and video action classification method
CN108334910A Event detection model training method and event detection method
CN113764034A (en) Method, device, equipment and medium for predicting potential BGC in genome sequence
Sepahvand et al. An adaptive teacher–student learning algorithm with decomposed knowledge distillation for on-edge intelligence
CN116629123A (en) Pairing-based single-cell multi-group data integration method and system
JP7427011B2 (en) Responding to cognitive queries from sensor input signals
KR102240882B1 (en) Apparatus, method for generating classifier and classifying apparatus generated thereby
US20230020112A1 (en) Relating complex data
Tang et al. Bringing giant neural networks down to earth with unlabeled data
CN104933438A (en) Image clustering method based on self-coding neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100070, No. 101-8, building 1, 31, zone 188, South Fourth Ring Road, Beijing, Fengtai District
Applicant after: Guoxin Youyi Data Co., Ltd
Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing
Applicant before: SIC YOUE DATA Co.,Ltd.

GR01 Patent grant