
CN116244643A - Training method, device, equipment and storage medium of multi-label classification model - Google Patents

Training method, device, equipment and storage medium of multi-label classification model

Info

Publication number
CN116244643A
Authority
CN
China
Prior art keywords
training
loss value
network
model
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111459448.6A
Other languages
Chinese (zh)
Inventor
刘光辉
赵国庆
权佳成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongguancun Kejin Technology Co Ltd
Original Assignee
Beijing Zhongguancun Kejin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd filed Critical Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN202111459448.6A priority Critical patent/CN116244643A/en
Publication of CN116244643A publication Critical patent/CN116244643A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a training method, a device, equipment and a storage medium for a multi-label classification model. The method comprises the following steps: respectively inputting the text vector of an acquired training sample into a first training network and a second training network of the current training model to obtain a first probability and a second probability of the training sample; calculating, according to the first probability and the second probability, a first loss value corresponding to the first training network and a second loss value corresponding to the second training network respectively; determining a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value and a preset second weight value of the second loss value; if the third loss value meets a preset condition, taking the current training model as the multi-label classification model; if the third loss value does not meet the preset condition, performing parameter adjustment on the current training model according to the third loss value and carrying out the next round of training. In this way, fewer probabilities are calculated for labels that are irrelevant or weakly correlated with the text, the model training speed is increased, and the model classification accuracy is improved.

Description

Training method, device, equipment and storage medium of multi-label classification model
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a training method, device, equipment and storage medium of a multi-label classification model.
Background
Multi-label means that a sample may belong to multiple classes, i.e. carry multiple labels, simultaneously. For example, an L-size cotton-padded garment has at least two labels, namely size: L and type: winter clothing. A machine learning model is trained with multi-label samples to obtain a multi-label classification model; the model can identify a target, and the output identification result can be a vector expressing the categories to which the target belongs.
At present, most multi-label learning algorithms can be divided into two main categories according to the angle from which the problem is solved, namely problem transformation methods and algorithm adaptation methods.
Problem transformation methods transform the problem data so that existing algorithms can be used. For example, each label in the multi-label set can be treated as a single label, ignoring correlations, and an ordinary classification algorithm is applied to each label: in traditional machine learning a classifier is trained for each type of label, using algorithms such as SVM, decision tree (DT), naive Bayes or XGBoost; in deep learning, a text classification model (e.g. TextCNN, TextRNN, etc.) is trained for each class. Besides treating the labels separately, the labels can also be considered together (Label Powerset): the problem is turned into a multi-class problem, and a multi-class classifier is trained on all unique combinations of labels found in the training data, as sketched below.
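As a hedged illustration (not part of the patent), the two transformations can be sketched with scikit-learn: a binary-relevance setup that trains one classifier per label, and a label-powerset setup built by hand over the unique label combinations; the toy data and the choice of LinearSVC are illustrative assumptions.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import LinearSVC

# toy data: 4 samples, 3 features, 2 labels (illustrative only)
X = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])

# (a) binary relevance: one independent classifier per label
br = MultiOutputClassifier(LinearSVC()).fit(X, Y)
print(br.predict(X))

# (b) label powerset: each unique label combination becomes one class
combos = {tuple(row): i for i, row in enumerate(np.unique(Y, axis=0))}
y_lp = np.array([combos[tuple(row)] for row in Y])
lp = LinearSVC().fit(X, y_lp)                  # a single multi-class classifier
inverse = {i: np.array(c) for c, i in combos.items()}
print(np.vstack([inverse[i] for i in lp.predict(X)]))
```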
Algorithm adaptation methods extend a specific algorithm so that it can process multi-label data directly, i.e. the algorithm itself is improved to suit the data, for example ML-kNN, the multi-label version of KNN, and Rank-SVM, the multi-label version of SVM. In deep learning, the output layer of a multi-class model is often modified to adapt it to multi-label classification.
However, when the model is trained, the correlation between the label and the text often plays an important role. Most existing models ignore this relation: the probability of a label is calculated even when the label is irrelevant or only weakly relevant to the text, which is obviously unnecessary, increases the training error and reduces the accuracy of model classification.
Disclosure of Invention
The invention mainly aims to provide a training method, device, equipment and storage medium for a multi-label classification model, so as to solve the technical problems of large training error and low model classification accuracy of the multi-label classification model in the prior art.
In view of the above problems, the present invention provides a training method for a multi-label classification model, including:
acquiring a text vector of a training sample;
respectively inputting the text vector into a first training network and a second training network of the current training model to obtain a first probability and a second probability of the training sample; the first training network is a network without an attention mechanism, and the second training network is a network with an attention mechanism;
calculating a first loss value corresponding to the first training network according to the first probability and the actual probability of the training sample; calculating a second loss value corresponding to the second training network according to the second probability and the actual probability of the training sample;
determining a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value and a preset second weight value of the second loss value;
if the third loss value meets a preset condition, the current training model is used as the multi-label classification model;
and if the third loss value does not meet the preset condition, carrying out parameter adjustment on the current training model according to the third loss value until the third loss value obtained by the next training model meets the preset condition.
Further, in the training method of the multi-label classification model, determining a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value and a preset second weight value of the second loss value includes:
determining a first product of the first loss value and the first weight value, and determining a second product of the second loss value and the second weight value;
and taking the sum of the first product and the second product as the third loss value.
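Written out, and using $L_1$, $L_2$, $L_3$ for the three loss values and $w_1$, $w_2$ for the two preset weight values (symbols introduced here only for illustration), this amounts to the weighted sum:

$$L_3 = w_1 \cdot L_1 + w_2 \cdot L_2$$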
Further, in the above training method of the multi-label classification model, obtaining the text vector of the training sample includes:
and segmenting the text with the Jieba word segmenter, and inputting the segmented text into an embedding layer of the current training model to obtain the text vector.
Further, in the training method of the multi-label classification model, the first training network includes an LSTM network or a GRU network;
the second training network comprises a dynamic convolutional network.
Further, in the training method of the multi-label classification model, the dynamic convolution network includes a span-based dynamic convolution network.
The invention also provides a training device of the multi-label classification model, which comprises the following steps:
the acquisition module is used for acquiring the text vector of the training sample;
the training module is used for inputting the text vector into a first training network and a second training network of the current training model respectively to obtain a first probability and a second probability of the training sample; the first training network is a network without an attention mechanism, and the second training network is a network with an attention mechanism;
the first determining module is used for calculating a first loss value corresponding to the first training network according to the first probability and the actual probability of the training sample; calculating a second loss value corresponding to the second training network according to the second probability and the actual probability of the training sample;
the second determining module is used for determining a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value and a preset second weight value of the second loss value;
the detection module is used for taking the current training model as the multi-label classification model if the third loss value meets a preset condition; and if the third loss value does not meet the preset condition, carrying out parameter adjustment on the current training model according to the third loss value until the third loss value obtained by the next training model meets the preset condition.
Further, in the training device for a multi-label classification model, the second determining module is specifically configured to:
determining a first product of the first loss value and the first weight value, and determining a second product of the second loss value and the second weight value;
and taking the sum of the first product and the second product as the third loss value.
Further, in the training device for a multi-label classification model, the obtaining module is specifically configured to:
and segmenting the text with the Jieba word segmenter, and inputting the segmented text into an embedding layer of the current training model to obtain the text vector.
The invention also provides training equipment of the multi-label classification model, which comprises a memory and a processor;
the memory has stored thereon a computer program which, when executed by a processor, implements the steps of the training method of the multi-label classification model as described in any of the above.
The invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method of the multi-label classification model as described in any of the above.
One or more embodiments of the above-described solution may have the following advantages or benefits compared to the prior art:
according to the training method, device, equipment and storage medium of the multi-label classification model, after the text vector of the training sample is obtained, training is carried out through the first training network of the network without the attention mechanism, and then the first loss value of the first training network is obtained. Through the second training network integrating the attention mechanism, under the condition of not increasing the network depth and the network width, the representation capability of the model is improved by means of attention aggregation of a plurality of convolution kernels, a second loss value of the second training network is obtained, then a third loss value of the total model is obtained based on the first loss value and the second loss value, the model is trained based on the third loss value, the number of labels which are irrelevant or have low relevance is reduced, the model training speed is accelerated, and the model classification accuracy is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention, without limitation to the invention. In the drawings:
FIG. 1 is a flowchart of an embodiment of a training method of a multi-label classification model according to the present invention;
FIG. 2 is a schematic diagram of a training device of the multi-label classification model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an embodiment of a training apparatus of the multi-label classification model of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail with reference to the drawings and examples, so that how the present invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of the embodiments may be combined with each other, and the resulting technical solutions all fall within the protection scope of the present invention.
In the related art, classification can be performed with the ML-KNN algorithm, which works as follows:
the main idea of the ML-KNN algorithm is that for each new instance, the k instances closest to it (the k instances in feature space that are the smallest in distance to it) can be obtained first, then the tag sets of these instances are obtained, after which the tag set of the new instance is determined by the maximum posterior probability criterion.
First, the label information of the corresponding k nearest neighbors of an instance x can be obtained using equation (1):

C_x(l) = \sum_{a \in N(x)} y_a(l)    (1)

Here, C_x is a row vector with one element per label; its element C_x(l) indicates how many of the k neighbors of x carry label l.

For a new instance t, its k-neighbor index set N(t) is obtained first. The event H_1^l is defined as "t has label l" and H_0^l as "t does not have label l"; the event E_j^l (j \in {0, 1, ..., k}) is defined as "for label l, exactly j of the k neighbors contain this label". Then, based on the vector C_t, the result can be obtained by the maximum a posteriori criterion and the Bayes rule, as in equation (2):

y_t(l) = \arg\max_{b \in \{0,1\}} P(H_b^l) \, P(E_{C_t(l)}^l \mid H_b^l)    (2)

where y_t(l) is the required result and indicates whether instance t has label l.

The prior probability P(H_b^l), representing whether t has label l, can be found by dividing the number of times label l appears in the entire training set by the total number of training vectors, as in equation (3):

P(H_1^l) = \frac{1}{m} \sum_{i=1}^{m} y_{x_i}(l),    P(H_0^l) = 1 - P(H_1^l)    (3)

i.e. the number of vectors in the sample that possess label l divided by the total number of vectors.

The posterior probability P(E_j^l \mid H_b^l) can be calculated according to equation (4):

P(E_j^l \mid H_1^l) = \frac{c[j]}{\sum_{p=0}^{k} c[p]}    (4)

where j is equal to C_t(l), i.e. the number of instances carrying label l among the k nearest neighbors of t. The counter c[\delta] (n \in {1, 2, ..., m}) is built as follows: if x_n has label l and exactly \delta of the k nearest neighbors of x_n carry label l, then c[\delta] is incremented by 1. Thus c[j] is the number of vectors, among all training vectors, that themselves have label l and whose k neighbors contain exactly j vectors with label l, and \sum_{p=0}^{k} c[p] is the total number of vectors that have label l, summed over neighbor counts from 0 to k.

After calculating P(H_b^l) \, P(E_{C_t(l)}^l \mid H_b^l), it only remains to see which case b \in {0, 1} maximizes this product: if b = 1, vector t has label l, otherwise it does not.
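The procedure above can be made concrete with a minimal NumPy sketch, given purely as an illustration: it assumes brute-force Euclidean nearest-neighbor search, the unsmoothed counting estimates of equations (1) to (4), and symmetric counters for the complementary event H_0^l (which the text above does not spell out); all function and variable names are our own, not from the patent.

```python
import numpy as np

def mlknn_fit(X, Y, k=10):
    """Precompute the counts needed by equations (1)-(4).

    X: (m, d) feature matrix; Y: (m, q) binary label matrix; k: number of neighbors.
    """
    m, q = Y.shape
    prior = Y.mean(axis=0)                           # equation (3): label frequency
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                  # exclude the instance itself
    knn = np.argsort(dists, axis=1)[:, :k]
    C = Y[knn].sum(axis=1)                           # equation (1): neighbor label counts, shape (m, q)
    c = np.zeros((q, k + 1))                         # counters for H_1^l
    c_prime = np.zeros((q, k + 1))                   # counters for H_0^l (assumed symmetric)
    for n in range(m):
        for l in range(q):
            j = int(C[n, l])
            if Y[n, l] == 1:
                c[l, j] += 1
            else:
                c_prime[l, j] += 1
    return prior, c, c_prime, X, Y, k

def mlknn_predict(model, t):
    """Apply the MAP rule of equation (2) to a single new instance t."""
    prior, c, c_prime, X, Y, k = model
    q = Y.shape[1]
    knn = np.argsort(np.linalg.norm(X - t, axis=1))[:k]
    C_t = Y[knn].sum(axis=0)                         # equation (1) for t
    y_t = np.zeros(q, dtype=int)
    for l in range(q):
        j = int(C_t[l])
        p_e_h1 = c[l, j] / max(c[l].sum(), 1e-12)          # equation (4)
        p_e_h0 = c_prime[l, j] / max(c_prime[l].sum(), 1e-12)
        # equation (2): pick the case b that maximizes P(H_b^l) * P(E_j^l | H_b^l)
        y_t[l] = int(prior[l] * p_e_h1 > (1 - prior[l]) * p_e_h0)
    return y_t
```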
However, this algorithm has high computational complexity and high space complexity, its prediction accuracy for rare categories is low when the samples are unbalanced, and its interpretability is poor: the correlation between the labels and the text is not considered, and it cannot give rules in the way a decision tree can.
In the related art, classification can also be performed with the TextCNN algorithm, which works as follows:
structure of TextCNN:
Embedding layer (embedding layer): TextCNN uses pre-trained word vectors as the embedding layer. For all words in the dataset, an embedding matrix M can be obtained, in which each row is a word vector, because each word can be represented as a vector.
Convolution layer (convolution): for a sentence, each word is looked up in the embedding matrix M to obtain its word vector. Assume the word vectors have d dimensions in total; then for a sentence of s words, a matrix A ∈ R^{s×d} with s rows and d columns is obtained. A convolutional neural network is used to extract features. The convolution is one-dimensional: text convolution differs from image convolution in that the kernel slides in only one direction (vertically along the text sequence), and the width of the convolution kernel is fixed to the dimension d of the word vector. The height is a hyperparameter and can be set freely. A convolution operation is performed on each possible window of words in the sentence to obtain a feature map.
Pooling layer (pooling): the feature maps obtained by convolution kernels of different sizes have different lengths, so a pooling function is applied to each feature map to make their dimensions the same.
Softmax layer: the results of max-pooling are concatenated and fed into softmax, so that the probability of each category is obtained, such as the probability that the label is 1 and the probability that the label is -1.
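As an illustration of the structure just described, the following is a minimal PyTorch-style sketch of TextCNN (embedding layer, one-dimensional convolutions over the s×d sentence matrix with several kernel heights, max-pooling, concatenation and a softmax output); the hyperparameter values and class names are illustrative assumptions, not taken from this document.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, d=128, kernel_heights=(2, 3, 4),
                 n_filters=100, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d)          # embedding layer
        # one convolution per kernel height h; the kernel width is fixed to d
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, n_filters, kernel_size=(h, d)) for h in kernel_heights]
        )
        self.fc = nn.Linear(n_filters * len(kernel_heights), n_classes)

    def forward(self, token_ids):                              # token_ids: (batch, s)
        x = self.embedding(token_ids).unsqueeze(1)             # (batch, 1, s, d)
        feature_maps = [torch.relu(conv(x)).squeeze(3) for conv in self.convs]
        # max-pooling so every branch yields a vector of the same dimension
        pooled = [fm.max(dim=2).values for fm in feature_maps]
        out = self.fc(torch.cat(pooled, dim=1))                # concatenate and classify
        return torch.softmax(out, dim=1)                       # probability of each category
```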
However, this algorithm does not consider the relevance between the label and the text, so the interpretability of the model is weak; when the model is tuned, it is difficult to adjust specific features in a targeted manner according to the training result, and because TextCNN has no concept similar to the feature importance in a GBDT model, the importance of each feature is difficult to evaluate.
Therefore, the algorithms of the prior art not only increase training errors but also reduce the accuracy of model classification. In order to solve these technical problems, the invention provides the following technical scheme:
noun interpretation: LSTM, long and short term memory network, shows good effect in processing sequence text task, and it can well show global logic information and information complex time correlation each other in input text.
CNN, convolutional neural network, which in text tasks is used to extract the local feature information around the central words of a sentence.
Fig. 1 is a flowchart of an embodiment of a training method of a multi-label classification model according to the present invention, as shown in fig. 1, the training method of the multi-label classification model of the present embodiment may specifically include the following steps:
100. acquiring a text vector of a training sample;
In one specific implementation, the text may be segmented with the Jieba word segmenter and the segmented text input to the embedding layer of the current training model to obtain the text vector.
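A minimal sketch of this step, assuming Jieba for word segmentation, a toy vocabulary lookup and a PyTorch embedding layer; the vocabulary, padding scheme and layer sizes are illustrative assumptions.

```python
import jieba
import torch
import torch.nn as nn

# toy vocabulary, assumed to be built in advance from the training corpus
vocab = {"<pad>": 0, "<unk>": 1}

def text_to_ids(text, max_len=32):
    """Segment the text with Jieba and map each token to its vocabulary id."""
    tokens = list(jieba.cut(text))
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    ids += [vocab["<pad>"]] * (max_len - len(ids))        # pad to a fixed length
    return torch.tensor([ids])                             # (1, max_len)

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=128)
text_vector = embedding(text_to_ids("这是一条多标签训练样本"))  # (1, max_len, 128)
```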
101. Respectively inputting the text vector into a first training network and a second training network of the current training model to obtain a first probability and a second probability of the training sample;
the first training network is a network without a converged attention mechanism, and the second training network is a network with a converged attention mechanism. For example, the first training network may comprise an LSTM network or a GRU network; the second training network may comprise a dynamic convolutional network. The dynamic convolution network may include a span-based dynamic convolution network.
Specifically, taking LSTM as an example, after obtaining the text vector of the training sample, the text vector of the training sample may be respectively input into the LSTM layer and the dynamic convolution layer, and then the classification probability is obtained through the softmax layer.
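As a hedged illustration of how the two branches could be wired together, the sketch below uses an LSTM for the first training network and a plain 1-D convolution with a simple attention pooling as a stand-in for the span-based dynamic convolution of the second network; it also uses a per-label sigmoid output for the multi-label case, whereas the text above refers to a softmax layer. All names and sizes are assumptions, not the patent's actual architecture.

```python
import torch
import torch.nn as nn

class DualBranchClassifier(nn.Module):
    """First branch: no attention (LSTM). Second branch: attention-fused
    convolution, approximated here by Conv1d plus attention pooling."""
    def __init__(self, vocab_size, d=128, hidden=128, n_labels=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d)
        # branch 1: LSTM (a GRU could be used instead)
        self.lstm = nn.LSTM(d, hidden, batch_first=True)
        self.head1 = nn.Linear(hidden, n_labels)
        # branch 2: convolution + attention pooling (stand-in for span-based dynamic convolution)
        self.conv = nn.Conv1d(d, hidden, kernel_size=3, padding=1)
        self.attn = nn.Linear(hidden, 1)
        self.head2 = nn.Linear(hidden, n_labels)

    def forward(self, token_ids):                            # (batch, seq_len)
        x = self.embedding(token_ids)                        # (batch, seq_len, d)
        _, (h, _) = self.lstm(x)
        p1 = torch.sigmoid(self.head1(h[-1]))                # first probability, (batch, n_labels)
        c = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        weights = torch.softmax(self.attn(c), dim=1)         # attention over positions
        pooled = (weights * c).sum(dim=1)
        p2 = torch.sigmoid(self.head2(pooled))               # second probability
        return p1, p2
```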
102. Calculating a first loss value corresponding to the first training network according to the first probability and the actual probability of the training sample; calculating a second loss value corresponding to the second training network according to the second probability and the actual probability of the training sample;
103. determining a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value and a preset second weight value of the second loss value;
in particular, a first product of the first loss value and the first weight value may be determined, and a second product of the second loss value and the second weight value may be determined; and taking the sum of the first product and the second product as the third loss value.
104. Detecting whether the third loss value meets a preset condition; if yes, go to step 105, if no, go to step 106;
105. taking the current training model as the multi-label classification model;
and if the third loss value meets a preset condition, taking the current training model as the multi-label classification model. The preset condition may be that the third loss value is smaller than a preset threshold.
106. And carrying out parameter adjustment on the current training model according to the third loss value, and returning to the step 100.
And if the third loss value does not meet the preset condition, carrying out parameter adjustment on the current training model according to the third loss value, returning to the step 100, and continuing training until the third loss value obtained by the next training model meets the preset condition.
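Putting steps 100 to 106 together, the following minimal training-loop sketch reuses the DualBranchClassifier and vocabulary from the sketches above and assumes binary cross-entropy as the per-branch loss, the weighted sum w1·L1 + w2·L2 as the third loss value, and "third loss value below a preset threshold" as the preset condition; the weight values, threshold and optimizer settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = DualBranchClassifier(vocab_size=len(vocab), n_labels=10)  # sketch from above
criterion = nn.BCELoss()                        # per-branch loss (assumed)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
w1, w2 = 0.5, 0.5                               # preset first and second weight values (illustrative)
threshold = 0.05                                # preset condition: third loss value below this

def train(dataloader, max_epochs=100):
    for epoch in range(max_epochs):
        for token_ids, labels in dataloader:    # labels: (batch, n_labels) float tensor of 0/1
            p1, p2 = model(token_ids)           # step 101: first and second probabilities
            loss1 = criterion(p1, labels)       # step 102: first loss value
            loss2 = criterion(p2, labels)       #           second loss value
            loss3 = w1 * loss1 + w2 * loss2     # step 103: third loss value of the total model
            if loss3.item() < threshold:        # steps 104/105: preset condition met
                return model                    # current model becomes the multi-label classification model
            optimizer.zero_grad()               # step 106: parameter adjustment, then next round
            loss3.backward()
            optimizer.step()
    return model
```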
According to the training method of the multi-label classification model of this embodiment, after the text vector of the training sample is obtained, training is carried out through the first training network, which does not incorporate an attention mechanism, to obtain a first loss value of the first training network. Through the second training network, which incorporates an attention mechanism, the representation capability of the model is improved by aggregating a plurality of convolution kernels with attention, without increasing the network depth or width, and a second loss value of the second training network is obtained. A third loss value of the total model is then obtained based on the first loss value and the second loss value, and the model is trained based on the third loss value, so that fewer probabilities are calculated for labels that are irrelevant or weakly correlated with the text, the model training speed is increased, and the model classification accuracy is improved.
It should be noted that, the method of the embodiment of the present invention may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the method of an embodiment of the present invention, and the devices interact with each other to complete the method.
Fig. 2 is a schematic structural diagram of an embodiment of a training apparatus for a multi-label classification model according to the present invention, as shown in fig. 2, the training apparatus for a multi-label classification model of the present embodiment may include an acquisition module 20, a training module 21, a first determination module 22, a second determination module 23, and a detection module 24.
An obtaining module 20, configured to obtain a text vector of the training sample;
specifically, the text vector may be obtained by word segmentation of the text by the tie and input to the embedding layer of the current training model.
The training module 21 is configured to input the text vector into a first training network and a second training network of the current training model, respectively, to obtain a first probability and a second probability of the training sample; the first training network is a network without an attention mechanism, and the second training network is a network with an attention mechanism;
a first determining module 22, configured to calculate a first loss value corresponding to the first training network according to the first probability and the actual probability of the training sample; calculating a second loss value corresponding to the second training network according to the second probability and the actual probability of the training sample;
wherein the first training network comprises an LSTM network or a GRU network; the second training network comprises a dynamic convolutional network. The dynamic convolution network includes a span-based dynamic convolution network.
A second determining module 23, configured to determine a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value, and a preset second weight value of the second loss value;
in particular, a first product of the first loss value and the first weight value may be determined, and a second product of the second loss value and the second weight value may be determined; and taking the sum of the first product and the second product as the third loss value.
The detection module 24 is configured to take the current training model as the multi-label classification model if the third loss value meets a preset condition; and if the third loss value does not meet the preset condition, carrying out parameter adjustment on the current training model according to the third loss value until the third loss value obtained by the next training model meets the preset condition.
According to the training device of the multi-label classification model of this embodiment, after the text vector of the training sample is obtained, training is carried out through the first training network, which does not incorporate an attention mechanism, to obtain a first loss value of the first training network. Through the second training network, which incorporates an attention mechanism, the representation capability of the model is improved by aggregating a plurality of convolution kernels with attention, without increasing the network depth or width, and a second loss value of the second training network is obtained. A third loss value of the total model is then obtained based on the first loss value and the second loss value, and the model is trained based on the third loss value, so that fewer probabilities are calculated for labels that are irrelevant or weakly correlated with the text, the model training speed is increased, and the model classification accuracy is improved.
The device of the foregoing embodiment is configured to implement the corresponding method in the foregoing embodiment, and specific implementation schemes thereof may refer to the method described in the foregoing embodiment and related descriptions in the method embodiment, and have beneficial effects of the corresponding method embodiment, which are not described herein.
Fig. 3 is a schematic structural diagram of an embodiment of a training apparatus for a multi-label classification model according to the present invention, and as shown in fig. 3, the training apparatus for a multi-label classification model according to the present embodiment may include a memory 30 and a processor 31;
the memory 30 has stored thereon a computer program which, when executed by the processor 31, implements the steps of the method of dynamic weighting of a penalty function of the above-described embodiments.
The embodiment of the invention provides a storage medium, and the storage medium of the embodiment stores a computer program, and the computer program realizes the steps of the training method of the multi-label classification model of the embodiment when being executed by a controller.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the embodiments of the present invention are disclosed above, the embodiments are only used for the convenience of understanding the present invention and are not intended to limit it. Any person skilled in the art can make modifications and variations in form and detail without departing from the spirit and scope of the present disclosure, but the protection scope of the present invention is still subject to the scope defined by the appended claims.

Claims (10)

1. A method of training a multi-label classification model, comprising:
acquiring a text vector of a training sample;
respectively inputting the text vector into a first training network and a second training network of the current training model to obtain a first probability and a second probability of the training sample; the first training network is a network without an attention mechanism, and the second training network is a network with an attention mechanism;
calculating a first loss value corresponding to the first training network according to the first probability and the actual probability of the training sample; calculating a second loss value corresponding to the second training network according to the second probability and the actual probability of the training sample;
determining a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value and a preset second weight value of the second loss value;
if the third loss value meets a preset condition, the current training model is used as the multi-label classification model;
and if the third loss value does not meet the preset condition, carrying out parameter adjustment on the current training model according to the third loss value until the third loss value obtained by the next training model meets the preset condition.
2. The method of claim 1, wherein determining a third loss value of the total model based on the first loss value, the second loss value, a first weight value of the first loss value and a second weight value of the second loss value, comprises:
determining a first product of the first loss value and the first weight value, and determining a second product of the second loss value and the second weight value;
and taking the sum of the first product and the second product as the third loss value.
3. The method of claim 1, wherein obtaining text vectors for training samples comprises:
and segmenting the text with the Jieba word segmenter, and inputting the segmented text into an embedding layer of the current training model to obtain the text vector.
4. The method of claim 1, wherein the first training network comprises an LSTM network or a GRU network;
the second training network comprises a dynamic convolutional network.
5. The method of claim 4, wherein the dynamic convolution network comprises a span-based dynamic convolution network.
6. A training device for a multi-label classification model, comprising:
the acquisition module is used for acquiring the text vector of the training sample;
the training module is used for inputting the text vector into a first training network and a second training network of the current training model respectively to obtain a first probability and a second probability of the training sample; the first training network is a network without an attention mechanism, and the second training network is a network with an attention mechanism;
the first determining module is used for calculating a first loss value corresponding to the first training network according to the first probability and the actual probability of the training sample; calculating a second loss value corresponding to the second training network according to the second probability and the actual probability of the training sample;
the second determining module is used for determining a third loss value of the total model according to the first loss value, the second loss value, a preset first weight value of the first loss value and a preset second weight value of the second loss value;
the detection module is used for taking the current training model as the multi-label classification model if the third loss value meets a preset condition; and if the third loss value does not meet the preset condition, carrying out parameter adjustment on the current training model according to the third loss value until the third loss value obtained by the next training model meets the preset condition.
7. The training device of the multi-label classification model according to claim 6, wherein the second determining module is specifically configured to:
determining a first product of the first loss value and the first weight value, and determining a second product of the second loss value and the second weight value;
and taking the sum of the first product and the second product as the third loss value.
8. The training device of the multi-label classification model according to claim 6, wherein the obtaining module is specifically configured to:
and segmenting the text with the Jieba word segmenter, and inputting the segmented text into an embedding layer of the current training model to obtain the text vector.
9. A training device for a multi-label classification model, comprising a memory and a processor;
stored on the memory is a computer program which, when executed by a processor, implements the steps of the training method of the multi-label classification model according to any one of claims 1 to 5.
10. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method of the multi-label classification model according to any of claims 1 to 5.
CN202111459448.6A 2021-12-02 2021-12-02 Training method, device, equipment and storage medium of multi-label classification model Pending CN116244643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111459448.6A CN116244643A (en) 2021-12-02 2021-12-02 Training method, device, equipment and storage medium of multi-label classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111459448.6A CN116244643A (en) 2021-12-02 2021-12-02 Training method, device, equipment and storage medium of multi-label classification model

Publications (1)

Publication Number Publication Date
CN116244643A true CN116244643A (en) 2023-06-09

Family

ID=86629965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111459448.6A Pending CN116244643A (en) 2021-12-02 2021-12-02 Training method, device, equipment and storage medium of multi-label classification model

Country Status (1)

Country Link
CN (1) CN116244643A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination