CN115168579A - Text classification method based on multi-head attention mechanism and two-dimensional convolution operation
- Publication number: CN115168579A
- Application number: CN202210800916.XA
- Authority: CN (China)
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation, which relates to the technical field of natural language processing.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation.
Background
Natural language processing is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between people and computers in natural language, drawing on linguistics, computer science and mathematics, and it has extremely wide applications, such as intelligent voice question answering, fraudulent short-message identification and sentiment analysis of online comments.
In the medical field, a large amount of clinical information is stored in information systems as unstructured (or semi-structured) text, and natural language processing is the key technology for extracting useful information from such medical text. By converting unstructured medical texts into structured data containing important medical information, researchers can discover useful medical knowledge, improving the operating quality of the medical system and reducing its operating cost. In the era of rapidly developing internet technology, the medical field no longer faces a shortage of information; the problem is how to obtain valuable information quickly and accurately from massive information resources. Medical text is generated in many varied forms, and its sheer volume makes manual sorting and arrangement difficult, so effective text classification has become very important.
Commonly used text classification methods currently include support vector machines, convolutional neural networks, recurrent neural networks and BERT. BERT and recurrent networks can achieve excellent classification results, but the models are large, training is difficult, and deployment on small hosts is impractical. TextGCN achieves a good classification effect with a small model through graph convolution, but it cannot classify unseen nodes. The convolutional neural networks applied to text are usually only one-dimensional, and if the input text representation has a high dimension, a one-dimensional convolutional neural network loses semantic information. The prior art discloses an adaptive text classification method and device based on BERT: corpus sample data to be classified is first preprocessed and a preset network model is constructed; the preprocessed sample data is then input into the preset network model and supervised training is performed with a preset loss function to obtain a classification model; an output threshold is set on the classification model to control early output of the classification result, shortening the model inference time without losing accuracy.
Disclosure of Invention
To address the long training time and heavy computation of the traditional text classification models used for existing medical text classification, the invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation that requires little computation, trains quickly and still delivers a good text classification effect.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a method of text classification based on a multi-head attention mechanism and a two-dimensional convolution operation, the method comprising the steps of:
s1, determining a text data set, and dividing the text data set into a training set and a test set;
s2, preprocessing the texts in the training set;
s3, constructing a neural network, wherein the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
s4, inputting the text after the preprocessing operation into an embedding layer of a neural network to obtain a word vector;
S5, forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer, executing the attention functions in parallel, then splicing and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using the text-enhanced semantic representations to pre-train the multi-head attention mechanism layer for text classification;
S6, fusing the text-enhanced semantic representations to obtain the text-fused semantic representation, performing a two-dimensional convolution operation on it, and outputting the convolution operation feature vectors;
S7, performing text classification training on the neural network by using the convolution operation feature vectors, and adjusting the weights of the multi-head attention mechanism layers to obtain the trained neural network;
and S8, preprocessing the texts in the test set and inputting them into the trained neural network to obtain the classification results.
Preferably, in step S2, the preprocessing operation performed on the texts in the training set includes:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it with the stop-word list, and automatically skipping it if it appears in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters in all texts, and establishing a one-hot vector for each character.
Preferably, in step S3, the embedding layer of the constructed neural network uses character-level one-hot vector embedding of the text as the semantic representation. The embedding layer has three layers with weight matrices W_1, W_2 and W_3, the activation function of every layer is sigmoid, and the layers are connected in sequence. After the texts in the training set have been preprocessed, the one-hot vector of each character of each text is input into the neural network to obtain the word vector, and the calculation performed at each layer is:

x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)

where x_0 is the one-hot vector of a character, x_1 is the intermediate value after x_0 is activated by the first layer, x_2 is the intermediate value after x_1 is activated by the second layer, x is the word vector output after x_2 is activated by the third layer, and b is a bias vector.
Preferably, a plurality of self-attention mechanisms are connected in series to form a multi-head attention mechanism layer, wherein the input to the self-attention mechanism consists of queries and keys of dimension d_k and values of dimension d_v. To obtain the weights on the values, a group of queries is packed into a matrix Q, and the keys and values are packed into matrices K and V respectively; based on the softmax function, the attention function is obtained as:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

Q, K and V are calculated as:

Q = X W_Q
K = X W_K
V = X W_V

where W_Q, W_K and W_V are the weight matrices of the self-attention mechanism for the three inputs query, key and value.
Preferably, the word vectors of the individual words obtained in S4 form a word vector matrix X = [x_1, x_2, ..., x_n]. The word vector matrix is input into a multi-head attention mechanism layer, the attention functions are executed in parallel, and the results are spliced and mapped. With a total of R multi-head attention mechanism layers, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained. The calculation in a multi-head attention mechanism layer is:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W_0

where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R. After a flatten operation is applied to X_1, X_2, ..., X_R, a fully connected layer is attached and pre-training for text classification is performed.
Preferably, in step S6 the text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a splicing operation to obtain the text-fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), where X_s is a three-dimensional tensor. When the two-dimensional convolution operation is performed on the text-fused semantic representation, X_1, X_2, ..., X_R serve as the R input channels of the convolution layer, the size and number of the convolution kernels are set, and the size of a convolution kernel in its first dimension equals the length of a word vector. For a convolution kernel C and the text-fused semantic representation X_s, the elements of the convolution result matrix are calculated as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j - p + 1, k - q + 1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j - p + 1, k - q + 1) is an element of the convolution kernel.

The convolution result matrix is input into the pooling layer for a maximum pooling operation, which keeps only the largest element of the convolution result matrix, so that each convolution kernel produces one output value; the convolution operation feature vectors are finally output, and the loss of semantic information is avoided.
Preferably, when the convolution operation feature vectors are used to train the neural network for text classification, the weights of the multi-head attention mechanism layers are adjusted in the fully connected layer by a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layers into a new model class.
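As an illustration of this step, the sketch below shows one way the pre-training and reuse could be organised in tf.keras; the layer name "mha", the 16-class softmax head and the 30x768 input shape are illustrative assumptions, not details fixed by the invention.

```python
# Sketch of pre-training a multi-head attention layer and reusing it in a new
# tf.keras model class (assumptions: 30-word texts, 768-dimensional word vectors,
# 16 classes, layer name "mha" -- all illustrative).
import tensorflow as tf

n_words, d_model, n_classes = 30, 768, 16

# Stage 1: pre-train the attention layer behind a simple flatten + softmax head.
inputs = tf.keras.Input(shape=(n_words, d_model))
attn_layer = tf.keras.layers.MultiHeadAttention(num_heads=3, key_dim=64, name="mha")
enhanced = attn_layer(inputs, inputs)
flat = tf.keras.layers.Flatten()(enhanced)
pre_out = tf.keras.layers.Dense(n_classes, activation="softmax")(flat)
pretrain_model = tf.keras.Model(inputs, pre_out)
# pretrain_model.compile(...) and pretrain_model.fit(...) would run the pre-training.

# Stage 2: a new model class that takes over the pre-trained layer; its weights
# keep being adjusted by back-propagation during the main training run.
class TextClassifier(tf.keras.Model):
    def __init__(self, pretrained_attention):
        super().__init__()
        self.attention = pretrained_attention          # pre-trained, still trainable
        self.flatten = tf.keras.layers.Flatten()
        self.classifier = tf.keras.layers.Dense(n_classes, activation="softmax")

    def call(self, x):
        x = self.attention(x, x)
        return self.classifier(self.flatten(x))

model = TextClassifier(pretrain_model.get_layer("mha"))
```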
The present invention also provides a computer apparatus comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program stored on the memory to implement the method for text classification based on a multi-head attention mechanism and two-dimensional convolution operation as claimed in any one of claims 1 to 7.
The invention also proposes a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method.
The invention also provides a text classification system based on a multi-head attention mechanism and two-dimensional convolution operation, which comprises:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are connected in sequence;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer, executing the attention functions in parallel, then splicing and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using the text-enhanced semantic representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations to obtain the text-fused semantic representation, performing a two-dimensional convolution operation on it and outputting the convolution operation feature vectors;
the training module is used for performing text classification training on the neural network by using the convolution operation feature vectors and adjusting the weights of the multi-head attention mechanism layers to obtain the trained neural network;
and the test module is used for preprocessing the text in the test set and inputting the preprocessed text into the trained neural network to obtain a classification result.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation, which comprises the steps of firstly collecting a text data set to be classified, dividing the text data set into a training set and a testing set, carrying out preprocessing operation on the text in the training set, then constructing a neural network, inputting the text subjected to preprocessing operation into the neural network to obtain word vectors at a word granularity level, reflecting the importance degree of different Chinese characters in the text, then forming a multi-head attention mechanism layer, forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer to obtain a multi-dimensional text tensor, namely adopting a matching mode of fusing a pre-training word vector and the multi-head attention mechanism as semantic representation to obtain a text representation tensor, then carrying out two-dimensional convolution operation, extracting text characteristics and fusing different attention points of the multi-head attention mechanism; and introducing a full connection layer, performing text classification training on the neural network by using the convolution operation characteristic vector, adjusting the weight of the multi-attention machine mechanism layer to obtain a trained neural network, preprocessing the text concentrated in the test, and inputting the trained neural network to obtain a classification result. The method can obtain good classification effect and generalization capability on a smaller data set, is quick in fitting and less in model parameter, simplifies the model, reduces the overhead of the system, and effectively avoids the problems of large model data demand, long training time and high computer computing power requirement.
Drawings
Fig. 1 is a schematic flowchart of a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to embodiment 1 of the present invention;
fig. 2 is a schematic flowchart of a preprocessing operation performed on texts in a training set according to embodiment 1 of the present invention;
FIG. 3 is a view showing a structure of a neural network constructed in embodiment 1 of the present invention;
fig. 4 is a structural diagram showing a single self-attention mechanism proposed in embodiment 2 of the present invention;
FIG. 5 is a view showing a structure of a multi-headed attention mechanism layer proposed in embodiment 2 of the present invention;
fig. 6 is a diagram showing a structure of a text classification system based on a multi-head attention mechanism and a two-dimensional convolution operation proposed in embodiment 5 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the present embodiment, certain parts of the drawings may be omitted, enlarged or reduced, and do not represent actual dimensions;
it will be understood by those skilled in the art that certain descriptions of well-known structures in the drawings may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
example 1
As shown in fig. 1, the present embodiment provides a text classification method based on a multi-head attention mechanism and a two-dimensional convolution operation, the method includes the following steps:
s1, determining a text data set, and dividing the text data set into a training set and a test set;
s2, preprocessing the texts in the training set;
referring to fig. 2, the preprocessing operations performed on the text in the training set include:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it with the stop-word list, and automatically skipping it if it appears in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters in all texts, and establishing a one-hot vector for each character.
This embodiment is programmed in Python. The data set used is the CMID data set, a medical-domain text classification data set in JSON format containing 2,900 texts and 16 classification types. When the program reads a character, it is automatically compared with the stop-word list; if the character appears in the stop-word list it is skipped. In this way the text is separated character by character into Chinese characters, i.e. the text with punctuation marks and spaces removed is input character by character and stored in a Python list. All characters in all texts are then read, a one-hot vector is established for each character, and the vectors are stored in a database. The dimension of a one-hot vector equals the number of distinct characters in the database; the vector has the value 1 in exactly one dimension and 0 in all the others, and each character has its own unique one-hot vector.
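As a rough illustration of this preprocessing (not the exact code of the embodiment), the following Python sketch builds a stop-word list from punctuation and spaces, splits each text character by character, and assigns each remaining character a one-hot vector; the helper names, the assumed punctuation set and the sample sentences are placeholders.

```python
# Rough preprocessing sketch: stop-word list from punctuation/spaces, character-level
# tokenization, and one-hot vectors for every remaining character.
import string

import numpy as np

CHINESE_PUNCT = "，。！？；：“”‘’（）、《》"   # assumed punctuation set


def build_stopword_list(texts):
    """Collect every punctuation mark and space that appears in the training texts."""
    stopwords = set()
    for text in texts:
        for ch in text:
            if ch.isspace() or ch in string.punctuation or ch in CHINESE_PUNCT:
                stopwords.add(ch)
    return stopwords


def tokenize(text, stopwords):
    """Read a text character by character, skipping characters in the stop-word list."""
    return [ch for ch in text if ch not in stopwords]


def build_one_hot_table(tokenized_texts):
    """Assign each distinct character its own one-hot vector."""
    vocab = sorted({ch for tokens in tokenized_texts for ch in tokens})
    table = {}
    for idx, ch in enumerate(vocab):
        vec = np.zeros(len(vocab), dtype=np.float32)
        vec[idx] = 1.0                                  # exactly one dimension is 1
        table[ch] = vec
    return table


texts = ["感冒了吃什么药？", "高血压患者能运动吗。"]       # placeholder training texts
stopwords = build_stopword_list(texts)
tokens = [tokenize(t, stopwords) for t in texts]
one_hot = build_one_hot_table(tokens)
```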
S3, constructing a neural network, wherein the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected, and the structural diagram of the constructed neural network is shown in FIG. 3;
s4, inputting the text after the preprocessing operation into an embedding layer of a neural network to obtain a word vector;
the embedded layer of the constructed neural network is embedded by one-hot vectors at a character-by-character level in the textThe embedded layer has three layers for semantic representation, and the number of the neurons corresponding to each layer is respectively as follows: 3076. 1024, 768, wherein the weight matrix of each layer is respectively: w is a group of 1 、W 2 、W 3 The activation functions of all layers are sigmoid, and the formula of the sigmoid function is as follows:
the layers are sequentially connected, after preprocessing operation is carried out on the texts in the training set, one-hot vectors corresponding to all the characters in the texts are obtained, the one-hot vectors of all the characters are input into the neural network, character vectors are obtained, and calculation carried out on each layer is as follows:
x 1 =sigmoid(W 1 x 0 +b)
x 2 =sigmoid(W 2 x 1 +b)
x=sigmoid(W 3 x 2 +b)
wherein x is 0 One-hot vector, x, representing a word 1 Denotes x 0 Intermediate value, x, after activation of the first layer 2 Represents x 1 Intermediate value after activation of the second layer, x representing x 2 And b represents a bias vector, and finally, a 768-dimensional word vector is correspondingly output.
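A minimal tf.keras sketch of such an embedding sub-network is given below; the vocabulary size of 4000 is an illustrative assumption, while the 3076/1024/768 layer widths and sigmoid activations follow the description above (each Keras layer carries its own bias, whereas the equations above write a single bias vector b).

```python
# Minimal sketch of the three-layer sigmoid embedding sub-network
# (assumption: a vocabulary of 4000 distinct characters, purely illustrative).
import tensorflow as tf

vocab_size = 4000   # illustrative; in practice the number of characters in the database

embedding_net = tf.keras.Sequential([
    tf.keras.layers.Dense(3076, activation="sigmoid"),   # x1 = sigmoid(W1 x0 + b)
    tf.keras.layers.Dense(1024, activation="sigmoid"),   # x2 = sigmoid(W2 x1 + b)
    tf.keras.layers.Dense(768, activation="sigmoid"),    # x  = sigmoid(W3 x2 + b)
])

# Three example characters as one-hot vectors -> three 768-dimensional word vectors.
one_hot_batch = tf.one_hot([5, 17, 42], depth=vocab_size)
word_vectors = embedding_net(one_hot_batch)               # shape (3, 768)
```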
S5, forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer, executing the attention functions in parallel, then splicing and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using the text-enhanced semantic representations to pre-train the multi-head attention mechanism layer for text classification;
S6, fusing the text-enhanced semantic representations to obtain the text-fused semantic representation, performing a two-dimensional convolution operation on it, and outputting the convolution operation feature vectors;
S7, performing text classification training on the neural network by using the convolution operation feature vectors, and adjusting the weights of the multi-head attention mechanism layers to obtain the trained neural network;
and S8, preprocessing the texts in the test set and inputting them into the trained neural network to obtain the classification results.
Table 1 compares the method proposed in this embodiment with other existing methods trained on the same text data set; a classification effect not inferior to that of most models is achieved in a shorter training time.
TABLE 1
Example 2
This embodiment describes the multi-head attention mechanism layer. A single self-attention mechanism is shown in fig. 4, and a plurality of self-attention mechanisms connected in series form a multi-head attention mechanism layer, shown in fig. 5. In this embodiment three multi-head attention mechanism layers are used, with 3, 6 and 9 heads respectively. The input to the self-attention mechanism consists of queries and keys of dimension d_k and values of dimension d_v. To obtain the weights on the values, a group of queries is packed into a matrix Q, and the keys and values are packed into matrices K and V respectively; based on the softmax function, the attention function is obtained as:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

Q, K and V are calculated as:

Q = X W_Q
K = X W_K
V = X W_V

where W_Q, W_K and W_V are the weight matrices of the self-attention mechanism for the three inputs query, key and value.
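The following sketch writes out a single self-attention head along these formulas, assuming illustrative sizes of 30 words, 768-dimensional word vectors and d_k = 64; the random weight matrices stand in for trained W_Q, W_K and W_V.

```python
# A single self-attention head: softmax(Q K^T / sqrt(d_k)) V
# (assumptions: 30 words, 768-dimensional word vectors, d_k = d_v = 64).
import tensorflow as tf


def self_attention_head(x, w_q, w_k, w_v):
    """Scaled dot-product attention on a word vector matrix x."""
    q = tf.matmul(x, w_q)                                # Q = X W_Q
    k = tf.matmul(x, w_k)                                # K = X W_K
    v = tf.matmul(x, w_v)                                # V = X W_V
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
    weights = tf.nn.softmax(scores, axis=-1)             # weights on the values
    return tf.matmul(weights, v)


n_words, d_model, d_k = 30, 768, 64
x = tf.random.normal((n_words, d_model))                 # word vector matrix X
w_q = tf.random.normal((d_model, d_k))
w_k = tf.random.normal((d_model, d_k))
w_v = tf.random.normal((d_model, d_k))
head_output = self_attention_head(x, w_q, w_k, w_v)      # shape (30, 64)
```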
The word vectors of the individual words obtained in S4 form a word vector matrix X = [x_1, x_2, ..., x_n]. The word vector matrix is input into a multi-head attention mechanism layer, the attention functions are executed in parallel, and the results are spliced and mapped. With a total of R multi-head attention mechanism layers, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained. The calculation in a multi-head attention mechanism layer is:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W_0

where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R. After a flatten operation is applied to X_1, X_2, ..., X_R, a fully connected layer is attached and pre-training for text classification is performed.
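A compact sketch of one multi-head attention mechanism layer with its flatten-plus-softmax pre-training head is shown below; it relies on Keras' built-in MultiHeadAttention layer, which computes the heads in parallel and applies the Concat(head_1, ..., head_h) W_0 projection internally, and the batch size, 3 heads and 16 classes are illustrative assumptions.

```python
# One multi-head attention mechanism layer followed by the pre-training head
# (assumptions: batch of 8 texts, 3 heads, 16 classes -- all illustrative).
import tensorflow as tf

n_words, d_model, n_classes = 30, 768, 16

mha = tf.keras.layers.MultiHeadAttention(num_heads=3, key_dim=64)

x = tf.random.normal((8, n_words, d_model))        # a mini-batch of word vector matrices
enhanced = mha(query=x, value=x, key=x)            # text-enhanced semantics, (8, 30, 768)

# Pre-training for text classification: flatten and attach a softmax classifier.
flat = tf.keras.layers.Flatten()(enhanced)         # (8, 30 * 768)
probs = tf.keras.layers.Dense(n_classes, activation="softmax")(flat)
```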
The text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a splicing operation to obtain the text-fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), a three-dimensional tensor. When the two-dimensional convolution operation is performed on the text-fused semantic representation, X_1, X_2, ..., X_R serve as the R input channels of the convolution layer, and the size and number of the convolution kernels are set. Suppose the outputs of the three multi-head attention mechanism layers are X_1, X_2 and X_3; then X_s = Concatenate(X_1, X_2, X_3) is the text-fused semantic representation, a three-dimensional tensor of shape (3, 30, 768). A two-dimensional convolution operation is performed on it with X_1, X_2 and X_3 as input channels; the convolution kernel size is set to (2, 3), the number of convolution kernels is 32, and the size of each convolution kernel in the first dimension equals the length of a word vector. The elements of the convolution result matrix are calculated as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j - p + 1, k - q + 1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j - p + 1, k - q + 1) is an element of the convolution kernel.

The convolution result matrix is input into the pooling layer for a maximum pooling operation, which keeps only the largest element of the convolution result matrix, so that each convolution kernel produces one output value. A 32-dimensional convolution operation feature vector is finally output, avoiding the loss of semantic information. When the convolution operation feature vectors are used to train the neural network for text classification, the weights of the multi-head attention mechanism layers are adjusted in the fully connected layer by a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layers into a new model class.
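One possible tf.keras reading of this convolution-and-pooling step is sketched below, assuming the three attention outputs are stacked as input channels and each of the 32 kernels spans the full 768-dimensional word-vector axis over a two-word window; the exact kernel layout in this embodiment is described differently, so the sketch is an approximation rather than the patented configuration.

```python
# Two-dimensional convolution and max pooling over the text-fused representation
# (assumptions: R = 3 channels, 30 words, 768-dim word vectors, 32 kernels spanning
# the whole word-vector axis and a 2-word window -- an approximation).
import tensorflow as tf

R, n_words, d_model, n_kernels = 3, 30, 768, 32

# X1, X2, X3: text-enhanced semantic representations from the three attention layers.
heads = [tf.random.normal((1, n_words, d_model)) for _ in range(R)]

# Text-fused semantic representation: stack the R outputs as channels (channels-last).
x_s = tf.stack(heads, axis=-1)                     # shape (1, 30, 768, 3)

# 32 convolution kernels; each covers the whole word-vector dimension and 2 words.
conv = tf.keras.layers.Conv2D(filters=n_kernels, kernel_size=(2, d_model),
                              activation="relu")
pooled = tf.keras.layers.GlobalMaxPooling2D()      # keep only the largest response
features = pooled(conv(x_s))                       # 32-dimensional feature vector, (1, 32)
```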
Example 3
The embodiment provides a computer device, which comprises a processor, a memory and a computer program stored in the memory, wherein the processor executes the computer program stored in the memory to realize the text classification method based on the multi-head attention mechanism and the two-dimensional convolution operation.
The memory may be a disk, a flash memory or any other non-volatile storage medium. The processor is connected to the memory and may be implemented as one or more integrated circuits, specifically a microprocessor or a microcontroller; when it executes the computer program stored in the memory, the text classification method based on a multi-head attention mechanism and two-dimensional convolution operation is implemented.
Example 4
The present embodiment proposes a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method.
Example 5
As shown in fig. 6, the present embodiment proposes a text classification system based on a multi-head attention mechanism and a two-dimensional convolution operation, the system including:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer, executing the attention functions in parallel, then splicing and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using the text-enhanced semantic representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations to obtain the text-fused semantic representation, performing a two-dimensional convolution operation on it and outputting the convolution operation feature vectors;
the training module is used for performing text classification training on the neural network by using the convolution operation feature vectors and adjusting the weights of the multi-head attention mechanism layers to obtain the trained neural network;
and the test module is used for preprocessing the text in the test set and inputting the preprocessed text into the trained neural network to obtain a classification result.
It should be understood that the above embodiments of the present invention are merely examples given to clearly illustrate the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (10)
1. A method for text classification based on a multi-head attention mechanism and two-dimensional convolution operation, the method comprising the steps of:
s1, determining a text data set, and dividing the text data set into a training set and a test set;
s2, preprocessing the texts in the training set;
s3, constructing a neural network, wherein the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
s4, inputting the text after the preprocessing operation into an embedding layer of a neural network to obtain a word vector;
S5, forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer, executing the attention functions in parallel, then splicing and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using the text-enhanced semantic representations to pre-train the multi-head attention mechanism layer for text classification;
S6, fusing the text-enhanced semantic representations to obtain the text-fused semantic representation, performing a two-dimensional convolution operation on it, and outputting the convolution operation feature vectors;
S7, performing text classification training on the neural network by using the convolution operation feature vectors, and adjusting the weights of the multi-head attention mechanism layers to obtain the trained neural network;
and S8, preprocessing the texts in the test set and inputting them into the trained neural network to obtain the classification results.
2. The method for classifying texts based on the multi-head attention mechanism and the two-dimensional convolution operation according to claim 1, wherein in step S2, the preprocessing operation performed on the texts in the training set comprises:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it with the stop-word list, and automatically skipping it if it appears in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters in all texts, and establishing a one-hot vector for each character.
3. The method for classifying texts based on the multi-head attention mechanism and the two-dimensional convolution operation according to claim 2, wherein in step S3 the embedding layer of the constructed neural network uses character-level one-hot vector embedding of the text as the semantic representation; the embedding layer has three layers with weight matrices W_1, W_2 and W_3, the activation function of every layer is sigmoid, and the layers are connected in sequence; after the texts in the training set have been preprocessed, the one-hot vectors corresponding to all characters in the texts are obtained and input into the neural network to obtain the word vectors, and the calculation performed at each layer is:

x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)

wherein x_0 is the one-hot vector of a character, x_1 is the intermediate value after x_0 is activated by the first layer, x_2 is the intermediate value after x_1 is activated by the second layer, x is the output after x_2 is activated by the third layer, and b is a bias vector.
4. The method according to claim 3, wherein a plurality of self-attention mechanisms are connected in series to form a multi-head attention mechanism layer, the input of the self-attention mechanism consisting of queries and keys of dimension d_k and values of dimension d_v; to obtain the weights on the values, a group of queries is packed into a matrix Q, and the keys and values are packed into matrices K and V respectively, and based on the softmax function the attention function is obtained as:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

Q, K and V are calculated as:

Q = X W_Q
K = X W_K
V = X W_V

wherein W_Q, W_K and W_V are the weight matrices of the self-attention mechanism for the three inputs query, key and value.
5. The method according to claim 4, wherein the word vectors of the individual words obtained in S4 form a word vector matrix X = [x_1, x_2, ..., x_n]; the word vector matrix is input into a multi-head attention mechanism layer, the attention functions are executed in parallel, and the results are spliced and mapped; with a total of R multi-head attention mechanism layers, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained, and the calculation in a multi-head attention mechanism layer is:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W_0

wherein head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R; after a flatten operation is applied to X_1, X_2, ..., X_R, a fully connected layer is attached and pre-training for text classification is performed.
6. The method for classifying texts based on the multi-head attention mechanism and the two-dimensional convolution operation according to claim 5, wherein in step S6 the text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a splicing operation to obtain the text-fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), where X_s is a three-dimensional tensor; when the two-dimensional convolution operation is performed on the text-fused semantic representation, X_s is the input to the convolution layer, the size and number of the convolution kernels are set, and the size of a convolution kernel in its first dimension equals the length of a word vector; for a convolution kernel C of size [768, vec2, vec3] and the text-fused semantic representation X_s, the elements of the convolution result matrix are calculated as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j - p + 1, k - q + 1)

wherein X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j - p + 1, k - q + 1) is an element of the convolution kernel;

the convolution result matrix is input into the pooling layer for a maximum pooling operation, which keeps only the largest element of the convolution result matrix, so that each convolution kernel produces one output value, and the convolution operation feature vector is finally output.
7. The text classification method based on the multi-head attention mechanism and two-dimensional convolution operation according to claim 6, wherein, when the convolution operation feature vectors are used to perform text classification training on the neural network, the weights of the multi-head attention mechanism layers are adjusted in the fully connected layer by a back propagation algorithm, and the pre-trained multi-head attention mechanism layers are added to a new model class by using the TensorFlow package.
8. A computer device comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program stored on the memory to implement the method for text classification based on a multi-headed attention mechanism and two-dimensional convolution operations of any one of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
10. A text classification system based on a multi-head attention mechanism and two-dimensional convolution operations, the system comprising:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are connected in sequence;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer, executing the attention functions in parallel, then splicing and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using the text-enhanced semantic representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations to obtain the text-fused semantic representation, performing a two-dimensional convolution operation on it and outputting the convolution operation feature vectors;
the training module is used for performing text classification training on the neural network by using the convolution operation feature vectors and adjusting the weights of the multi-head attention mechanism layers to obtain the trained neural network;
and the test module is used for preprocessing the texts in the test set and inputting the texts into the trained neural network to obtain a classification result.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210800916.XA (CN115168579A) | 2022-07-08 | 2022-07-08 | Text classification method based on multi-head attention mechanism and two-dimensional convolution operation |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115168579A | 2022-10-11 |
Family
ID=83492736
Cited By (6)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116562284A | 2023-04-14 | 2023-08-08 | 湖北经济学院 | Government affair text automatic allocation model training method and device |
| CN116562284B | 2023-04-14 | 2024-01-26 | 湖北经济学院 | Government affair text automatic allocation model training method and device |
| CN116660992A | 2023-06-05 | 2023-08-29 | 北京石油化工学院 | Seismic signal processing method based on multi-feature fusion |
| CN116660992B | 2023-06-05 | 2024-03-05 | 北京石油化工学院 | Seismic signal processing method based on multi-feature fusion |
| CN117573869A | 2023-11-20 | 2024-02-20 | 中国电子科技集团公司第十五研究所 | Network connection resource key element extraction method |
| CN118277538A | 2024-06-04 | 2024-07-02 | 杭州昊清科技有限公司 | Legal intelligent question-answering method based on retrieval enhancement language model |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |