CN114780720A

CN114780720A - Text entity relation classification method based on small sample learning

Info

Publication number: CN114780720A
Application number: CN202210318340.3A
Authority: CN
Inventors: 戚荣志; 高逸飞; 李水艳; 赵小涵; 陈子琦; 黄倩
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2022-07-22

Abstract

The invention discloses a text entity relation classification method based on small sample learning, which comprises the following steps of: 1) extracting semantic features of instance vectors in the data set by using a convolutional neural network as an instance encoder; 2) under a small sample learning scene, a prototype-level attention mechanism module is designed, each sample is endowed with a weight, and a prototype of each relation is represented in a weighted summation mode; 3) and in a small sample learning scene, replacing a new measurement function. Extracting a characteristic coefficient in a support set vector by using convolution operation through a distance level attention mechanism module, and calculating the distance between each relation prototype and a query instance in the support set by using the product of a Manhattan distance formula and the characteristic coefficient as a new measurement function; 4) and (5) realizing small sample relation classification by utilizing a softmax function.

Description

Text entity relation classification method based on small sample learning

Technical Field

The invention relates to a text entity relation classification method based on small sample learning, and belongs to the technical field of text data identification.

Background

Relationship classification is receiving more and more attention as one of the important subtasks of knowledge extraction. Oriented to unstructured text data, the task of relational classification is to extract semantic relationships between two or more entities from the text. In the current relation classification problem, most mature technologies achieve excellent experimental results by improving the traditional neural network model (such as a recurrent neural network, a convolutional neural network, and the like). However, the data sets selected in the experiment are simple short sentences of which the categories are predefined, and the sample distribution of each relationship is relatively uniform. In practical applications, the data size is small, and the distribution of samples is not uniform.

The advent of remote surveillance methods has provided a solution for small scale datasets to obtain large scale training data by aligning plain text with existing knowledge maps. The basic assumption of the remote surveillance method is that if two entities have some relationship in the knowledge-graph, a sentence containing both entities expresses this relationship. But the assumption is too strong so that the remote supervision data set contains a large number of falsely labeled samples. Meanwhile, the long tail distribution characteristics of the relation and the entity pair still exist in a real scene, and available samples are few.

In fact, the human can learn knowledge quickly through fewer samples, and the ability of 'holding one thing over the other' is achieved, and the method is also suitable for deep learning. Researchers provide a small sample learning task, and the problems of less data sets and uneven distribution are effectively solved by designing a mode of combining a small sample learning mechanism and relationship classification. The text entity relation classification method based on small sample learning mainly has the function that two attention mechanism modules are introduced to realize relation classification on the basis of a network framework of small sample learning. Currently, for each relationship existing in the support set, the small sample relationship classification usually adopts the method of averaging the example vectors to find the relationship prototype. Since the amount of data is small in a scene of small sample learning, when one example is far away from other examples in the mapping space, the average calculated prototype may be greatly deviated. Once a large amount of noise data exists, the final relation classification effect is greatly influenced; meanwhile, for a relational feature vector, only a part of dimensions have obvious distinguishing effect on the final classification result, and once the problem of feature sparsity exists in the instance vector extracted from the support set, the final classification result is subjected to larger deviation. Therefore, aiming at the two problems of relation classification in a small sample learning scene, how to improve the prototype representation of the relation example, how to solve the characteristic sparsity problem of the relation characteristic vector, and meeting the representation requirement under the condition of uneven sample distribution are important problems to be solved.

Disclosure of Invention

The invention aims to: aiming at the problems and the defects in the prior art, the invention provides a text entity relation classification method based on small sample learning, wherein the small sample learning is a training method specially proposed aiming at scenes which have scarce data and are difficult to meet the requirements of model training. Aiming at the problem that the prototype vector representation has errors in the scene that a data set is scarce, the problem is solved by using a prototype-level attention mechanism module; aiming at the characteristic that the measurement function can not highlight important dimensionality in the vector, the problem is solved by utilizing a distance level attention mechanism module and replacing an original distance formula.

The technical scheme is as follows: a text entity relation classification method based on small sample learning comprises the following steps:

step 1: and (3) adopting a CNN network as an example encoder to encode the support set statement and the query set statement in the given data set, and converting the support set statement and the query set statement into low-dimensional example vectors so as to obtain the entity pair characteristics of the extracted corpus.

And 2, step: in the small sample learning scenario, originally, for each relationship existing in the support set, a relationship prototype is usually obtained by directly averaging the example vectors. Now, a prototype-level attention mechanism module is utilized to give a weight to each instance, so that a weighted prototype vector is obtained;

and 3, step 3: splicing the support set examples obtained by the encoding in the step 1 into a vector matrix, and extracting semantic features of important dimensions in the support set examples through a distance level attention mechanism module so as to obtain the weight beta of the distance level attention mechanism.

And 4, step 4: and (3) adopting a Manhattan distance formula as a new calculation formula, and multiplying the distance level attention mechanism weight beta obtained in the step (3) by the distance formula to obtain a new distance formula as a measurement function. By using the formula, the distance measurement can be performed on the query examples in the query set and the prototype obtained in step 2.

And 5: and (4) comparing the distance between the query example and the prototype according to the distance formula obtained in the step 4. And (5) carrying out relation classification by utilizing a softmax function.

In the step 1, a CNN network is used as an example encoder to encode a support set statement and a query set statement in a given data set, and the method includes the following steps:

1-1, converting the corpus in the input data set into a low-dimensional word vector form by means of Glove word embedding and entity position embedding. Glove word embedding (WF) is to convert input corpus into co-occurrence matrix with dimension d_wRepresents; the sentence obtained after the Glove word embedding is represented as a vector list (x)₀,x₁,x₂,...,x_i) Wherein x is_iIndicating the i-th word embedding, and the entity position embedding (PF) is to calculate each word embedding vector x in the vector list_m(m∈[0,i]) Distance to two head and tail entities in a sentence, dimension 2 x d_pAnd (4) showing. Finally, Glove word embedding and entity position embedding are combined and are expressed as { e₁,...,e_n}＝{[WF₁；PF₁]...,[WF_n；PF_n]Thus forming an embedded vector sequence of sentences.

1-2, further processing the final sentence embedding vector sequence obtained in the step 1-1 by using a CNN network, extracting semantic features in the sentence embedding vector sequence, and specifically dividing the sentence embedding vector sequence into a convolutional layer and a maximum pooling layer. The convolutional layer extracts features of a sentence embedding vector sequence by using a convolutional sliding window with the length of m, and the obtained vector sequence after feature extraction is processed by a ReLU activation function to obtain sentence hiding embedding; and the maximum pooling layer processes the sentence hiding embedding obtained by the convolutional layer. Finally, an example vector of the whole sentence is obtained.

The step 2 of giving a weight to each instance by using a prototype-level attention mechanism module to obtain a weighted prototype vector includes the following steps:

2-1 Using the Gaussian function as the activation function, the weight γ assigned to each sample instance is found_ijFinally, weighted summation is carried out on each relation example by using the weight, and the prototype-level relation vector representation c of the whole relation is obtained_iRespectively expressed as:

wherein q is_jRepresenting query set instances, M representing relationship types present in the data set, K representing the number of instances under each relationship type, x_ijDenotes the jth support set instance, σ, under the relation i_iThe parameter values are expressed as gaussian functions.

The step 3 of extracting semantic features of important dimensions in the support set instance through the distance level attention mechanism module, thereby obtaining the weight β of the distance level attention mechanism, includes the following steps:

3-1 dividing the supporting centralized instance sentences according to the relation, and dividing each relation r_iK support set instances [ x ] of (1)_i1,x_i2,...,x_iK]Processing the data by the example encoder in the step 1, and splicing the data into a K x d_hVector matrix of 1, where K denotes the number of instances per relationship class, d_hRepresenting a hidden layer unit.

3-2, the vector matrix passes through a module formed by interweaving three convolution layers and three ReLU function layers, semantic features with dimension values not being 0 in the support set example are extracted, and the dimension is changed into 1 x d _h1, to obtain the distance-level attention weight β. The more useful the corresponding feature dimension, the higher the corresponding beta value. Since some dimension values of the instance vector are 0, important dimensions other than 0 need to be highlighted to exert their effects in relation classification.

The metric formula established in step 4 comprises the following steps:

4-1, a Manhattan distance formula is selected as a distance formula, so that errors in calculation can be eliminated, the operation speed is greatly improved, distance measurement can be performed under high-dimensional data, and a good classification effect is guaranteed.

4-2 multiplying the distance-level attention mechanism weight β obtained in step 3-2 by the distance formula selected in step 4-1 to obtain a distance function d (x, y) which is expressed as a new metric function:

wherein n represents a dimension, x_iAnd y_iRespectively representing the values of x and y in the i dimension

The relation classification by using the softmax function established in the step 5 comprises the following steps:

5-1 prototype-level relationship vector representation c obtained according to step 2-2_iAnd the distance function d (x,y), computing an instance in the query set and c)_iA distance therebetween

Representing the vector obtained after the query set instance x passes through the instance coding layer.

5-2 for the instances in the query set, it is determined which relationship in the relationship set R specifically belongs to. The concrete expression is as follows:

wherein the conditional probability

For query set instance x at relationship r_iThe probability of the following.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the text entity relationship classification method based on small sample learning as described above when executing the computer program.

A computer readable storage medium storing a computer program for executing the text entity relationship classification method based on small sample learning as described above.

Compared with the prior art, the invention has the advantages that:

(1) the relation classification method adopts a prototype-level attention mechanism to obtain a prototype, and can eliminate the influence of extreme data on the representation of the whole relation prototype. Meanwhile, a Gaussian function is adopted as an activation function, and the method is different from a common activation function and is better at processing data with smaller difference; in addition, the Gaussian function curve is matched with the long tail distribution characteristics of the small sample data, and the method is more suitable for the small sample relation classification task;

(2) the relation classification method adopts a distance level attention mechanism to obtain the characteristic coefficient, can emphasize important dimensions in the relation vector, plays a role in final relation classification and solves the characteristic sparse problem;

(3) the relation classification method adopts a Manhattan distance formula as a measurement formula. The method is different from the conventional Euclidean distance formula, and solves the problem that the Euclidean distance fails under high-dimensional data. Meanwhile, errors in calculation are effectively eliminated, and the operation speed is greatly improved.

Drawings

FIG. 1 is a general framework diagram of a method of an embodiment of the invention;

FIG. 2 is a flow diagram of a prototype level attention mechanism module of an embodiment of the present invention;

FIG. 3 is a diagram of a distance level attention mechanism module framework according to an embodiment of the present invention.

Detailed Description

The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.

As shown in fig. 1, the text entity relationship classification method based on small sample learning includes the following steps:

step 1, a data set is given, wherein the data set comprises support set statements and query set statements, the support set statements and the query set statements are coded by adopting a CNN (convolutional neural network) as an example coder and converted into low-dimensional example vectors, and entity pair features of the extracted corpora are obtained. The specific process is as follows:

1-1, selecting a FewRel data set which is used more in the field of small sample relation classification;

1-2 predefined sets of relationship types for the FewRel dataset. The following are some entity relationship types set in the training set in this embodiment, where the specified entity relationship types are all classified according to the types specified in the original data set:

numbering	Entity relationship types	Number of	Entity relationship types
				1	P931	6	P6
2	P4552	7	P27
				3	P140	8	P449
4	P1923	9	P1435
				5	P150	10	P175

1-3 embedding by using GloveAnd converting the input corpus into a low-dimensional word vector form by an embedding mode of the input entity and the entity position. Glove word embedding is to convert input linguistic data into a co-occurrence matrix by using dimension d_wRepresenting; the sentence obtained after the Glove word embedding is represented as a vector list (x)₀,x₁,x₂,...,x_i) Wherein x is_iThe method represents the i-th word embedding, and the entity position embedding is to calculate the distance between each word embedding vector and two head and tail entities in a sentence by using the dimension 2 x d_pAnd (4) showing. Finally combining Glove word embedding with entity location embedding achieves a final embedded representation for each word.

For example, for The input sentences "The name of East middlands air objects at one point change to Nottingham," The head entity is "East middlands air objects" and The tail entity is "Nottingham," and for a certain word "one" therein, The relative distance to The head entity is 5 and The relative distance to The tail entity is-4, so The "one" position embedding is represented as [5, -4]. After the whole sentence passes through the embedding layer, the word embedding of each word is represented by WF, the dimension is set to be 50, the position embedding is represented by PF, the dimension is set to be 5, and finally the embedding vector sequence of the whole sentence is represented as { e }₁,...,e_n}＝{[WF₁；PF₁]...,[WF_n；PF_n]}。

1-4, using CNN network to further process the sentence embedded vector sequence, extracting semantic features, which can be divided into convolutional layer and maximum pooling layer. The convolution layer extracts features by utilizing a convolution sliding window with the length of m and passes an obtained vector sequence through a ReLU activation function; and the maximum pooling layer processes the sentence hiding embedding obtained by the convolutional layer. Finally, an example vector of the whole sentence is obtained.

For example, for the final sentence found in 1-3, the sequence of embedding vectors e₁,...,e_nAnd setting the length of a sliding window as m, extracting semantic features in the sliding window by using convolution operation, and expressing the semantic features as follows:

embedding [ h ] hidden sentences obtained by maximum pooling operation on convolutional layers₁,h₂,...h_n]The treatment was performed as follows:

[x]_i＝maxpooling{[h₁]_i,...,[h_n]_i}

step 2, solving each relation prototype by utilizing a prototype-level attention mechanism module, which specifically comprises the following steps:

2-1 introduce a prototype-level attention mechanism method to solve the relationship prototypes for each relationship. The specific calculation flow is shown in fig. 2. Firstly, the input statement is judged whether to be a support set or not through a judgment process. Then combining the divided query set examples with all support set examples under a relation, and using a Gaussian function as an activation function to obtain the weight gamma given to each support set example_ij. Finally, summing each relation example to obtain a prototype representation c of the whole relation_iRespectively expressed as:

for example, in this embodiment, an original network is selected as a frame, and a relation original of a support set vector in the input frame can be obtained by a formula in 2-1.

Step 3, splicing the support set instances into a vector matrix, extracting important dimensional characteristics by using a distance level attention mechanism module, and improving a measurement formula, wherein the method specifically comprises the following steps of;

3-1 support set instances are partitioned by relationship and each relationship r is first partitioned as shown in the framework of FIG. 3_iK support set instances [ x ] of (1)_i1,x_i2,...,x_iK]Spliced into a K x d_hAnd (4) extracting features of the vector matrix of 1 through convolution modules formed by combining three convolution layers and three ReLU layers respectively, and finally obtaining a weight coefficient beta of the vector matrix. Wherein the dimension of the input vector is sequentially changed to K x d_h*32，K*d_h*64，1*d_h*1。

3-2, multiplying the weight coefficient obtained in the step 3-1 by using a Manhattan distance formula to obtain a final measurement formula, wherein the final measurement formula is expressed as follows:

and 4, calculating the query examples and the prototype obtained by the prototype-level attention mechanism module in the step 2, measuring the distance between the query examples and the relation prototype by using the distance formula obtained in the step 3, and finally finishing the relation classification of the query examples by applying a softmax function.

For example, for an example sentence in example sentences 1-3, in the prototype network framework, after comparing it with all the relationship prototypes in the support set, it is found that the value obtained in the calculation of the relationship P931 is the largest, so in example sentences 1-3, the entity classifies the relationship between < East Middle Airport, Nottingham > as P931.

According to the embodiment, the invention realizes the text entity relation classification method based on small sample learning. And setting the small sample relation classification scene according to requirements. The invention adopts a double attention mechanism module to improve the classification task of the small sample relation. The prototype-level attention mechanism module represents a relationship prototype by giving different weights to relationship examples, and eliminates the influence of individual extreme examples on prototype representation; the distance level attention mechanism is used for highlighting the dimension which has larger influence on relation classification in the feature space, and introduces a Manhattan formula as a new distance function to realize higher-performance relation classification.

It will be apparent to those skilled in the art that the steps of the text entity relationship classification method based on small sample learning according to the embodiment of the present invention described above can be implemented by a general-purpose computing device, they can be centralized on a single computing device or distributed on a network formed by a plurality of computing devices, and they can alternatively be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described can be executed in a sequence different from that described herein, or they can be separately manufactured into various integrated circuit modules, or a plurality of modules or steps in them can be manufactured into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Claims

1. A text entity relation classification method based on small sample learning is characterized by comprising the following steps:

step 1: adopting a CNN network as an example encoder to encode support set statements and query set statements in a given data set, converting the support set statements and the query set statements into low-dimensional example vectors, and obtaining entity pair characteristics of extracted corpora;

step 2: in the small sample learning scene, a prototype-level attention mechanism module is utilized to give weight to each example, and therefore a weighted prototype vector is obtained;

and 3, step 3: splicing the support set examples obtained by the coding in the step 1 into a vector matrix, and extracting semantic features of important dimensions in the support set examples through a distance level attention mechanism module so as to obtain a weight beta of the distance level attention mechanism;

and 4, step 4: multiplying the distance level attention mechanism weight beta obtained in the step 3 by a Manhattan distance formula to obtain a new distance formula as a measurement function; utilizing the measurement function to measure the distance between the query examples in the query set and the prototype vector obtained in the step 2;

and 5: comparing the distance between the query instance and the prototype vector according to the distance formula obtained in the step 4; and (5) carrying out relation classification by utilizing a softmax function.

2. The method for classifying text entity relationships based on small sample learning according to claim 1, wherein in the step 1, a CNN network is used as an example encoder to encode support set sentences and query set sentences in a given data set, and the method comprises the following steps:

1-1, converting the linguistic data in the input data set into a low-dimensional word vector form by means of Glove word embedding and entity position embedding; the sentence obtained after the Glove word embedding is represented as a vector list (x)₀,x₁,x₂,...,x_i) Wherein x is_iIndicating the i-th word embedding, and the entity position embedding is to calculate each word embedding vector x in the vector list_m(m∈[0,i]) Distance between two head and tail entities in a sentence, Glove word embedding and entity position embedding are combined and expressed as { e }₁,...,e_n}＝{[WF₁；PF₁]...,[WF_n；PF_n]Thus forming an embedded vector sequence of sentences.

1-2, further processing the final sentence embedding vector sequence obtained in the step 1-1 by using a CNN network, extracting semantic features in the sentence embedding vector sequence, and specifically classifying the semantic features into a convolutional layer and a maximum pooling layer. The convolutional layer is used for extracting features of a sentence embedding vector sequence by using a convolutional sliding window with the length of m, and processing the obtained vector sequence after the features are extracted through a ReLU activation function to obtain sentence hiding embedding; the maximum pooling layer processes hidden embedding of sentences obtained by the convolutional layers; finally, an example vector of the whole sentence is obtained.

3. The method for classifying textual entities based on small sample learning according to claim 1, wherein said step 2 of weighting each instance by using a prototype-level attention mechanism module to obtain a weighted prototype vector comprises the steps of:

2-1 uses a gaussian function as the activation function,the weight given to each sample instance is found_ijFinally, weighted summation is carried out on each relation example by using the weight, and the prototype-level relation vector representation c of the whole relation is obtained_iRespectively expressed as:

4. The method for classifying text entity relations based on small sample learning according to claim 1, wherein the step 3 of extracting semantic features of important dimensions in support set instances through a distance level attention mechanism module so as to obtain the weight β of the distance level attention mechanism comprises the following steps:

3-1 dividing the example sentences in the support set according to the relation and dividing each relation r_iK support set instances [ x ] of (1)_i1,x_i2,...,x_iK]Processing the data by the example encoder in the step 1, and splicing the data into a K x d_hVector matrix of 1, where K denotes the number of instances per relationship class, d_hRepresenting a hidden layer unit;

3-2, the vector matrix passes through a module which is formed by interweaving three convolution layers and three ReLU function layers, semantic features with dimension values not being 0 in the support set example are extracted, and dimension is carried outDegree becomes 1 x d_h1, to obtain the distance-level attention mechanism weight β.

5. The method for classifying text entity relationships based on small sample learning according to claim 1, wherein the metric formula established in step 4 comprises the following steps:

4-1, selecting a Manhattan distance formula as a distance formula;

4-2 multiplying the distance level attention mechanism weight β by the distance formula selected in 4-1 to obtain a distance function d (x, y) as a new metric function, expressed as:

wherein n represents a dimension, x_iAnd y_iRespectively representing the values of x and y in the i dimension.

6. The method for classifying text entities based on small sample learning according to claim 1, wherein the relationship classification using softmax function established in step 5 comprises the following steps:

5-1 representation of c from prototype-level relationship vectors_iAnd a distance function d (x, y) for computing an instance in the query set and c_iA distance therebetween

Representing a vector obtained after an example x of the query set passes through an example coding layer;

5-2, judging which relationship in the relationship set R specifically belongs to for the examples in the query set; the concrete expression is as follows:

wherein the conditional probability

For query set instance x at relationship r_iThe probability of the following.

7. A computer device, characterized by: the computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the text entity relation classification method based on small sample learning according to any one of claims 1-6 when executing the computer program.

8. A computer-readable storage medium characterized by: the computer readable storage medium stores a computer program for executing the text entity relationship classification method based on small sample learning according to any one of claims 1 to 6.