CN113128203A - Attention mechanism-based relationship extraction method, system, equipment and storage medium - Google Patents
Attention mechanism-based relationship extraction method, system, equipment and storage medium Download PDFInfo
- Publication number
- CN113128203A CN113128203A CN202110340791.2A CN202110340791A CN113128203A CN 113128203 A CN113128203 A CN 113128203A CN 202110340791 A CN202110340791 A CN 202110340791A CN 113128203 A CN113128203 A CN 113128203A
- Authority
- CN
- China
- Prior art keywords
- sentence
- entity
- relation
- statement
- missing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention provides a relation extraction method, system, device and storage medium based on an attention mechanism. The method comprises the following steps: preprocessing a text to be processed to form a first sentence; identifying the first sentence and performing semantic completion on it to form a second sentence; and extracting the feature vector of the second sentence and inputting the feature vector into a relation extraction model to obtain the relation label type output by the model, wherein the relation extraction model is trained on semantically completed sentences and normal sentences in a training corpus. By preprocessing the text to be processed, performing semantic completion on it, and obtaining the relation label type through the relation extraction model, the invention realizes relation extraction for sentences with incomplete semantics, improves extraction efficiency and accuracy, and avoids the time cost and expense of manually labeling sentences.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a relation extraction method, system, device and storage medium based on an attention mechanism.
Background
Information extraction mainly extracts specific factual information, namely entities, from text. Entity relation extraction, as a subtask of information extraction, aims to extract structured relation information from unstructured text. Conventional relation extraction methods are generally based on supervised, semi-supervised or unsupervised learning. Supervised relation extraction requires a fully manually labeled corpus as a training set to train the relation extraction model, which consumes substantial manpower and time and generalizes poorly to new entity relations absent from the training set. Semi-supervised relation extraction uses a partially labeled corpus and iterative training to extract entity relations; although this reduces the cost of manual annotation to some extent, it still requires partially labeled data. Unsupervised relation extraction needs no manually labeled corpus and classifies relations automatically by clustering, but tends to yield suboptimal results.
Remote supervision combines the advantages of semi-supervised and unsupervised methods: entity relations are labeled automatically by aligning unlabeled corpora with entities in a knowledge base. This greatly improves the performance of entity relation extraction when extracting entity information from the knowledge base and reduces labor cost; the noise problem of remote supervision is addressed by computing the similarity between relation words of entity pairs in the domain ontology and dependency words of entity pairs in the unlabeled text.
However, short sentences with incomplete semantics, such as those missing a subject, predicate or object, occur in scenarios such as electronic medical records in the medical field, domain knowledge graph construction, sentiment analysis, retrieval and prediction. For example, a normal sentence reads: "Chest radiograph shows right pleural lesion and calcification of aortic nodules", where the entities are: chest radiograph, right pleural lesion, aortic nodule calcification, and the relations are: (chest radiograph, shows, right pleural lesion), (chest radiograph, shows, aortic nodule calcification); whereas a semantically incomplete phrase reads: "Considering bronchitis", whose only entity is: bronchitis. Such semantically incomplete short sentences cannot reflect entity relations, so relation extraction accuracy on them is poor.
Disclosure of Invention
The invention provides a relation extraction method, system, device and storage medium based on an attention mechanism, which solve the poor relation extraction accuracy of remote supervision on short sentences with incomplete semantics, such as those missing a subject, predicate or object, and improve relation extraction efficiency.
The invention provides a relation extraction method based on an attention mechanism, comprising the following steps: preprocessing a text to be processed to form a first sentence; identifying the first sentence, and performing semantic completion on the first sentence to form a second sentence; and extracting the feature vector of the second sentence, and inputting the feature vector into a relation extraction model to obtain the relation label type output by the relation extraction model, wherein the relation extraction model is trained on semantically completed sentences and normal sentences in a training corpus.
According to the attention mechanism-based relation extraction method provided by the invention, preprocessing a text to be processed to form a first sentence comprises: combining the unlabeled corpus in the text to be processed with a knowledge base, and aligning the unlabeled corpus with entities in the knowledge base to automatically label entity relations; and cutting the text to be processed into Chinese character strings according to the entities in the combined knowledge base, punctuation marks, numbers and space marks, and removing stop words to form the first sentence.
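As a rough illustration of this preprocessing step, the following sketch splits raw text into character strings while keeping knowledge-base entities intact, then drops punctuation, digits, and stopwords. All names are hypothetical, and the character-level tokenization granularity is an assumption, not specified by the patent:

```python
import re

def preprocess(text, kb_entities, stopwords):
    """Cut raw text into Chinese character strings while keeping knowledge-base
    entities intact, then strip punctuation, digits, and stopwords."""
    tokens, pos = [], 0
    if kb_entities:
        # Match longer entities first so a nested entity is not split apart.
        pattern = "|".join(sorted(map(re.escape, kb_entities), key=len, reverse=True))
        for m in re.finditer(pattern, text):
            tokens.extend(text[pos:m.start()])   # single characters
            tokens.append(m.group())             # whole entity kept as one token
            pos = m.end()
    tokens.extend(text[pos:])
    # Drop stopwords plus punctuation / digit / whitespace tokens.
    return [t for t in tokens
            if t not in stopwords and not re.fullmatch(r"[\W\d_]+", t)]
```

For example, a sentence like "胸片显示右侧胸膜病变。" with the entities 胸片 and 右侧胸膜病变 would be reduced to those two entity tokens plus any non-stopword characters between them.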
According to the attention mechanism-based relation extraction method provided by the invention, identifying the first sentence and performing semantic completion on it to form a second sentence comprises: identifying the syntactic structure of the first sentence and the first entities in it through parsing; learning the context features of each first entity in the first sentence to obtain the missing semantic information of the first sentence; and performing semantic completion on the first sentence according to the missing semantic information to form the second sentence.
According to the attention mechanism-based relation extraction method provided by the invention, performing the corresponding semantic completion on the first sentence according to the type of missing semantic information to form the second sentence comprises: if the missing semantic information is a missing subject or object, finding an entity that supplies the missing semantics of the first sentence as a target entity, and splicing the target entity with the first sentence to form the second sentence; if the missing semantic information is a missing subject and predicate, or predicate and object, finding a normal sentence that supplies the missing semantics of the first sentence, and forming the second sentence from the first sentence and that normal sentence.
According to the attention mechanism-based relation extraction method provided by the invention, finding an entity that supplies the missing semantics of the first sentence as a target entity and splicing the target entity with the first sentence to form the second sentence comprises: extracting, from the context sentences of the normal sentence in the knowledge base in which the first entity occurs, entities whose type differs from the first entity as candidate entities; extracting, based on the part-of-speech tag, the trigger word closest to the first entity in the context sentence as the target trigger word, and extracting the feasible relation types between the target trigger word and the first entity using a word attention mechanism; extracting, based on a remote supervision method and the feasible relation types, the candidate entity with the highest similarity as the target entity; and splicing the target entity with the first sentence to form the second sentence.
Finding a normal sentence that supplies the missing semantics of the first sentence and splicing the first sentence with that normal sentence comprises: extracting the first entity contained in the first sentence; finding the context feature closest to the first entity in the context sentences, computing the similarity between each context feature and the first entity, and selecting the normal sentence whose context feature has the highest similarity as the target normal sentence; and splicing the first sentence and the target normal sentence end to end to form the second sentence.
According to the attention mechanism-based relation extraction method provided by the invention, after identifying the first sentence by parsing its syntactic structure, the method further comprises: labeling each first entity using a part-of-speech tagging mechanism; and after performing semantic completion on the first sentence according to the missing semantic information, the method further comprises: verifying the entity label assigned to each first entity.
According to the attention mechanism-based relation extraction method provided by the invention, the relation extraction model comprises a convolution layer, a piecewise max-pooling layer, a sentence attention layer and a classification layer, wherein: the convolution layer performs a convolution operation on the input feature vector; the piecewise max-pooling layer divides the convolved feature vector into three parts according to the positions of the entity pair in the second sentence, pools each part separately, computes the maximum value of each part, and concatenates the three maxima as its output; the sentence attention layer obtains the sentence feature from the output of the piecewise max-pooling layer and the semantic relation between the words and the entity pair in the second sentence; and the classification layer receives the sentence feature and computes the relation label type from it.
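The piecewise max pooling described above can be sketched as follows, assuming a NumPy feature map of shape (sequence length, number of filters) and token indices for the entity pair; the function name and segment-boundary convention are illustrative:

```python
import numpy as np

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    """Split the convolved feature map into three segments at the two entity
    positions, max-pool each segment, and concatenate the three maxima."""
    n_filters = conv_out.shape[1]
    segments = [conv_out[:e1_pos + 1],            # up to and including entity 1
                conv_out[e1_pos + 1:e2_pos + 1],  # between the two entities
                conv_out[e2_pos + 1:]]            # after entity 2
    pooled = [seg.max(axis=0) if seg.size else np.zeros(n_filters)
              for seg in segments]
    return np.concatenate(pooled)  # shape: (3 * n_filters,)
```

Compared with whole-sentence max pooling, keeping one maximum per segment preserves coarse positional information about where a feature fired relative to the entity pair.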
According to the attention mechanism-based relation extraction method provided by the invention, the feature vector comprises entity features, word features and position features, and extracting the feature vector of the second sentence comprises: extracting each entity feature by adding the self feature and the type feature of the corresponding second entity in the second sentence; converting the non-entity words in the second sentence into word vectors using a word2vec model to extract the word features; and extracting the position features by computing the relative distance between each word vector in the second sentence and any two second entity features.
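The position-feature computation can be illustrated as follows; the clipping range is an assumed convention from common position-embedding practice, since the patent only specifies relative distances to the two entities:

```python
def position_features(num_tokens, e1_idx, e2_idx, max_dist=30):
    """For each token position, the pair of relative distances to the two
    entity positions, clipped to [-max_dist, max_dist]."""
    clip = lambda d: max(-max_dist, min(max_dist, d))
    return [(clip(i - e1_idx), clip(i - e2_idx)) for i in range(num_tokens)]
```

Each distance pair would then typically be looked up in a small trainable position-embedding table and concatenated with the word vector.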
According to the attention mechanism-based relation extraction method provided by the invention, training the relation extraction model comprises: constructing a training corpus based on a remote supervision method, the corpus comprising normal sentences and/or semantically completed sentences, and inputting the sentences of the corpus into the relation extraction model as training sentences; training the model on the input training sentences to output predicted relation label types; and computing a loss function from the predicted relation label type and the true relation label type of each training sentence; if the loss function is stable, training is finished; otherwise, training of the relation extraction model continues.
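The stopping rule ("if the loss function is stable, training is finished") might be implemented as a simple plateau check on the per-epoch loss history; the tolerance and patience values below are assumptions, not part of the patent:

```python
def loss_stable(history, patience=3, tol=1e-4):
    """True when each of the last `patience` epochs improved the loss by
    less than `tol`, i.e. the loss curve has flattened out."""
    if len(history) <= patience:
        return False
    recent = history[-(patience + 1):]
    return all(prev - cur < tol for prev, cur in zip(recent, recent[1:]))
```

A training loop would append each epoch's loss to `history` and stop as soon as `loss_stable(history)` returns true.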
The invention also provides a relation extraction system based on an attention mechanism, implemented on the basis of any one of the above attention mechanism-based relation extraction methods, the system comprising: a data preprocessing module, configured to preprocess the text to be processed to form a first sentence; a short sentence processing module, configured to identify the first sentence and perform semantic completion on it to form a second sentence; a feature extraction module, configured to extract the feature vector of the second sentence; and a relation extraction model module, configured to output the relation label type according to the feature vector extracted by the feature extraction module.
The invention also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the attention mechanism-based relation extraction method described in any one of the above.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the attention mechanism-based relation extraction method described in any one of the above.
According to the relation extraction method and system based on an attention mechanism provided by the invention, the text to be processed is preprocessed to form the first sentence, so that entity relations are labeled automatically by aligning the unlabeled corpus with entities in the knowledge base, improving labeling efficiency; the first sentence is then identified and semantically completed to form the second sentence, so that relations can subsequently be extracted from it, avoiding both the extraction errors caused by directly extracting relations from semantically incomplete short sentences and the time and labor wasted on manual sentence labeling; finally, the feature vector of the second sentence is extracted and input into the trained relation extraction model to obtain the relation label type, realizing relation extraction for semantically incomplete sentences, improving extraction efficiency and accuracy, and reducing the influence of noise during relation extraction.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a relationship extraction method based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a phrase semantic completion phase according to an embodiment of the present invention;
FIG. 3 is a flow chart of a relationship extraction phase according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a relationship extraction system based on an attention mechanism according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Reference numerals:
1: a data preprocessing module; 2: a phrase processing module; 3: a feature extraction module;
4: a relation extraction model module; 51: a processor; 52: a communication interface;
53: a memory; 54: a communication bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 illustrates a flow chart of the relation extraction method based on an attention mechanism; as shown in fig. 1, the method includes:
s01: preprocessing a text to be processed to form a first sentence;
s02: identifying a first statement, and performing semantic completion on the first statement to form a second statement;
s03: and extracting the feature vector in the second sentence, and inputting the feature vector into a relation extraction model to obtain a relation label type output by the relation extraction model, wherein the relation extraction model is obtained by training based on a semantic completion sentence and a normal sentence in a training corpus.
It should be noted that the step numbering S01 to S03 in this specification does not imply an execution order of the relation extraction method; each step is described in detail below with reference to fig. 2 to 3.
Step S01: preprocessing the text to be processed to form a first sentence.
In this embodiment, preprocessing the text to be processed to form the first sentence comprises: combining the unlabeled corpus in the text to be processed with the knowledge base, and aligning the unlabeled corpus with the entities in the knowledge base to automatically label entity relations; and cutting the text to be processed into Chinese character strings according to the entities in the combined knowledge base, punctuation marks, numbers and space marks, and removing stop words to form the first sentence S1 = {W1 … ei … eo … Wm}, which contains the first entities, where {W1 … Wm} are the non-entity words of S1, and ei and eo are the first entities of S1. It should be noted that the text to be processed may be an electronic medical record.
Step S02: identifying the first sentence, and performing semantic completion on the first sentence to form a second sentence.
In this embodiment, recognizing the first sentence and performing semantic completion on the first sentence to form the second sentence includes: identifying a syntax structure of a first sentence and a first entity in the first sentence through parsing; learning the context characteristics of the first entity in the first sentence to obtain the missing semantic information of the first sentence; and performing semantic completion processing on the first statement according to the missing semantic information to form a second statement.
Specifically, identifying the first sentence by parsing its syntactic structure comprises: analyzing the syntactic structure of the first sentence with the syntactic parsing tool HIT LTP. In addition, a skip-gram model is used to extract the dependency-path features between the first entities of the first sentence to enhance the word-vector representation, and a word2vec model is then used to convert the non-entity words of the first sentence S1 into d-dimensional word vectors to extract word features, each word vector being labeled. For example, for the first sentence S1 = {W1 … ei … eo … Wm}, the word2vec model converts the non-entity words of S1 into word vectors to extract word features and form the first feature sentence s1 = {w1 … gi … go … wm}, where wi is a word feature of s1, and gi and go are the first entity features of s1. In addition, since the text to be processed is an electronic medical record, the labeled tag types may include diseases, symptoms, and the like.
Owing to Chinese writing habits, the missing semantic information of the first sentence can usually be recovered from the adjacent context features, so it can be obtained by learning the context features of each first entity in the first sentence. In this embodiment, a bidirectional long short-term memory network (BiLSTM) may be employed to learn the context features of each first entity. For example, for the first feature sentence s1 = {w1 … gi … go … wm}, the corresponding hidden units are {h1, h2, …, hm}, where the hidden-unit dimension equals the word-vector dimension. Combining the forward and backward long short-term memory networks, the context feature of a first entity in the first feature sentence is expressed as:
hi = [→hi ; ←hi]
where →hi denotes the hidden unit of the forward long short-term memory network, ←hi denotes the hidden unit of the backward long short-term memory network, t denotes the time step of the word vector wi, and [ ; ] denotes the vector concatenation operation.
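The concatenation of forward and backward hidden states can be sketched numerically; the hidden states below are random stand-ins (a real BiLSTM would produce them from the word vectors), and the sizes are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 4                          # sentence length and hidden size (toy values)
h_fwd = rng.normal(size=(m, d))      # forward-LSTM hidden units, one row per token
h_bwd = rng.normal(size=(m, d))      # backward-LSTM hidden units
# Context feature of token i: h_i = [h_fwd_i ; h_bwd_i]
h = np.concatenate([h_fwd, h_bwd], axis=-1)
```

Each token's context feature thus has twice the hidden dimension, carrying both left-to-right and right-to-left context.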
Then the missing semantic information is derived from the context features; it falls mainly into two types: (1) a missing subject or object; (2) a missing subject and predicate, or predicate and object. It should be noted that a missing subject or object means the first sentence lacks one stem component, usually comprising one or more entities of the same entity type, while neither of the remaining two stem components is limited to one; for example, a short sentence may lack a subject yet have two predicates or objects. A missing subject and predicate, or predicate and object, means the first sentence contains only one stem component, which typically contains one or more entities of the same entity type.
After the missing semantic information is obtained, semantic completion is performed on the first sentence accordingly: if the missing semantic information is a missing subject or object, an entity that supplies the missing semantics of the first sentence is found as the target entity, and the target entity is spliced with the first sentence to form the second sentence; if the missing semantic information is a missing subject and predicate, or predicate and object, a normal sentence that supplies the missing semantics of the first sentence is found, and the first sentence and that normal sentence are spliced to form the second sentence.
When the first sentence has a missing subject or object, finding an entity that supplies the missing semantics of the first sentence as the target entity, and splicing the target entity with the first sentence to form the second sentence, comprises:
First, entities whose type differs from the first entity are extracted from the context sentences as candidate entities. For example, if the context sentences H1 and H2 contain the entities {e1, e2, e3, e4} with corresponding entity types {a, b, c, d}, and the first entity en in the first feature sentence has entity type c, then the candidate entity set is T = {e1, e2, e4}. Secondly, the trigger word closest to the first entity is extracted from the context sentence based on the part-of-speech tag as the target trigger word, and the feasible relation types between the trigger word and the first entity are extracted using a word attention mechanism.
Extracting the trigger word closest to the first entity from the context sentence based on the part-of-speech tag as the target trigger word comprises: extracting trigger words from the context sentence as candidate trigger words based on the tag type of the first entity; in this embodiment, the predicates are extracted as candidate trigger words based on the part-of-speech tags, i.e. v = {v1, v2, …, vn}; then the relative distance between each candidate trigger word and the first entity is computed, and the closest candidate is selected as the target trigger word, represented as {wi, wi^vpos, wi^p}, where wi denotes the word-vector feature, wi^vpos denotes the predicate part-of-speech feature, and wi^p denotes the relative distance between the candidate trigger word and the first entity. It should be noted that, since one stem component is missing, there is only one relative distance.
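Selecting the predicate closest to the first entity can be sketched as follows; the tag string "v" for predicates follows common Chinese POS tagsets (such as LTP's) and is an assumption:

```python
def nearest_trigger(pos_tags, entity_idx):
    """Index of the verb (POS tag 'v') nearest to the entity position,
    or None when the context sentence contains no verb."""
    verbs = [i for i, tag in enumerate(pos_tags) if tag == "v"]
    if not verbs:
        return None
    return min(verbs, key=lambda i: abs(i - entity_idx))
```

The returned index identifies the target trigger word whose features then feed the word attention mechanism.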
It should be noted that, before extracting the feasible relation types between the trigger word and the first entity with the word attention mechanism, each relation instance in the knowledge base must be represented as a triple: for {W1, …, Wm} ∈ K, a triple is written as (e1, r, e2), where K denotes the knowledge base, {W1, …, Wm} denote the instances contained in K, e1 and e2 denote first entities forming a first entity pair, and r ∈ {r1, r2, …, rn} denotes the relation between e1 and e2, r being the relation type.
Because the target trigger word and the first entity can coexist in a plurality of semantic environments, and a plurality of different relationship types may exist between the target trigger word and the first entity, a word attention mechanism is adopted to extract a possible relationship type, namely a feasibility relationship type, between the target trigger word and the first entity. It should be noted that the feasibility relationship type may be a fine-grained relationship type and/or a coarse-grained relationship type between any two entities, and since the description information of the fine-grained relationship type between any two entities is usually composed of a plurality of words, in this embodiment, the feasibility relationship type only includes the coarse-grained relationship type. Extracting the feasibility relationship type between the target trigger word and the first entity by adopting a word attention mechanism, wherein the feasibility relationship type comprises the following steps: triggering words v according to targetsiObtaining attention weight u from its relation type r in context statementiThen according to the attention weight uiCalculating the weight vector of each target trigger word, and finally calculating to obtain a feasibility relationship type according to the weight vector of each target trigger word, wherein the formula is as follows:
ui = Wa · tanh([vi; r]) + ba

fi = αi · vi

where ui denotes the attention weight, [vi; r] denotes the concatenation of the target trigger word vi with the relation type r, Wa and ba denote the weight matrix and the bias term, α = {α1, α2, …, αn} denotes the weight vector of the target trigger words in the sentence (each αi obtained by normalizing the attention weights ui), and fi denotes the feasible relation type.
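For illustration, the word attention step above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: Wa is taken as a single weight vector mapping the concatenated [vi; r] to a scalar score, and the weights αi are obtained with a softmax over those scores — details the text leaves open:

```python
import math

def word_attention(triggers, relation, Wa, ba):
    """Score each target-trigger-word vector v_i against the relation-type
    vector r (u_i = Wa . tanh([v_i; r]) + ba), normalize the scores into
    weights alpha_i, and return the weighted features f_i = alpha_i * v_i."""
    scores = []
    for v in triggers:
        concat = [math.tanh(x) for x in v + relation]          # tanh over [v_i; r]
        scores.append(sum(w * c for w, c in zip(Wa, concat)) + ba)
    mx = max(scores)                                           # softmax (assumed
    exps = [math.exp(s - mx) for s in scores]                  # normalization)
    alphas = [e / sum(exps) for e in exps]
    feats = [[a * x for x in v] for a, v in zip(alphas, triggers)]
    return alphas, feats
```

The returned weights sum to one, so the feasible-relation feature is a convex combination of the trigger-word vectors.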
Then, based on the remote supervision method and the feasible relation type, the candidate entity with the highest similarity is extracted from the candidate entities as the target entity. In this embodiment, remote supervision assumes that, in the knowledge base, a known named entity e1 and a relation type r determine the entity e2 with which e1 stands in relation r; the triple is written e1 + r ≈ e2. Based on the feasible relation type fi, the entity with the highest similarity to e2 is selected among the candidate entities as the target entity. For example, if gn + fi ≈ gm holds in the knowledge base, then by the modified remote-supervision assumption and the feasible relation type, (gn + fi) = (gp + fi); the similarity between each candidate entity and gm is then calculated, and the candidate entity with the highest similarity is selected as the target entity. Here gn, gm and gp are all entity features, each computed by adding the entity's own feature to its type feature, i.e. gn = en + en^type, gm = em + em^type, gp = ep + ep^type.
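A minimal sketch of the translation-style selection e1 + r ≈ e2 follows. The vector arithmetic and the use of negative Euclidean distance as the "similarity" are assumptions for illustration — the patent does not fix a particular similarity measure here:

```python
def select_target_entity(e1, relation, candidates):
    """Under the remote-supervision assumption e1 + r ~= e2, pick the
    candidate entity feature g closest to e1 + r. Each feature is assumed
    to be the entity's own vector plus its type vector, already summed
    (g = e + e_type)."""
    expected = [a + b for a, b in zip(e1, relation)]
    def similarity(g):  # negative Euclidean distance: larger is more similar
        return -sum((x - y) ** 2 for x, y in zip(expected, g)) ** 0.5
    return max(candidates, key=similarity)
```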
It should be noted that, since the labeling of each first entity in the first sentence is not always correct, an erroneous first-entity label can introduce noise. Moreover, in the subsequent extraction of the target entity, the accuracy of the target entity depends to a great extent on the label information of the first entity, and an erroneous entity type can cause the wrong semantic information to be completed. Therefore, after each first entity in the first sentence is labeled, and before the target entity is extracted, the first-entity labels are also verified.
In this embodiment, an entity-label-type verification method and Bayesian probability are used to further verify the accuracy of the first-entity labels once the first entities have been identified. The entity-label-type verification method is as follows: the input is a first entity ep ∈ e in the first sentence and a target trigger word vi ∈ v; the output is the correct entity type label. Specifically, given the input first entity ep ∈ e and target trigger word vi ∈ v, whether the first entity in the first sentence is a single-type entity or a multi-type entity is determined from the entity recognition result. If the first entity ep is a single-type entity, its label is correct and the verification ends;
if the first entity ep is a multi-type entity, the probabilities of its candidate types are calculated: if the probability of type1 for ep is greater than that of type2, ep is determined to be of type1, and if ep is not currently labeled type1, its label is corrected to type1.
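The label check can be sketched as follows. The patent does not give the exact Bayesian formula, so this sketch substitutes simple empirical estimates (e.g. co-occurrence frequencies of the entity's candidate types with the target trigger word) for the probability — an assumption for illustration, not the claimed method:

```python
def verify_entity_label(type_probs, current_label):
    """Return the corrected entity-type label: single-type entities keep
    their label; for a multi-type entity the most probable type wins.
    type_probs maps each candidate type to an (assumed) frequency-based
    probability estimate."""
    if len(type_probs) == 1:          # single-type entity: label is correct
        return current_label
    return max(type_probs, key=type_probs.get)
```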
Finally, the extracted target entity is spliced with the first sentence to form the second sentence.
Referring to fig. 2, when the first sentence is a sentence missing its subject and predicate, or missing its predicate and object, finding a normal sentence containing the missing semantic information of the first sentence and splicing the first sentence and the normal sentence into one sentence specifically comprises:
First, the first entity contained in the first sentence is extracted, and the context features closest to the first entity in the context sentences are found through a bidirectional long short-term memory network. For example, given the first-sentence features s1 = {w1 … gi … go … wm} with first entity go, its adjacent preceding feature is gi and its adjacent following feature is gk. If gi and gk are at different distances from go, the sentence containing the feature closest to go is extracted as the target normal sentence.
Second, the similarity between each context feature and the first entity is calculated, and the normal sentence whose context feature has the highest similarity is selected as the target normal sentence. In this example, if gi and gk are at the same distance from go, the similarities between go and gi and between go and gk are calculated, and the sentence containing the feature with the higher similarity to go is taken as the target normal sentence. It should be noted that the similarity of entity types is of most interest here, since the goal is to extract a normal sentence that is contextually related to the first sentence, rather than semantically similar entities. In addition, the cosine-distance similarity threshold used in the experiments is set between 0.5 and 0.7, typically 0.62.
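The similarity-based selection with the 0.62 threshold from the text can be sketched as below; the function names and the candidate representation are hypothetical, introduced only for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pick_target_sentence(entity_vec, candidates, threshold=0.62):
    """candidates: (sentence_id, context_feature_vector) pairs. Return the
    id of the sentence whose context feature is most similar to the first
    entity, provided it clears the cosine threshold (0.5-0.7 per the text,
    0.62 by default); None if no candidate qualifies."""
    best_id, best_sim = None, threshold
    for sid, feat in candidates:
        sim = cosine(entity_vec, feat)
        if sim >= best_sim:
            best_id, best_sim = sid, sim
    return best_id
```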
Finally, the first sentence and the target normal sentence are spliced end to end to form the second sentence.
Step S03: extracting the feature vector of the second sentence and inputting the feature vector into the relation extraction model to obtain the relation label type output by the relation extraction model, wherein the relation extraction model is trained on the semantic-completion sentences and normal sentences in the training corpus, referring to fig. 3.
In this embodiment, extracting the feature vector of the second sentence comprises: entity features are extracted by adding each second entity's own feature to its type feature, e.g. gp = ep + ep^type; word features are extracted by converting the non-entity words in the second sentence into d-dimensional word vectors with a word2vec model. For example, given the second sentence S2 = {W1 … ei … eo … Wm}, where {W1 … Wm} are the words in S2 and ei and eo are the second entities, converting the non-entity words into word vectors with the word2vec model yields the second feature sentence, denoted s2 = {w1 … gi … go … wm}. Position features are extracted by calculating the relative distance between each word vector in the second feature sentence and each of the two second-entity features: in s2 = {w1 … gi … go … wm}, each word vector wi has two relative distances to the second-entity features gi and go, denoted wi^p (p = 1, 2).
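The two-distance position feature can be sketched directly; the clipping bound ±max_dist is an assumption added for illustration (unbounded offsets would otherwise grow with sentence length), not a value from the text:

```python
def position_features(sentence_len, e1_idx, e2_idx, max_dist=30):
    """The two relative-distance position features w_i^p for every token:
    signed offsets to the two second-entity positions, clipped to
    +/- max_dist."""
    clip = lambda d: max(-max_dist, min(max_dist, d))
    return [(clip(i - e1_idx), clip(i - e2_idx)) for i in range(sentence_len)]
```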
In this embodiment, the relation extraction model is trained based on a piecewise convolutional neural network (PCNN), and the extracted feature vectors are input into the trained relation extraction model to output the relation label type. The relation extraction model comprises a convolution layer, a piecewise max-pooling layer, a sentence attention layer and a classification layer, wherein:
The convolution layer performs the convolution operation on the input feature vectors. Specifically, for the second sentence S2 = {W1 … ei … eo … Wm} with corresponding second feature sentence s2 = {w1 … gi … go … wm}, wi is the word vector of the i-th word Wi of S2 and gi is the second-entity feature of the i-th second entity ei, with wi ∈ R^d, where R^d denotes the word-vector dimension and h denotes the convolution-kernel length. The feature map after the convolution operation is ci = f(V · wi:i+h-1 + b), and C = [c1, …, ci, …, cn]^T ∈ R^{m(n-h+1)}, where m denotes the number of convolution kernels, m > 1, f(·) denotes a nonlinear activation function (in this embodiment the commonly used ReLU), wi:i+h-1 denotes the concatenation of the word vectors from i to i+h-1, and b denotes the bias term. To prevent overfitting, a dropout strategy is applied during convolution, and zero padding is used to preserve the length of the second sentence.
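A single convolution filter of the kind described above can be sketched as follows (one filter only; the model uses m > 1 filters, so this would be repeated per filter — and a real implementation would use a deep-learning framework rather than plain Python):

```python
def conv1d(word_vectors, kernel, bias):
    """One convolution filter over the sentence:
    c_i = ReLU(V . w_{i:i+h-1} + b), where w_{i:i+h-1} is the
    concatenation of h consecutive word vectors. `kernel` is a flat
    list of length h*d."""
    d = len(word_vectors[0])
    h = len(kernel) // d
    out = []
    for i in range(len(word_vectors) - h + 1):
        window = [x for v in word_vectors[i:i + h] for x in v]  # w_{i:i+h-1}
        score = sum(k * x for k, x in zip(kernel, window)) + bias
        out.append(max(0.0, score))                              # ReLU
    return out
```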
The piecewise max-pooling layer divides the convolved feature map ci into three parts [ci1, ci2, ci3] according to the positions of the two second-entity features (the second entity pair) in the second sentence, computes and extracts the maximum of each part, pij = max(cij), 1 ≤ i ≤ n, to obtain the most salient feature values of the second sentence, and then concatenates the feature vectors of the three parts as the output of the piecewise max-pooling layer, denoted p ∈ R^{3n}, p = (p1 … pn). Each of the three parts is pooled separately to its maximum value; this piecewise max pooling takes the position information of the second entity pair into account, so that the entities can be better represented subsequently.
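The three-way split and per-segment pooling can be sketched for a single feature map (a minimal sketch of the PCNN pooling idea; segment boundary placement relative to the entity tokens is an assumption, as the text does not fix it):

```python
def piecewise_max_pool(feature_map, e1_pos, e2_pos):
    """PCNN-style pooling: split one convolution feature map into three
    segments at the two entity positions and max-pool each segment,
    giving [p_i1, p_i2, p_i3]. Empty segments pool to 0.0."""
    a, b = sorted((e1_pos, e2_pos))
    segments = [feature_map[:a + 1], feature_map[a + 1:b + 1], feature_map[b + 1:]]
    return [max(seg) if seg else 0.0 for seg in segments]
```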
The sentence attention layer obtains the sentence features from the output of the piecewise max-pooling layer and from the semantic relation between the words and the entity pair in the second sentence, as follows:
First, the semantic distance and the position distance between each word vector and the two second-entity features in the second sentence are calculated.
The semantic distance between a word vector and the second entity pair is calculated as follows:

The position distance between a word vector and the second entity pair is calculated as follows:

where gp and gq denote the two second-entity features in the second sentence, wi denotes a word vector in the second sentence, and Pgp and Pgq denote the positions of the second entity pair. It should be noted that, within a sentence, the closer a word is to the entity pair in semantics and position, the higher its importance coefficient. Therefore, in this embodiment, the important semantic information related to the relation expression can be obtained by calculating the semantic distance and the position distance.
Second, from the semantic distance and the position distance, the weight αi corresponding to each word vector wi is calculated, with αi ∈ [0, 1]; the calculation formula of αi is as follows:
Then, from the weight αi of each word vector, the weight-based word vector WEi and the weight-based position vector PEi corresponding to each word vector are obtained; the weighted word vector WEi and the weight-based position vector PEi are expressed respectively as:

From the weight-based word vectors WEi and position vectors PEi of all the word vectors, the sentence vector is calculated, expressed as follows:
Then the bias term bs is obtained from the output p of the piecewise max-pooling layer as bs = tanh(p), where tanh denotes the activation function and p is the output of the piecewise max-pooling layer.
Finally, the sentence feature s is calculated from the sentence vector s' and the bias term bs extracted via the PCNN model, and is output as s = Ws · s' + bs, where Ws is the weight matrix and s' is the sentence vector of the second feature sentence input into the relation extraction model.
The classification layer takes the sentence feature s as input, performs the calculation on it, and outputs the relation label type. Specifically, the sentence feature s is fed into the softmax classification layer of the PCNN model to predict the relation label and output the predicted relation label type ŷ; the calculation is as follows:

where ŷ denotes the predicted relation label type, y denotes the relation label, n denotes the number of relation type labels with i ∈ [1, n], W is the prediction weight matrix, s is the sentence feature, and b is the prediction bias term.
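The softmax classification step can be sketched as follows (a minimal sketch: W is the prediction weight matrix with one row per relation label, and the argmax over the softmax probabilities gives the predicted label index):

```python
import math

def predict_relation(s, W, b):
    """Softmax classification layer: scores = W.s + b per relation label;
    returns the index of the most probable label and the probability
    distribution over all labels."""
    scores = [sum(w * x for w, x in zip(row, s)) + bi for row, bi in zip(W, b)]
    mx = max(scores)
    exps = [math.exp(v - mx) for v in scores]
    probs = [e / sum(exps) for e in exps]
    return probs.index(max(probs)), probs
```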
The invention also provides a training method for the relation extraction model, applied in the above attention-mechanism-based relation extraction method, comprising the following steps:
First, a training corpus is constructed based on a remote supervision method; the training corpus comprises normal sentences and/or semantic-completion sentences. It should be noted that a semantic-completion sentence is the normal sentence formed after the semantics of a semantically deficient sentence have been completed.
Secondly, the relation extraction model is trained according to the input training sentences to output the predicted relation label types. It should be noted that the specific implementation of the training sentence input into the relationship extraction model may refer to the specific implementation of the second sentence input into the relationship extraction model, which is not described herein again.
Finally, a loss function is obtained from the predicted relation label type and the true relation label type of the training sentence; if the loss function is stable, training ends, otherwise training of the relation extraction model continues. It should be noted that, after the relation extraction model outputs the predicted label type, the classification layer calculates the loss function from the true relation label type of the training sentence and the predicted label type output by the model, as follows:

where n denotes the number of relation type labels with i ∈ [1, n], yi denotes the true relation label type, and the loss is the cross-entropy between the true relation label type yi and the predicted relation label type ŷ.

The smaller the loss function, the higher the model accuracy. After the loss function has been calculated, whether to end training is decided from its value: if the loss function remains stable, training ends; otherwise, training of the relation extraction model continues.
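The cross-entropy loss for a single training sentence can be sketched as follows. The reduction to −log p(gold) for a one-hot true label is the standard identity, assumed here since the patent leaves the exact formula to the figure:

```python
import math

def cross_entropy(predicted_probs, gold_idx):
    """Cross-entropy between the one-hot true relation label and the
    predicted distribution; for a single example this reduces to
    -log p(gold)."""
    return -math.log(predicted_probs[gold_idx])
```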
In summary, the embodiment of the present invention preprocesses the text to be processed to form the first sentences, so that the unlabeled corpus can be aligned with the entities in the knowledge base to label entity relations automatically, improving labeling efficiency. The first sentence is then recognized and semantically completed to form the second sentence, so that relation extraction can subsequently be performed on the second sentence; this avoids both the erroneous extractions caused by extracting relations directly from semantically incomplete short sentences and the time and labor wasted by labeling sentences manually. Finally, the feature vector of the second sentence is extracted and input into the trained relation extraction model to obtain the relation label type, realizing relation extraction for sentences with incomplete semantics, improving extraction efficiency and accuracy, reducing the influence of noise during relation extraction, and broadening the application field of remote supervision.
Fig. 4 illustrates an attention-based relationship extraction system, which is implemented based on the above-mentioned attention-based relationship extraction method, and the system includes:
the data preprocessing module 1 is used for preprocessing a text to be processed to form a first sentence;
the phrase processing module 2 is used for identifying the first sentence and carrying out semantic completion on the first sentence to form a second sentence;
the feature extraction module 3 is used for extracting a feature vector in the second statement;
and the relation extraction model module 4 outputs the relation label type according to the feature vector extracted by the feature extraction module.
In this embodiment, the text to be processed is preprocessed by the data preprocessing module 1, wherein the preprocessing comprises: segmenting the text to be processed into character strings according to the unlabeled corpus and the entities, punctuation marks, numbers and spaces contained in the knowledge base, and removing stop words to form the first sentences.
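The preprocessing step can be sketched roughly as below. The stop-word list is a placeholder, and the whitespace split is an English-oriented simplification — the patent targets Chinese text, where a proper word segmenter would be used instead:

```python
import re

STOPWORDS = {"the", "a", "of"}   # placeholder stop-word list (assumption)

def preprocess(text):
    """Split the raw text into sentences on punctuation, then into tokens
    on whitespace/punctuation/digit boundaries, and drop stop words to
    form first sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?。！？]", text) if s.strip()]
    return [[t for t in re.split(r"[\s,;:0-9]+", s)
             if t and t.lower() not in STOPWORDS]
            for s in sentences]
```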
And the phrase processing module 2 comprises an identification unit and a semantic completion unit, wherein the identification unit is used for analyzing the syntactic structure of the first sentence and learning the context characteristics of the first entity in the first sentence so as to obtain the missing semantic information of the first sentence, and the semantic completion unit performs semantic completion processing on the first sentence according to the missing semantic information to form a second sentence.
It should be noted that, after the second sentences are formed, the second sentences and the normal sentences together constitute the training corpus. The feature extraction module 3 extracts the training-sentence feature vectors from the training corpus, wherein the feature vectors include entity features, word features and position features.
Subsequently, the relation extraction model module 4 comprises a convolution layer, a piecewise max-pooling layer, a sentence attention layer and a classification layer. The convolution layer performs the convolution operation on the input feature vectors; the piecewise max-pooling layer divides the convolved feature map into three parts according to the positions of the second entity pair in the second sentence, computes and extracts the maximum of each part to obtain the most salient feature vector of the second sentence, and then concatenates the three parts as its output; the sentence attention layer calculates and extracts the important semantic information related to the relation expression according to the semantic and position distances between the words and the entity pair in the second sentence, and obtains the sentence features from this information; and the classification layer takes the sentence features as input and performs the calculation on them to output the relation label type. It should be noted that, in the training stage of the relation extraction model module, the classification layer is further configured to compute a loss function from the output relation label type and the true label type, and to decide whether to end training according to whether the loss function is stable. For the specific method, reference may be made to the foregoing method embodiments, which are not repeated here.
In summary, in the embodiment of the present invention, the data preprocessing module preprocesses the text to be processed to form the first sentences, so that the unlabeled corpus can be aligned with the entities in the knowledge base to label entity relations automatically, improving labeling efficiency. The phrase processing module then recognizes the first sentence and semantically completes it to form the second sentence, which facilitates subsequent relation extraction and avoids both the erroneous extractions caused by extracting relations directly from semantically incomplete short sentences and the time and labor wasted by manual labeling. Finally, the feature extraction module extracts the feature vectors from the second sentences and normal sentences and inputs them into the relation extraction model module to output the relation label type, facilitating relation extraction for short sentences with missing semantics, improving extraction efficiency and accuracy, and reducing the influence of noise during relation extraction.
Fig. 5 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 5: a processor (processor)51, a communication Interface (communication Interface)52, a memory (memory)53 and a communication bus 54, wherein the processor 51, the communication Interface 52 and the memory 53 complete communication with each other through the communication bus 54. The processor 51 may call logic instructions in the memory 53 to perform an attention mechanism based relationship extraction method comprising: preprocessing a text to be processed to form a first sentence; identifying a first statement, and performing semantic completion on the first statement to form a second statement; and extracting a feature vector in the second sentence, and inputting the feature vector into a pre-trained relation extraction model to obtain a relation label type output by the relation extraction model, wherein the relation extraction model is obtained based on the semantic completion sentence and the normal sentence in the training corpus.
In addition, the logic instructions in the memory 53 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the attention-based relationship extraction method provided by the above methods, the method including: preprocessing a text to be processed to form a first sentence; identifying a first statement, and performing semantic completion on the first statement to form a second statement; and extracting a feature vector in the second sentence, and inputting the feature vector into a pre-trained relation extraction model to obtain a relation label type output by the relation extraction model, wherein the relation extraction model is obtained based on the semantic completion sentence and the normal sentence in the training corpus.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor is implemented to perform the attention-based mechanism-based relationship extraction methods provided above, the method comprising: preprocessing a text to be processed to form a first sentence; identifying a first statement, and performing semantic completion on the first statement to form a second statement; and extracting a feature vector in the second sentence, and inputting the feature vector into a pre-trained relation extraction model to obtain a relation label type output by the relation extraction model, wherein the relation extraction model is obtained based on the semantic completion sentence and the normal sentence in the training corpus.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A relationship extraction method based on an attention mechanism is characterized by comprising the following steps:
preprocessing a text to be processed to form a first sentence;
identifying the first statement, and performing semantic completion on the first statement to form a second statement;
and extracting a feature vector in the second sentence, and inputting the feature vector into a relation extraction model to obtain a relation label type output by the relation extraction model, wherein the relation extraction model is obtained based on a semantic completion sentence and a normal sentence in a training corpus.
2. The attention mechanism-based relationship extraction method of claim 1, wherein the identifying the first sentence and performing semantic completion on the first sentence to form a second sentence comprises:
identifying a syntax structure of the first sentence and a first entity in the first sentence by parsing;
learning the context feature of the first entity in the first sentence to obtain the missing semantic information of the first sentence;
and performing semantic completion processing on the first statement according to the missing semantic information to form a second statement.
3. The attention mechanism-based relationship extraction method according to claim 2, wherein performing corresponding semantic completion processing on the first sentence according to the missing semantic information to form a second sentence comprises:
if the missing semantic information is subject or object missing, finding out an entity containing the missing semantic information of the first sentence as a target entity, and splicing the target entity and the first sentence to form a second sentence;
if the missing semantic information is that the subject and the predicate are missing or the predicate and the object are missing, finding out a normal sentence containing the missing semantic information of the first sentence, and splicing the first sentence and the normal sentence to form a second sentence.
4. The method for extracting relationship based on attention mechanism according to claim 3, wherein the finding out an entity containing the missing semantic information of the first sentence as a target entity and splicing the target entity and the first sentence to form a second sentence comprises:
extracting entities with different types from the first entity from the context sentences of the normal sentences in which the first entity is located in the knowledge base as candidate entities;
extracting a trigger word closest to the first entity from the context sentence based on the label type to serve as a target trigger word, and extracting a feasibility relation type between the target trigger word and the first entity by adopting a word attention mechanism;
extracting a candidate entity with the highest similarity from the candidate entities as a target entity based on a remote supervision method and the feasibility relation type;
splicing the target entity with the first statement to form a second statement;
the finding out the normal sentence containing the missing semantic information of the first sentence, and splicing the first sentence and the normal sentence into one sentence includes:
extracting a first entity contained in the first statement;
finding out the context feature closest to the first entity in the context sentence, calculating the similarity between the context feature and the first entity, and selecting the normal sentence with the context feature with the highest similarity as a target normal sentence;
and splicing the first statement and the target normal statement end to form a second statement.
5. The attention mechanism-based relationship extraction method of claim 2, wherein after identifying the first sentence by parsing a syntactic structure of the first sentence, further comprising: labeling the first entity by adopting a part-of-speech tag mechanism;
after performing semantic completion processing on the first sentence according to the missing semantic information, the method further includes: and checking the entity label marked by the first entity.
6. The attention mechanism-based relationship extraction method according to claim 1, wherein the relationship extraction model comprises a convolutional layer, a segment max pooling layer, a sentence attention layer, and a classification layer, wherein:
the convolution layer is used for performing convolution operation on the input feature vector;
the segmented maximum pooling layer divides the feature vector subjected to the convolution operation into three parts according to the position of the entity pair in the second statement, performs segmented pooling on each part, respectively calculates the maximum value of each part, and combines the maximum values of the three parts to output;
a sentence attention layer, which obtains sentence characteristics according to the output of the segmented maximum pooling layer and the semantic relation between the words and the entity pairs in the second sentence;
and the classification layer is used for inputting the sentence characteristics and calculating according to the sentence characteristics so as to output the relation label type.
7. The attention mechanism-based relationship extraction method of claim 1, wherein training the relationship extraction model comprises:
constructing a training corpus based on a remote supervision method, inputting sentences in the training corpus into the relation extraction model as training sentences, wherein the training corpus comprises normal sentences and/or semantic completion sentences;
the relation extraction model is trained according to input training sentences to output and predict the relation label type;
obtaining a loss function according to the predicted relation label type and the real relation label type of the training statement, and finishing the training if the loss function is stable; otherwise, continuing to train the relation extraction model.
8. An attention mechanism-based relationship extraction system, implemented according to the attention mechanism-based relationship extraction method of any one of claims 1 to 7 and comprising:
a data preprocessing module, which preprocesses the text to be processed to form a first sentence;
a phrase processing module, which recognizes the first sentence and performs semantic completion on it to form a second sentence;
a feature extraction module, which extracts feature vectors from the second sentence;
and a relation extraction module, which outputs the relation label type according to the feature vectors extracted by the feature extraction module.
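The four modules of claim 8 form a straight pipeline: raw text, first sentence, semantically completed second sentence, feature vector, relation label. The sketch below only wires stand-in callables in that order; none of the stage implementations are the patent's:

```python
class RelationExtractionPipeline:
    """Illustrative wiring of the four modules in claim 8.

    Each stage is an injected callable; the stand-ins used at call
    sites are placeholders, not the patented implementations.
    """

    def __init__(self, preprocess, complete, featurize, classify):
        self.preprocess = preprocess  # data preprocessing module
        self.complete = complete      # phrase processing module (semantic completion)
        self.featurize = featurize    # feature extraction module
        self.classify = classify      # relation extraction module

    def __call__(self, text):
        first_sentence = self.preprocess(text)           # raw text -> first sentence
        second_sentence = self.complete(first_sentence)  # -> completed second sentence
        features = self.featurize(second_sentence)       # -> feature vector(s)
        return self.classify(features)                   # -> relation label type
```

Keeping the stages as injected callables mirrors the claim's module boundaries: each module can be swapped or tested in isolation.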
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, carries out the steps of the attention mechanism-based relationship extraction method of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the attention mechanism-based relationship extraction method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110340791.2A | 2021-03-30 | 2021-03-30 | Attention mechanism-based relationship extraction method, system, equipment and storage medium
Publications (1)
Publication Number | Publication Date |
---|---|
CN113128203A | 2021-07-16
Family
ID=76775484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110340791.2A (Pending) | Attention mechanism-based relationship extraction method, system, equipment and storage medium | 2021-03-30 | 2021-03-30
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113128203A (en) |
Application Events

2021-03-30: Application CN202110340791.2A filed in China; published as CN113128203A (legal status: Pending)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111832307A (en) * | 2020-07-09 | 2020-10-27 | 北京工业大学 | Entity relationship extraction method and system based on knowledge enhancement |
CN112084778A (en) * | 2020-08-04 | 2020-12-15 | 中南民族大学 | Entity relation extraction method and device based on novel relation attention mechanism |
CN112084793A (en) * | 2020-09-14 | 2020-12-15 | 深圳前海微众银行股份有限公司 | Semantic recognition method, device and readable storage medium based on dependency syntax |
CN112270196A (en) * | 2020-12-14 | 2021-01-26 | 完美世界(北京)软件科技发展有限公司 | Entity relationship identification method and device and electronic equipment |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113468295A * | 2021-07-23 | 2021-10-01 | Ping An International Smart City Technology Co., Ltd. | Method and device for determining subject-object correspondence, computer equipment and storage medium
CN113468295B * | 2021-07-23 | 2024-05-17 | Ping An International Smart City Technology Co., Ltd. | Method and device for determining subject-object correspondence, computer equipment and storage medium
CN113609855A * | 2021-08-12 | 2021-11-05 | Shanghai Kingstar Software Technology Co., Ltd. | Information extraction method and device
CN113505229A * | 2021-09-09 | 2021-10-15 | Beijing Daoda Tianji Technology Co., Ltd. | Entity relationship extraction model training method and device
WO2023134069A1 * | 2022-01-14 | 2023-07-20 | Ping An Technology (Shenzhen) Co., Ltd. | Entity relationship identification method, device, and readable storage medium
CN114925210A * | 2022-03-21 | 2022-08-19 | China Telecom Corporation Limited | Knowledge graph construction method, device, medium and equipment
CN114925210B * | 2022-03-21 | 2023-12-08 | China Telecom Corporation Limited | Knowledge graph construction method, device, medium and equipment
CN115168575A * | 2022-06-27 | 2022-10-11 | Beijing Zhizhen Cloud Intelligent Technology Co., Ltd. | Subject completion method for the audit domain and related equipment
CN116991983A * | 2023-09-27 | 2023-11-03 | Zhejiang Lab | Event extraction method and system for company information text
CN116991983B * | 2023-09-27 | 2024-02-02 | Zhejiang Lab | Event extraction method and system for company information text
Similar Documents
Publication | Title
---|---
CN113128203A | Attention mechanism-based relationship extraction method, system, equipment and storage medium
US10423874B2 | Intelligent image captioning
US9977778B1 | Probabilistic matching for dialog state tracking with limited training data
CN109800414B | Method and system for recommending language correction
Belinkov et al. | Arabic diacritization with recurrent neural networks
CN110263325B | Chinese word segmentation system
CN109299228B | Computer-implemented text risk prediction method and device
CN111274829B | Sequence labeling method utilizing cross-language information
CN111602128A | Computer-implemented method and system for determining
CN111832307A | Entity relationship extraction method and system based on knowledge enhancement
CN111400455A | Relation detection method of question-answering system based on knowledge graph
CN117076653A | Knowledge base question-answering method based on chain-of-thought and enhanced in-context learning
CN111401084A | Method and device for machine translation and computer readable storage medium
CN110096572B | Sample generation method, device and computer readable medium
CN111859940B | Keyword extraction method and device, electronic equipment and storage medium
CN112966525B | Legal-domain event extraction method based on a pre-trained model and a convolutional neural network algorithm
CN112613293B | Summary generation method and device, electronic equipment and storage medium
CN108647191A | Sentiment dictionary construction method based on supervised sentiment text and word vectors
CN110134950B | Automatic text proofreading method combining words
CN113743099A | Term extraction system, method, medium and terminal based on a self-attention mechanism
CN113657098B | Text error correction method, device, equipment and storage medium
CN115600597A | Named entity recognition method, device, system and storage medium based on attention mechanism and intra-word semantic fusion
CN114781375A | Military equipment relation extraction method based on BERT and attention mechanism
CN111858894A | Semantic missing recognition method and device, electronic equipment and storage medium
CN114548101A | Event detection method and system based on a backtracking sequence generation method
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||