CN115114918B

CN115114918B - Entity relation extraction method, entity relation extraction device, data labeling system and storage medium

Info

Publication number: CN115114918B
Application number: CN202110286687.XA
Authority: CN
Inventors: 贾丹; 项超; 王学敏; 孟维业
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2024-09-06
Anticipated expiration: 2041-03-17
Also published as: CN115114918A

Abstract

The disclosure provides an entity relationship extraction method and device, a data labeling system, and a storage medium. The method comprises: processing a word vector and an initial relationship subgraph using a reinforcement learning model to obtain entity relationship extraction information among the words in the word vector; generating a relationship subgraph based on the entity relationship extraction information, and processing the relationship subgraph using a relationship graph processing model to generate relationship subgraph feature information corresponding to the relationship subgraph; and generating state information of the reinforcement learning model based on the word vector and the relationship subgraph feature information, and processing the state information using the reinforcement learning model to obtain new entity relationship extraction information. The method, device, and storage medium convert the NLP entity relationship generation problem into an entity relationship graph generation problem, so that entity relationships can be characterized efficiently, labor cost is reduced, and labeling efficiency is improved.

Description

Entity relation extraction method, entity relation extraction device, data labeling system and storage medium

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for extracting an entity relationship, a data labeling system, and a storage medium.

Background

The data labeling platform is an important link in an automated machine learning pipeline, and labeling quality directly affects subsequent model training and prediction quality. Existing labeling platforms involve a large amount of repetitive labeling and offer little intelligence, wasting considerable labor time and effort, so building an efficient, self-learning labeling platform is particularly important. Entity relationship extraction is a basic task of NLP (Natural Language Processing) and is widely applied in knowledge graph construction, automatic question answering, intelligent search engines, and other fields. Entity relationship labeling is one type of NLP data labeling, but because entity relationships in text are complex, the labeling work incurs high labor costs. Therefore, a new technical solution for entity relationship extraction is needed.

Disclosure of Invention

In view of the above, an object of the present invention is to provide a method, an apparatus, a data labeling system and a storage medium for extracting entity relationships.

According to a first aspect of the present disclosure, there is provided a method for extracting an entity relationship, including: acquiring a text to be annotated, and generating a word vector based on the text to be annotated; processing the word vector and the initial relation subgraph by using a reinforcement learning model to acquire entity relation extraction information among each word in the word vector; generating a relationship sub-graph based on the entity relationship extraction information, and processing the relationship sub-graph by using a relationship graph processing model to generate relationship sub-graph characteristic information corresponding to the relationship sub-graph; generating state information of the reinforcement learning model based on the word vector and the relation sub-graph characteristic information, and processing the state information by using the reinforcement learning model to obtain new entity relation extraction information.

Optionally, the generating the word vector based on the text to be annotated includes: processing the text to be annotated by using a word vector generation model to generate the word vector; wherein the word vector generation model comprises a Word2Vec model.

Optionally, the method further includes: constructing the relationship graph processing model for identifying the relationship subgraph; and training the relationship graph processing model according to a relationship subgraph sample set.

Optionally, the relationship graph processing model includes: a graph convolutional neural network model.

Optionally, the reinforcement learning model is a reinforcement learning model based on Q learning; the Q learning network of the reinforcement learning model uses a bi-directional LSTM model.

Optionally, the state information is input into the reinforcement learning model to obtain an action corresponding to the state information; wherein the actions include: adding a relationship entity, adding an association relationship between existing relationship entities, and adding neither an entity nor an association relationship.

Optionally, a calculation is performed with a preset reward and punishment function, according to target entity relationship extraction information corresponding to the text to be annotated and the new entity relationship extraction information, to obtain a reward and punishment processing result; wherein the reward and punishment processing result comprises: adding points, subtracting points, or leaving the points unchanged; and the reinforcement learning model is optimized based on the reward and punishment processing result.

According to a second aspect of the present disclosure, there is provided an entity relationship extraction apparatus, comprising: the word vector generation module is used for acquiring a text to be annotated and generating a word vector based on the text to be annotated; the recognition processing module is used for processing the word vector and the initial relation subgraph by using a reinforcement learning model and acquiring entity relation extraction information among words in the word vector; the relationship sub-graph processing module is used for generating a relationship sub-graph based on the entity relationship extraction information, processing the relationship sub-graph by using a relationship graph processing model and generating relationship sub-graph characteristic information corresponding to the relationship sub-graph; the recognition processing module is further configured to generate state information of the reinforcement learning model based on the word vector and the relationship sub-graph feature information, and process the state information by using the reinforcement learning model to obtain new entity relationship extraction information.

Optionally, the word vector generation module is configured to process the text to be annotated by using a word vector generation model to generate the word vector; wherein the word vector generation model comprises a Word2Vec model.

Optionally, a model training module is configured to construct the relationship graph processing model for identifying the relationship subgraph; and training the relationship graph processing model according to the relationship sub-graph sample set.

Optionally, the recognition processing module is configured to input the state information into the reinforcement learning model and obtain an action corresponding to the state information; wherein the actions include: adding a relationship entity, adding an association relationship between existing relationship entities, and adding neither an entity nor an association relationship.

Optionally, a model optimization module is configured to perform a calculation with a preset reward and punishment function, according to target entity relationship extraction information corresponding to the text to be annotated and the new entity relationship extraction information, to obtain a reward and punishment processing result; wherein the reward and punishment processing result comprises: adding points, subtracting points, or leaving the points unchanged; and to optimize the reinforcement learning model based on the reward and punishment processing result.

According to a third aspect of the present disclosure, there is provided an entity relationship extraction apparatus, comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method as described above based on instructions stored in the memory.

According to a fourth aspect of the present disclosure, there is provided a data annotation system comprising: the entity relationship extraction apparatus as described above.

According to a fifth aspect of the present disclosure, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method as described above.

According to the entity relation extraction method, device, data annotation system and storage medium, the entity relation generation problem of NLP is converted into the generation problem of the entity relation graph, the entity relation can be effectively represented and learned from manual annotation behaviors, the labor cost is reduced, the annotation efficiency is improved, and repeated annotation and similar annotation are avoided.

Drawings

In order to illustrate the embodiments of the present disclosure or the solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure, and a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a flow diagram of one embodiment of a method of entity relationship extraction according to the present disclosure;

FIG. 2A is a schematic diagram of model operation according to one embodiment of the entity relationship extraction method of the present disclosure; FIG. 2B is a schematic diagram of a relational graph;

FIG. 3 is a block diagram of one embodiment of an entity relationship extraction apparatus according to the present disclosure;

FIG. 4 is a block diagram of another embodiment of an entity relationship extraction apparatus according to the present disclosure;

FIG. 5 is a block diagram of yet another embodiment of an entity relationship extraction apparatus according to the present disclosure.

Detailed Description

The present disclosure now will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the disclosure are shown. The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.

Entity relationship annotation is one type of NLP data annotation. The present disclosure provides an entity relationship extraction method that converts the NLP entity relationship generation problem into an entity relationship graph generation problem, so that entity relationships can be represented in a natural and efficient way. The model learns from manual labeling behavior: the interactive process of receiving rewards and punishments from manual feedback is treated as a reinforcement learning process, and accurate relationship labeling behavior is obtained by maximizing the reward received.

FIG. 1 is a flow diagram of one embodiment of a method for entity relationship extraction according to the present disclosure, as shown in FIG. 1:

Step 101: acquiring a text to be annotated, and generating a word vector based on the text to be annotated.

Step 102: processing the word vector and the initial relationship subgraph by using the reinforcement learning model to acquire entity relationship extraction information among the words in the word vector.

In one embodiment, reinforcement learning (RL), also known as evaluative learning, is a machine learning paradigm and methodology that describes and solves the problem of an agent learning a policy, through interaction with an environment, in order to maximize return or achieve a specific goal.

The reinforcement learning model may be trained in advance using an existing training method. The trained reinforcement learning model processes the word vector and the initial relationship subgraph to obtain entity relationship extraction information among the words in the word vector. The entity relationship extraction information comprises association relationships among the words, and an association relationship may be "creation", "generation", and the like. The initial relationship subgraph may be a preset relationship subgraph or a relationship subgraph with empty content, or the word vectors may be processed using the reinforcement learning model.

Step 103: generating a relationship subgraph based on the entity relationship extraction information, and processing the relationship subgraph by using a relationship graph processing model to generate relationship subgraph feature information corresponding to the relationship subgraph.

In one embodiment, the relationship subgraph may be directly output using a reinforcement learning model, or may be generated based on terms and associations between terms using other tools.

Step 104: generating state information of the reinforcement learning model based on the word vector and the relationship subgraph feature information, and processing the state information by using the reinforcement learning model to acquire new entity relationship extraction information.

In one embodiment, state information for the reinforcement learning model may be generated from the word vector and the relationship subgraph feature information using existing methods. The new entity relationship extraction information includes the newly obtained association relationships between the words, and an entity relationship graph can be generated based on the words and the association relationships between them.
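Steps 101 to 104 can be read as one iterative loop. The following is a minimal sketch under stated assumptions: the names `extract`, `embed`, `graph_features`, and `policy` are hypothetical, the three toy functions merely stand in for the word vector generation model, the relationship graph processing model, and the reinforcement learning model, and passing the initial subgraph through the same feature extractor is an assumption rather than something the disclosure states.

```python
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str, str]   # (head word, relation label, tail word)
Vector = List[float]


def extract(
    words: List[str],
    embed: Callable[[List[str]], List[Vector]],
    graph_features: Callable[[Set[Edge]], Vector],
    policy: Callable[[List[Vector], Vector], Set[Edge]],
    rounds: int = 2,
) -> Set[Edge]:
    """Hypothetical driver for steps 101-104."""
    word_vectors = embed(words)               # step 101: text -> word vectors
    subgraph: Set[Edge] = set()               # empty initial relationship subgraph
    for _ in range(rounds):
        # Step 103: turn the current subgraph into feature information.
        features = graph_features(subgraph)
        # Steps 102/104: the state (word vectors + subgraph features) goes to
        # the policy, which returns newly extracted relations.
        subgraph |= policy(word_vectors, features)
    return subgraph


# Toy stand-ins (assumptions, not the models of the disclosure).
def toy_embed(words: List[str]) -> List[Vector]:
    return [[float(len(w))] for w in words]


def toy_graph_features(edges: Set[Edge]) -> Vector:
    return [float(len(edges))]


def toy_policy(word_vectors: List[Vector], features: Vector) -> Set[Edge]:
    # Proposes one fixed relation while the subgraph is empty, then nothing.
    return {("A", "created", "B")} if features[0] == 0.0 else set()


result = extract(["A", "created", "B"], toy_embed, toy_graph_features, toy_policy)
```

The first pass of the loop plays the role of step 102 (the subgraph is still the initial one); later passes correspond to steps 103 and 104.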

According to the entity relationship extraction method of this embodiment, the NLP entity relationship generation problem is converted into an entity relationship graph generation problem, so that entity relationships can be represented in a natural and efficient way and learned from manual labeling behavior. This greatly reduces labor cost, improves labeling efficiency, and avoids repeated and near-duplicate labeling. Graph generation and reinforcement learning are combined here for NLP entity relationship extraction for the first time.

In one embodiment, the text to be annotated is processed using a word vector generation model, such as a Word2Vec model, to generate word vectors. A relationship graph processing model for identifying relationship subgraphs is constructed and trained according to a relationship subgraph sample set; the relationship graph processing model includes a graph convolutional neural network (GCN) model and the like. The relationship subgraph sample set can be constructed by an existing method, and the relationship graph processing model can be trained on it using an existing model training method.
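To make the idea of turning a relationship subgraph into feature information concrete, here is a minimal sketch of a single graph convolution layer followed by mean pooling. It assumes the widely used normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W); the disclosure does not specify the layer form, the pooling step, or any dimensions, and the names `gcn_layer` and `mean_pool` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 * H * W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(node_feats: Matrix) -> List[float]:
    """Average node embeddings into one subgraph feature vector (an assumption)."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


# Two connected entity nodes, 2-d input features, identity weight matrix.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
subgraph_feature = mean_pool(gcn_layer(adj, feats, weight))
```

In a trained model the weight matrix would be learned from the relationship subgraph sample set; the identity matrix here only keeps the arithmetic easy to follow.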

The reinforcement learning model is based on Q-learning, and the Q-learning network of the reinforcement learning model uses a bidirectional LSTM (Long Short-Term Memory) model. The state information is input into the reinforcement learning model to obtain an action corresponding to the state information, wherein the actions include: adding a relationship entity, adding an association relationship between existing relationship entities, and adding neither an entity nor an association relationship.
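The three-way action space can be sketched as follows. This is an illustration only: the `Action` enum and `select_action` are hypothetical names, the Q-values are made-up numbers standing in for the output of the Q-learning network, and epsilon-greedy exploration is a common choice that the disclosure does not mention.

```python
import random
from enum import Enum
from typing import Dict


class Action(Enum):
    ADD_ENTITY = "add a relationship entity (node)"
    ADD_RELATION = "add an association between existing entities (edge)"
    SKIP = "add neither an entity nor an association"


def select_action(q_values: Dict[Action, float], epsilon: float, rng: random.Random) -> Action:
    """Epsilon-greedy choice over the three actions (exploration is an assumption)."""
    if rng.random() < epsilon:
        return rng.choice(list(Action))      # explore: pick any action
    return max(q_values, key=q_values.get)   # exploit: pick the highest Q-value


# Made-up Q-values for one state; epsilon=0.0 makes the choice purely greedy.
q = {Action.ADD_ENTITY: 0.2, Action.ADD_RELATION: 1.5, Action.SKIP: -0.3}
chosen = select_action(q, epsilon=0.0, rng=random.Random(0))
```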

A calculation is performed with a preset reward and punishment function, according to the target entity relationship extraction information corresponding to the text to be annotated and the new entity relationship extraction information, to obtain a reward and punishment processing result. The reward and punishment processing result comprises: adding points, subtracting points, or leaving the points unchanged. The reinforcement learning model is then optimized based on the reward and punishment processing result.
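One possible shape for such a reward and punishment function is sketched below. The point values 10, 30, -10, and 0 are the ones this description associates with FIG. 2A; which case earns 10 and which earns 30 is not stated, so the mapping used here (10 for a correct entity, 30 for a correct relation) is purely an assumption, as are the function name and signature.

```python
from typing import Optional, Set, Tuple

Edge = Tuple[str, str, str]   # (head entity, relation label, tail entity)


def reward(
    added_entity: Optional[str],
    added_edge: Optional[Edge],
    target_entities: Set[str],
    target_edges: Set[Edge],
) -> int:
    """Hypothetical reward; the assignment of 10 vs 30 to cases is an assumption."""
    if added_entity is None and added_edge is None:
        return 0                                       # nothing added: points unchanged
    if added_edge is not None:
        return 30 if added_edge in target_edges else -10
    return 10 if added_entity in target_entities else -10


# Target annotation (for example, from a human annotator) for one sentence.
targets_e = {"Microsoft", "Bill Gates", "Paul Allen"}
targets_r = {("Bill Gates", "founded", "Microsoft")}
r_edge = reward(None, ("Bill Gates", "founded", "Microsoft"), targets_e, targets_r)
r_wrong = reward("Apple", None, targets_e, targets_r)
r_skip = reward(None, None, targets_e, targets_r)
```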

In the entity relationship extraction method of the present disclosure, the reinforcement learning state update uses not only the output of the LSTM but also features of the current relationship subgraph extracted by graph convolution. The action space of reinforcement learning is no longer a simple classification problem; instead, depending on the situation, the model decides whether to add a node representing an entity to the relationship graph or to add an edge representing a relationship between existing nodes. Entity labeling and relationship extraction are thus merged into a single relationship graph generation step, so no additional relationship extraction model needs to be introduced.

In one embodiment, as shown in fig. 2A, an NLP entity relationship labeling scheme based on interactive learning of a graph generation strategy is adopted to convert a relationship labeling problem into a relationship graph generation problem, obtain efficient relationship characterization, and enable a model to learn in the interactive process of manual labeling and manual feedback.

For example, the sentence to be annotated is "Microsoft was founded by Bill Gates and Paul Allen in 1975". Using a DQN model and a graph convolutional network (GCN) model, the entity relationship extraction problem is converted into a process of building the relationship graph step by step with reinforcement learning; the generated relationship subgraph is shown in FIG. 2B. Features of the relationship subgraph are extracted by the GCN model and, as part of the environment observation, are used to update the reinforcement learning state S. The GCN model is a graph generation model pre-trained on an entity relationship database; when extracting subgraph features, it only performs prediction of the relationship subgraph feature information.
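A relationship subgraph for the example sentence about the founding of Microsoft could be held in a small structure like the one below. This is a sketch under assumptions: FIG. 2B is not reproduced in the text, so the node set, the edge label "founded", and the class name `RelationSubgraph` are illustrative rather than taken from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class RelationSubgraph:
    nodes: List[str] = field(default_factory=list)
    edges: Set[Tuple[int, str, int]] = field(default_factory=set)

    def add_entity(self, name: str) -> int:
        """Add an entity node if it is new and return its index."""
        if name not in self.nodes:
            self.nodes.append(name)
        return self.nodes.index(name)

    def add_relation(self, head: str, label: str, tail: str) -> None:
        """Add a labeled edge; both entities must already be nodes."""
        self.edges.add((self.nodes.index(head), label, self.nodes.index(tail)))

    def adjacency(self) -> List[List[int]]:
        """Undirected adjacency matrix, e.g. as input to a graph convolution."""
        n = len(self.nodes)
        mat = [[0] * n for _ in range(n)]
        for h, _, t in self.edges:
            mat[h][t] = mat[t][h] = 1
        return mat


# Build the graph step by step, mirroring "add node" and "add edge" actions.
g = RelationSubgraph()
for name in ("Microsoft", "Bill Gates", "Paul Allen"):
    g.add_entity(name)
g.add_relation("Bill Gates", "founded", "Microsoft")
g.add_relation("Paul Allen", "founded", "Microsoft")
```

Calling `add_relation` with an entity that has not been added raises `ValueError`, which matches the restriction that edges are only added between existing entities.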

As shown in FIG. 2A, the state S of reinforcement learning includes the word vectors of the words and a graph convolution representation of the current relationship subgraph. The Q-learning part uses a bidirectional LSTM model, with each input state S and the hidden layer of the bidirectional LSTM serving as inputs to the Q-learning network. There are three types of actions for each state: adding a node to the graph, adding an edge to the existing subgraph, and next (adding neither an entity nor an edge). The reward obtained by the agent is likewise of three types: added points (10, 30), deducted points (-10), and zero points (0).
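The state composition and the learning signal can be sketched as follows. Two things here are assumptions: that the two parts of S are simply concatenated, and that the network is trained toward the standard Q-learning target r + gamma * max Q(s', a') with a discount factor gamma, which the disclosure does not give. The names `build_state` and `td_target` are hypothetical.

```python
from typing import List


def build_state(word_vector: List[float], subgraph_feature: List[float]) -> List[float]:
    """Concatenate the word vector and subgraph feature into S (an assumption)."""
    return list(word_vector) + list(subgraph_feature)


def td_target(reward: float, next_q_values: List[float], gamma: float = 0.9, done: bool = False) -> float:
    """Standard Q-learning target: r + gamma * max over next-state Q-values."""
    if done:
        return reward          # no future reward after the final step
    return reward + gamma * max(next_q_values)


# One step: a reward of 30 and made-up Q-values for the three actions in s'.
state = build_state([0.1, 0.2], [0.5, 0.5])
target = td_target(reward=30, next_q_values=[1.0, 2.0, 0.0], gamma=0.5)
```

The Q-learning network (a bidirectional LSTM in this disclosure) would then be updated so that its Q-value for the chosen action moves toward `target`.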

In one embodiment, as shown in fig. 3, the present disclosure provides an entity relationship extraction apparatus 30, including a word vector generation module 31, an identification processing module 32, and a relationship sub-graph processing module 33. The word vector generation module 31 acquires text to be annotated, and generates a word vector based on the text to be annotated. The recognition processing module 32 processes the word vectors and the initial relational subgraphs using the reinforcement learning model to obtain entity relationship extraction information between each word in the word vectors.

The relationship sub-graph processing module 33 generates a relationship sub-graph based on the entity relationship extraction information, processes the relationship sub-graph using the relationship graph processing model, and generates relationship sub-graph feature information corresponding to the relationship sub-graph. The recognition processing module 32 generates state information of the reinforcement learning model based on the word vector and the relationship sub-graph feature information, and processes the state information using the reinforcement learning model to obtain new entity relationship extraction information.

In one embodiment, the word vector generation module 31 processes the text to be annotated using a word vector generation model, such as a Word2Vec model, to generate the word vector. As shown in FIG. 4, the entity relationship extraction apparatus 30 further includes a model training module 34 and a model optimization module 35. The model training module 34 constructs a relationship graph processing model for identifying the relationship subgraphs and trains it according to the relationship subgraph sample set; the relationship graph processing model includes a graph convolutional neural network model and the like.

The recognition processing module 32 inputs the state information into the reinforcement learning model and obtains actions corresponding to the state information, including adding a relationship entity, adding an association relationship between existing relationship entities, adding neither an entity nor an association relationship, and the like. The model optimization module 35 performs a calculation with a preset reward and punishment function, according to the target entity relationship extraction information corresponding to the text to be annotated and the new entity relationship extraction information, to obtain a reward and punishment processing result, which comprises: adding points, subtracting points, leaving the points unchanged, and the like. The model optimization module 35 then optimizes the reinforcement learning model based on the reward and punishment processing result.

Fig. 5 is a block diagram of yet another embodiment of an entity relationship extraction apparatus according to the present disclosure. As shown in fig. 5, the apparatus may include a memory 51, a processor 52, a communication interface 53, and a bus 54. The memory 51 is used for storing instructions, the processor 52 is coupled to the memory 51, and the processor 52 is configured to implement the above-described entity relationship extraction method based on the instructions stored in the memory 51.

The memory 51 may be a high-speed RAM memory, a non-volatile memory, or the like, or the memory 51 may be a memory array. The memory 51 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules. The processor 52 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the entity relationship extraction method of the present disclosure.

In one embodiment, the present disclosure provides a data annotation system comprising an entity relationship extraction apparatus as in any one of the embodiments above.

In one embodiment, the present disclosure provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the entity-relationship extraction method of any of the embodiments above.

The entity relation extraction method, the device, the data labeling system and the storage medium provided in the embodiment convert the entity relation generation problem of the NLP into the generation problem of the entity relation graph, so that the entity relation can be represented in a more natural and efficient mode and learned from manual labeling behaviors, the labor cost is greatly reduced, the labeling efficiency is improved, and repeated labeling and similar labeling are avoided.

The methods and systems of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A method for entity relationship extraction, comprising:

acquiring a text to be annotated, and generating a word vector based on the text to be annotated;

processing the word vector and an initial relationship subgraph by using a reinforcement learning model to acquire entity relationship extraction information among the words in the word vector;

generating a relationship subgraph based on the entity relationship extraction information, and processing the relationship subgraph by using a relationship graph processing model to generate relationship subgraph feature information corresponding to the relationship subgraph;

generating state information of the reinforcement learning model based on the word vector and the relationship subgraph feature information, and processing the state information by using the reinforcement learning model to obtain new entity relationship extraction information;

wherein the reinforcement learning model is a reinforcement learning model based on Q-learning; the state information is input into the reinforcement learning model to acquire actions corresponding to the state information; and the actions include: adding a relationship entity, adding an association relationship between existing relationship entities, and adding neither an entity nor an association relationship.

2. The method of claim 1, wherein the generating a word vector based on the text to be annotated comprises:

processing the text to be annotated by using a word vector generation model to generate the word vector;

wherein the word vector generation model comprises: a Word2Vec model.

3. The method of claim 2, further comprising:

constructing the relationship graph processing model for identifying the relationship subgraph;

and training the relationship graph processing model according to the relationship sub-graph sample set.

4. The method of claim 3, wherein,

the relationship graph processing model comprises: a graph convolutional neural network model.

5. The method of claim 1, wherein,

The Q learning network of the reinforcement learning model uses a bi-directional LSTM model.

6. The method of claim 1, further comprising:

performing a calculation with a preset reward and punishment function, according to target entity relationship extraction information corresponding to the text to be annotated and the new entity relationship extraction information, to obtain a reward and punishment processing result; wherein the reward and punishment processing result comprises: adding points, subtracting points, or leaving the points unchanged; and optimizing the reinforcement learning model based on the reward and punishment processing result.

7. An entity relationship extraction apparatus, comprising:

The word vector generation module is used for acquiring a text to be annotated and generating a word vector based on the text to be annotated;

The recognition processing module is used for processing the word vector and the initial relation subgraph by using a reinforcement learning model and acquiring entity relation extraction information among words in the word vector;

The relationship sub-graph processing module is used for generating a relationship sub-graph based on the entity relationship extraction information, processing the relationship sub-graph by using a relationship graph processing model and generating relationship sub-graph characteristic information corresponding to the relationship sub-graph;

The recognition processing module is further used for generating state information of the reinforcement learning model based on the word vector and the relation sub-graph characteristic information, and processing the state information by using the reinforcement learning model to obtain new entity relation extraction information;

wherein the reinforcement learning model is a reinforcement learning model based on Q-learning; the recognition processing module is used for inputting the state information into the reinforcement learning model and acquiring actions corresponding to the state information; wherein the actions include: adding a relationship entity, adding an association relationship between existing relationship entities, and adding neither an entity nor an association relationship.

8. The apparatus of claim 7, wherein,

the word vector generation module is used for processing the text to be annotated by using a word vector generation model to generate the word vector; wherein the word vector generation model comprises: a Word2Vec model.

9. The apparatus of claim 8, further comprising:

The model training module is used for constructing the relationship graph processing model for identifying the relationship subgraph; and training the relationship graph processing model according to the relationship sub-graph sample set.

10. The apparatus of claim 9, wherein,

11. The apparatus of claim 7, wherein,

12. The apparatus of claim 7, further comprising:

a model optimization module, used for performing a calculation with a preset reward and punishment function, according to target entity relationship extraction information corresponding to the text to be annotated and the new entity relationship extraction information, to obtain a reward and punishment processing result; wherein the reward and punishment processing result comprises: adding points, subtracting points, or leaving the points unchanged; and for optimizing the reinforcement learning model based on the reward and punishment processing result.

13. An entity relationship extraction apparatus, comprising:

A memory; and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-6 based on instructions stored in the memory.

14. A data annotation system comprising:

The entity-relationship extracting apparatus according to any one of claims 7 to 13.

15. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 6.