WO2023207096A1 - 一种实体链接方法、装置、设备及非易失性可读存储介质 (Entity linking method, apparatus, device and non-volatile readable storage medium) - Google Patents
一种实体链接方法、装置、设备及非易失性可读存储介质 (Entity linking method, apparatus, device and non-volatile readable storage medium)
- Publication number
- WO2023207096A1 (PCT/CN2022/135991)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- entity
- model
- training
- sequence
- mention
- Prior art date: 2022-04-29
Classifications
- G06F40/295 — Named entity recognition (G: Physics · G06: Computing; calculating or counting · G06F: Electric digital data processing · G06F40/00: Handling natural language data · G06F40/20: Natural language analysis · G06F40/279: Recognition of textual entities · G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking)
- G06F40/194 — Calculation of difference between files (G06F40/00: Handling natural language data · G06F40/10: Text processing)
- G06N3/08 — Learning methods (G06N: Computing arrangements based on specific computational models · G06N3/00: Computing arrangements based on biological models · G06N3/02: Neural networks)
Definitions
- This application relates to the field of natural language processing technology, and in particular to an entity linking method, device, equipment and non-volatile readable storage medium.
- Entity linking links the entities mentioned in a text to the corresponding entities in a knowledge base; it is the first, and a crucial, step in enabling machines to understand natural language.
- The input to entity linking usually includes the entity mention and its context, together with the knowledge base to be linked to, and the output of entity linking is the knowledge-base entity that the mention refers to.
- When mentions and entities correspond one to one, that is, when there is no ambiguity, the entity linking problem is very simple, but in practical applications ambiguity is common.
- On the one hand, an entity can be expressed in multiple ways; on the other hand, the same name can refer to different entities.
- In the prior art, the entity linking method generally includes three steps: named entity recognition (MD, mention detection), generation of candidate entities, and entity disambiguation.
- The disadvantage of this approach is that if an error occurs in the first mention-detection step, the subsequent candidate-entity generation and disambiguation operations compound the error, resulting in poor results.
- In view of this, the purpose of this application is to provide an entity linking method, device, equipment and non-volatile readable storage medium that can improve the accuracy of entity linking and the performance of entity linking on open knowledge graphs.
- The specific solution is as follows:
- A first aspect of this application provides an entity linking method, including:
- obtaining the entity mention corresponding to an input text, the candidate entities of the entity mention, and the entity descriptions of the candidate entities;
- constructing a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text;
- using a first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determining the linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive samples and negative samples built from entity-mention training samples of a training text;
- a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text;
- a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
- Optionally, obtaining the entity mention corresponding to the input text includes:
- using a second model to determine the entity-mention position of the input text, and determining the entity mention corresponding to the input text according to the entity-mention position.
- Optionally, the second model includes a BERT neural network and a CRF neural network.
- Correspondingly, using the second model to determine the entity-mention position of the input text includes:
- processing the word vectors of the input text through the BERT neural network and then the CRF neural network to obtain BIO tags that characterize the entity-mention position.
- Optionally, obtaining the candidate entities of the entity mention and the entity descriptions of the candidate entities includes:
- using a third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list, and determining the entities of the alias classes whose match degree is greater than a first threshold as the candidate entities;
- reading the entity descriptions of the candidate entities from the entity list.
- Optionally, the entity linking method further includes:
- obtaining the training text;
- using the second model to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and determining, through the third model, the candidate-entity training samples corresponding to the entity-mention training samples;
- determining correct entities, incorrect entities and the corresponding entity-description training samples from the candidate-entity training samples;
- determining the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity as a first positive sample sequence and the sequence formed by the correct entity and the training text as a second positive sample sequence, and determining the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity as a first negative sample sequence and the sequence formed by the incorrect entity and the training text as a second negative sample sequence;
- training, through contrastive learning, the pre-trained model that uses the contrastive loss function with the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to obtain the first model.
- Optionally, the entity linking method further includes:
- integrating the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain a corresponding end-to-end integrated model;
- during model training, training the second model, which uses a cross-entropy loss function, and training the first model by using the output of the trained second model as the input of the first model;
- when performing entity linking, inputting the input text into the end-to-end integrated model so that the corresponding similarity is output after processing by the second model, the third model and the first model in sequence.
- A second aspect of this application provides an entity linking device, including:
- an acquisition module, used to obtain the entity mention corresponding to the input text, the candidate entities of the entity mention, and the entity descriptions of the candidate entities;
- a building module, used to construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text;
- a calculation and determination module, configured to use the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determine the linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text;
- a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text;
- a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
- A third aspect of the present application provides an electronic device.
- The electronic device includes a processor and a memory; the memory is used to store a computer program, and the computer program is loaded and executed by the processor to implement the aforementioned entity linking method.
- A fourth aspect of the present application provides a computer non-volatile readable storage medium.
- Computer-executable instructions are stored in the computer non-volatile readable storage medium.
- When the computer-executable instructions are loaded and executed by a processor, the aforementioned entity linking method is implemented.
- In this application, the entity mention corresponding to the input text, the candidate entities of the entity mention, and the entity descriptions of the candidate entities are obtained first; a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text are then constructed; finally, the first model is used to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and the linked entity of the entity mention is determined from the candidate entities according to the similarity. The first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text.
- A positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text.
- A negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text. It can be seen that this application is applicable to the entity linking task of any open knowledge graph.
- When entity linking is performed on the input text, on the basis of initially extracting the entity mentions of the input text and determining the candidate entities, the entity descriptions corresponding to the candidate entities are further obtained.
- The entity description information is fused into the entity linking to obtain the corresponding fusion sequences.
- At the same time, the model is trained through positive/negative-sample contrastive learning to achieve entity disambiguation.
- The trained model is used to calculate the similarity of the fusion sequences for filtering.
- The correct entity among the candidate entities, that is, the linked entity, is thereby filtered out, which improves the accuracy of entity linking and the performance of entity linking on the open knowledge graph.
- Figure 1 is a flow chart of an entity linking method provided by this application.
- Figure 2 is a flow chart of a specific first model training method provided by this application.
- Figure 3 is a flow chart of a specific entity linking method provided by this application.
- Figure 4 is a flow chart of a specific entity linking method provided by this application.
- Figure 5 is a specific second model structure diagram provided by this application.
- Figure 6 is a specific entity link logic diagram provided by this application.
- Figure 7 is a schematic structural diagram of an entity linking device provided by this application.
- Figure 8 is a structural diagram of an entity link electronic device provided by this application.
- This application provides an entity linking solution that fuses entity description information into entity linking and, at the same time, trains the model through positive/negative-sample contrastive learning to achieve entity disambiguation, improving the accuracy of entity linking and the performance of entity linking on the open knowledge graph.
- Figure 1 is a flow chart of an entity linking method provided by an embodiment of the present application. As shown in Figure 1, the entity linking method includes:
- S11: Obtain the entity mention corresponding to the input text, the candidate entities of the entity mention, and the entity descriptions of the candidate entities.
- In this embodiment, for an input text to be entity-linked, the entity mention corresponding to the input text is first obtained.
- The entity mention is a person name, place name, etc. in the initially determined input text that may be an entity.
- Then the candidate entities of the entity mention are obtained; candidate entities are aliases, synonyms, etc. of the entity mention that exist in the knowledge base.
- On this basis, the entity description of each candidate entity is obtained, so that the entity description information is fused into the entity linking process.
- The entity description contains information such as what the entity is and what characteristics it has.
- The candidate entities contain the correct entity and incorrect entities.
- The ultimate goal of entity linking is to filter the correct entity out of the candidate entities.
- For example, when the text "In which year did Li Na win the Australian Open?" is input, the mention of the entity "Li Na" is recognized first; the candidate entities initially obtained from the knowledge base include entities such as the tennis player Li Na, the singer Li Na and the gymnast Li Na; the ultimate goal is to use the context "...Australian Open champion" to link the name "Li Na" to the correct entity in the knowledge base, the tennis player Li Na.
- S12: Construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text.
- In this embodiment, after the entity mention, the candidate entities and the entity descriptions of the input text are obtained, the entity description information needs to be fused into the entity linking process. Specifically, a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text are constructed.
- The first fusion sequence and the second fusion sequence are generally represented as vectors: the first fusion sequence is obtained by concatenating the vector of the entity mention with the vector of the entity description, and the second fusion sequence is obtained by concatenating the vector of the candidate entity with the vector of the input text.
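- For illustration only, the following is a minimal sketch of how the two fusion sequences could be assembled before being fed to a transformer encoder; the use of a Hugging Face tokenizer, the checkpoint name and the example strings are assumptions of this sketch, not part of the original disclosure.

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint; the patent does not name a specific tokenizer or encoder.
tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")

def build_fusion_sequences(mention, description, candidate, input_text, max_len=256):
    """Return tokenized inputs for the two fusion sequences.

    first fusion sequence  (MD): entity mention   + entity description
    second fusion sequence (CS): candidate entity + input text
    """
    md = tokenizer(mention, description, truncation=True, max_length=max_len, return_tensors="pt")
    cs = tokenizer(candidate, input_text, truncation=True, max_length=max_len, return_tensors="pt")
    return md, cs

# Example from the description (illustrative strings only)
md_seq, cs_seq = build_fusion_sequences(
    "李娜", "李娜，中国女子网球运动员……",
    "李娜（网球运动员）", "李娜在哪一年拿到澳网冠军？")
```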
- S13: Use the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determine the linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text.
- A positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text.
- A negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
- In this embodiment, the first model is first used to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and the linked entity of the entity mention is then determined from the candidate entities according to the similarity.
- The first model is the entity disambiguation model.
- The first model is mainly used to calculate the similarity between the first fusion sequence and the second fusion sequence.
- The first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text.
- The training process is shown in Figure 2, and the specific method is as follows:
- S131: Obtain the training text.
- S132: Use the second model to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and determine, through the third model, the candidate-entity training samples corresponding to the entity-mention training samples.
- S133: Determine correct entities, incorrect entities and the corresponding entity-description training samples from the candidate-entity training samples.
- S134: Determine the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity as the first positive sample sequence and the sequence formed by the correct entity and the training text as the second positive sample sequence, and determine the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity as the first negative sample sequence and the sequence formed by the incorrect entity and the training text as the second negative sample sequence.
- In this embodiment, training samples need to be constructed, including positive samples and negative samples for contrastive-learning training.
- First the training text is obtained; the second model is then used to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and the candidate-entity training samples corresponding to the entity-mention training samples are determined through the third model. Then, correct entities, incorrect entities and the corresponding entity-description training samples are determined from the candidate-entity training samples.
- The second and third models are the same as those in the preceding steps.
- Suppose the training text is S and its vector representation is sent = {x1, x2, ..., xn}; the entity-mention training samples are denoted EM = {em1, em2, ..., emi}, where i indicates that there are i entity mentions in the training sample.
- The candidate-entity training samples are denoted accordingly, with 30 candidate entities retained for each entity mention, and each candidate entity has a corresponding entity-description training sample.
- The sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity is determined as the first positive sample sequence.
- The sequence formed by the correct entity and the training text is determined as the second positive sample sequence.
- The sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity is determined as the first negative sample sequence.
- The sequence formed by the incorrect entity and the training text is determined as the second negative sample sequence.
- Concretely, for an entity mention emi, the correct entity in the candidate-entity training samples and its entity-description training sample are taken; concatenating emi with that entity description gives the first positive sample sequence, and concatenating the correct entity with sent gives the second positive sample sequence. In the same way, for the entity mention emi, concatenating emi with the entity-description training sample of an incorrect entity gives the first negative sample sequence, and concatenating the incorrect entity with sent gives the second negative sample sequence, where p ≠ r (the index of the incorrect candidate differs from that of the correct one).
- S135: Use the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to train, through contrastive learning, the pre-trained model that uses the contrastive loss function, obtaining the first model.
- In this embodiment, the pre-trained model is used to calculate the representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively; the loss value of the NCE_LOSS function is calculated from the similarities between the representation vectors, and the relevant network parameters are then adjusted so that the loss value is less than the second threshold.
- That is, the obtained MD sequences and CS sequences are combined in pairs and fed into the same pre-trained model, and the output at the first position of the model is taken as the representation vector of the sequence, denoted f(MD) and f(CS).
- The similarity score is then calculated as the inner product of the two representation vectors:
- score(f(MD), f(CS)) = exp(f(MD)^T · f(CS))
- After the pairwise similarities are calculated with the above formula, the contrastive loss function L is further calculated as L = L1 + L2.
- The network parameters of the pre-trained network are adjusted with the loss value calculated by the above formula until L is less than the second threshold or three training rounds are completed.
- the second threshold may be set to 0.01.
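- The following is a minimal PyTorch-style sketch of this contrastive step: both sequences pass through the same pre-trained encoder, the output at the first position is taken as the representation vector, the similarity is the inner product score(f(MD), f(CS)) = exp(f(MD)^T · f(CS)), and a loss summed over the two directions plays the role of L = L1 + L2. The in-batch negative sampling, the cross-entropy (InfoNCE-style) form of L1 and L2, and all names are assumptions of this sketch; the patent only states that the NCE_LOSS value is computed from the pairwise similarities.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel

encoder = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")   # hypothetical pre-trained model

def represent(batch):
    # The output at the first position ([CLS]) is taken as the representation vector of the sequence.
    return encoder(**batch).last_hidden_state[:, 0]

def contrastive_loss(md_batch, cs_batch):
    """md_batch / cs_batch: tokenized MD and CS sequences; row i of each forms a positive pair,
    and the remaining rows serve as negatives (in-batch negative sampling is an assumption)."""
    f_md = represent(md_batch)                     # [B, H]
    f_cs = represent(cs_batch)                     # [B, H]
    scores = f_md @ f_cs.T                         # pairwise inner products; exp() is applied inside softmax
    labels = torch.arange(scores.size(0), device=scores.device)
    l1 = F.cross_entropy(scores, labels)           # MD -> CS direction
    l2 = F.cross_entropy(scores.T, labels)         # CS -> MD direction
    return l1 + l2                                 # plays the role of L = L1 + L2
```

- In line with the description above, such a step would be repeated until the loss falls below the second threshold (0.01 in this embodiment) or three training rounds are completed.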
- In this embodiment, after the first model has been trained, in the actual entity-linking operation the similarity score is calculated and output directly, and the candidate entity with the highest score is taken as the linked entity of the entity mention.
- It can be seen that the embodiment of the present application first obtains the entity mention corresponding to the input text, the candidate entities of the entity mention and the entity descriptions of the candidate entities; then constructs a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text; and finally uses the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention and determines the linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text.
- A positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
- The embodiments of this application are applicable to the entity linking task of any open knowledge graph.
- When entity linking is performed on the input text, on the basis of initially extracting the entity mentions of the input text and determining the candidate entities, the entity descriptions corresponding to the candidate entities are further obtained.
- The entity description information is fused into the entity linking to obtain the corresponding fusion sequences.
- At the same time, the model is trained through positive/negative-sample contrastive learning to achieve entity disambiguation.
- The trained model is used to calculate the similarity of the fusion sequences for filtering.
- The correct entity among the candidate entities, that is, the linked entity, is thereby filtered out, which improves the accuracy of entity linking and the performance of entity linking on the open knowledge graph.
- Figure 3 is a flow chart of a specific entity linking method provided by an embodiment of the present application. As shown in Figure 3, the entity linking method includes:
- S21: Integrate the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain a corresponding end-to-end integrated model.
- In this embodiment, entity extraction, candidate entity acquisition and entity disambiguation are integrated into one model so as to perform end-to-end entity linking; that is, the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model are integrated into one model to obtain the corresponding end-to-end integrated model.
- S22: When performing entity linking, input the input text into the end-to-end integrated model so that the corresponding similarity is output after processing by the second model, the third model and the first model in sequence, and determine the linked entity of the entity mention from the candidate entities according to the similarity.
- In this embodiment, after the end-to-end integrated model is obtained, the input text is input directly into the end-to-end integrated model at entity-linking time, the corresponding similarity is output after processing by the second model, the third model and the first model in sequence, and the linked entity of the entity mention is determined from the candidate entities according to the similarity.
- The specific method is shown in Figure 4 and includes the following steps:
- S221: Use the second model to determine the entity-mention positions of the input text, and determine the entity mention corresponding to the input text according to the entity-mention positions.
- In this embodiment, obtaining the entity mention is a process of using the second model to determine the entity-mention positions of the input text and determining the entity mention corresponding to the input text according to the entity-mention positions.
- Specifically, the second model includes a BERT neural network and a CRF neural network.
- The word vectors of the input text are processed through the BERT neural network and then the CRF neural network to obtain the BIO tags that characterize the entity-mention positions.
- The model is shown in Figure 5. The input text in which entities are to be recognized is converted into word vectors and input into the pre-trained BERT neural network; the output of the BERT neural network is then passed through the CRF neural network, which finally outputs the BIO tag probabilities; the entity-mention positions are obtained from the BIO tag of each position.
- other models capable of entity extraction can also be used in the entity linking method of the present application, which is not limited in the embodiments of the present application.
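- As an illustration of the second model just described (BERT followed by a CRF that outputs BIO tags), a minimal sketch is given below; it assumes the pytorch-crf package for the CRF layer, and the class name, checkpoint and tag set are illustrative choices rather than details given in the patent.

```python
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF   # assumption: the pytorch-crf package provides this CRF layer

class MentionDetector(nn.Module):
    """Second model as described: word vectors -> BERT -> CRF -> BIO tags marking mention positions."""

    def __init__(self, bert_name="bert-base-chinese", num_tags=3):   # tags: B, I, O
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.emissions = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emissions(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold BIO tag sequence under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: most likely BIO tag sequence for each position.
        return self.crf.decode(emissions, mask=mask)
```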
- S222: Use the third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list, and determine the entities of the alias classes whose match degree is greater than the first threshold as candidate entities.
- S223: Read the entity descriptions of the candidate entities from the entity list.
- In this embodiment, the candidate entities are obtained mainly by a text-similarity matching method: the third model is used to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list, and the entities of the alias classes whose match degree is greater than the first threshold are determined as candidate entities.
- The third model can be a BM25 algorithm model.
- Other text-similarity algorithms can also achieve the same technical effect.
- The knowledge-base entity list is the Wiki entity list, that is, the Falcon Candidates vocabulary, which expands each entity label in Wikipedia into many aliases. The degree of match between each entity mention and each class of alias in the Wiki entity list is calculated: each entity mention is used as a query, the aliases of each entity in the entity list form a document, and the match degree between the query and the document is calculated with the BM25 algorithm. Finally, the entities are sorted by the calculated match degree, and the 30 entities that best match each entity mention form the candidate entity set. At the same time, for the top-30 candidate entities of each entity mention, the first paragraph of the corresponding Wikipedia explanation is taken as supplementary information, that is, the entity description.
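- A minimal sketch of this candidate-generation step is given below; it assumes the rank_bm25 package for BM25 scoring, a simple dictionary format for the entity list, and whitespace tokenization, none of which are specified by the patent.

```python
from rank_bm25 import BM25Okapi   # assumption: the rank_bm25 package is used for BM25 scoring

# Assumed entity-list format: {entity_id: {"aliases": [...], "description": "first Wikipedia paragraph"}}
def build_index(entity_list):
    ids, docs = zip(*[(eid, " ".join(e["aliases"]).split())
                      for eid, e in entity_list.items()])
    return list(ids), BM25Okapi(list(docs))

def candidates(mention, ids, bm25, entity_list, top_k=30):
    # Match degree of the mention (query) against each alias document, as described above.
    scores = bm25.get_scores(mention.split())        # whitespace tokenization is a simplification
    ranked = sorted(zip(ids, scores), key=lambda x: -x[1])[:top_k]
    # Each retained candidate carries its entity description as supplementary information.
    return [(eid, score, entity_list[eid]["description"]) for eid, score in ranked]
```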
- S224: Construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text.
- S225: Use the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention.
- For the specific processes of steps S224 and S225, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
- It should be noted that the first model in this embodiment can be obtained by performing contrastive-learning training on the pre-trained RoBerta model and can be regarded as a higher-order model of the RoBerta model.
- the logic diagram of the above steps is shown in Figure 6.
- S23: During model training, train the second model, which uses a cross-entropy loss function, and train the first model by using the output of the trained second model as the input of the first model.
- In this embodiment, in order to maximize the accuracy of the entity-linking results, after the end-to-end integrated model is obtained, during model training the second model, which uses a cross-entropy loss function, is trained, and the first model is trained by using the output of the trained second model as the input of the first model.
- Compared with jointly training only two of the above subtasks or a single subtask, this embodiment combines the above three tasks to obtain an integrated entity linking model and adopts a contrastive learning method on the basis of fusing entity description information, improving the performance of entity linking.
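- As an illustration of how the three stages could be chained at inference time, a minimal sketch is given below; the three callables stand in for the second, third and first models sketched above, and all names and glue code are assumptions rather than the patent's integrated implementation.

```python
import torch

def link_entities(text, detect_mentions, get_candidates, encode):
    """End-to-end inference chaining the three models.

    detect_mentions(text)   -> list of mention strings        (second model, BERT + CRF)
    get_candidates(mention) -> list of (entity, description)  (third model, BM25 top-30)
    encode(text_a, text_b)  -> 1-D representation vector      (first model encoder)
    All three callables are hypothetical placeholders.
    """
    links = []
    for mention in detect_mentions(text):
        scored = []
        for entity, description in get_candidates(mention):
            f_md = encode(mention, description)              # first fusion sequence (MD)
            f_cs = encode(entity, text)                      # second fusion sequence (CS)
            scored.append((torch.exp(f_md @ f_cs), entity))  # score = exp(f(MD)^T f(CS))
        if scored:
            links.append((mention, max(scored, key=lambda s: float(s[0]))[1]))
    return links
```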
- Referring to Figure 7, an embodiment of the present application also discloses an entity linking device, which includes:
- the acquisition module 11 is used to obtain entity mentions corresponding to the input text, candidate entities of the entity mentions, and entity descriptions of the candidate entities;
- Building module 12 configured to construct a first fusion sequence containing entity mentions and entity descriptions and a second fusion sequence containing candidate entities and input text;
- a calculation and determination module 13, configured to use the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determine the linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text;
- a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
- It can be seen that the embodiment of the present application first obtains the entity mention corresponding to the input text, the candidate entities of the entity mention and the entity descriptions of the candidate entities; then constructs a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text; and finally uses the first model to calculate the similarity between the two fusion sequences of the entity mention and determines the linked entity of the entity mention from the candidate entities according to the similarity, the first model being trained through positive/negative-sample contrastive learning as described above.
- The embodiments of this application are applicable to the entity linking task of any open knowledge graph.
- When entity linking is performed on the input text, on the basis of initially extracting the entity mentions of the input text and determining the candidate entities, the entity descriptions corresponding to the candidate entities are further obtained.
- The entity description information is fused into the entity linking to obtain the corresponding fusion sequences.
- At the same time, the model is trained through positive/negative-sample contrastive learning to achieve entity disambiguation.
- The trained model is used to calculate the similarity of the fusion sequences for filtering.
- The correct entity among the candidate entities, that is, the linked entity, is thereby filtered out, which improves the accuracy of entity linking and the performance of entity linking on the open knowledge graph.
- the acquisition module 11 specifically includes:
- An extraction unit configured to use the second model to determine the entity mention position of the input text, and determine the entity mention corresponding to the input text based on the entity mention position;
- the matching unit is configured to use the third model to respectively calculate the matching degree between the entity mention and the combined text composed of each category name in the knowledge base entity list, and determine the entity with the alias type corresponding to the matching degree greater than the first threshold as candidate entity;
- the reading unit is used to read the entity description of the candidate entity from the entity list.
- the entity linking device further includes:
- a sample acquisition module, used to obtain the training text, use the second model to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and determine, through the third model, the candidate-entity training samples corresponding to the entity-mention training samples;
- a determination module, used to determine correct entities, incorrect entities and the corresponding entity-description training samples from the candidate-entity training samples;
- a positive and negative sample sequence building module, used to determine the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity as the first positive sample sequence and the sequence formed by the correct entity and the training text as the second positive sample sequence, and to determine the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity as the first negative sample sequence and the sequence formed by the incorrect entity and the training text as the second negative sample sequence;
- a model training module, used to train, through contrastive learning, the pre-trained model that uses the contrastive loss function with the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to obtain the first model.
- The model training module is specifically configured to use the pre-trained model to calculate the representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively, calculate the loss value of the NCE_LOSS function from the similarities between the representation vectors, and then adjust the relevant network parameters so that the loss value is less than the second threshold.
- the entity linking device further includes:
- a model integration module, used to integrate the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain the corresponding end-to-end integrated model;
- an integrated training module, used to, during model training, train the second model, which uses a cross-entropy loss function, and train the first model by using the output of the trained second model as the input of the first model;
- an integrated calculation module, used to, when performing entity linking, input the input text into the end-to-end integrated model so that the corresponding similarity is output after processing by the second model, the third model and the first model in sequence.
- Further, an embodiment of the present application also provides an electronic device. FIG. 8 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content in the figure should not be regarded as any limitation on the scope of the present application.
- FIG. 8 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application.
- the electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25 and a communication bus 26.
- the memory 22 is used to store computer programs, and the computer programs are loaded and executed by the processor 21 to implement relevant steps in the entity linking method disclosed in any of the foregoing embodiments.
- The power supply 23 is used to provide the working voltage for each hardware device on the electronic device 20.
- The communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of this application, which is not specifically limited here.
- The input/output interface 25 is used to obtain external input data or output data to the outside, and its specific interface type can be selected according to specific application needs, which is not specifically limited here.
- The memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.
- The resources stored on it can include an operating system 221, a computer program 222 and data 223, etc., and the storage may be temporary or permanent.
- the operating system 221 is used to manage and control each hardware device and the computer program 222 on the electronic device 20 to realize the calculation and processing of the massive data 223 in the memory 22 by the processor 21.
- It can be Windows Server, Netware, Unix, Linux etc.
- In addition to a computer program that can be used to complete the entity linking method performed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to complete other specific tasks.
- the data 223 may include data such as text information collected by the electronic device 20 .
- Embodiments of the present application also disclose a non-volatile readable storage medium.
- A computer program is stored in the non-volatile readable storage medium; when the computer program is loaded and executed by the processor, the steps of the entity linking method disclosed in any of the foregoing embodiments are implemented.
Abstract
An entity linking method, apparatus, device and non-volatile readable storage medium. The method includes: obtaining an entity mention corresponding to an input text, candidate entities of the entity mention, and entity descriptions of the candidate entities (S11); constructing a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text (S12); and using a first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determining a linked entity of the entity mention from the candidate entities according to the similarity, wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive samples and negative samples built from entity-mention training samples of a training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text (S13). The above method fuses entity description information into entity linking and, at the same time, trains the model through positive/negative-sample contrastive learning to achieve entity disambiguation, improving the accuracy of entity linking and the performance of entity linking on open knowledge graphs.
Description
Cross-reference to related application
This application claims priority to the Chinese patent application filed with the China Patent Office on April 29, 2022, with application number 202210466937.2 and entitled "一种实体链接方法、装置、设备及存储介质" (An entity linking method, apparatus, device and storage medium), the entire contents of which are incorporated herein by reference.
This application relates to the technical field of natural language processing, and in particular to an entity linking method, apparatus, device and non-volatile readable storage medium.
Entity linking links the entities mentioned in a text to the corresponding entities in a knowledge base; it is the first, and a crucial, step in enabling machines to understand natural language. The input to entity linking usually includes the entity mention and its context together with the knowledge base to be linked to, and the output of entity linking is the knowledge-base entity that the mention refers to. When mentions and entities correspond one to one, that is, when there is no ambiguity, the entity linking problem is very simple, but in practical applications ambiguity often exists: on the one hand, an entity can be expressed in multiple ways; on the other hand, the same name can refer to different entities.
In the prior art, entity linking methods roughly include three steps: named entity recognition (MD, mention detection), generation of candidate entities, and entity disambiguation. The disadvantage of this approach is that if an error occurs in the first mention-detection step, the subsequent candidate-entity generation and disambiguation operations compound the error, leading to poor results.
Summary of the Invention
In view of this, the purpose of this application is to provide an entity linking method, apparatus, device and non-volatile readable storage medium that can improve the accuracy of entity linking and the performance of entity linking on open knowledge graphs. The specific solution is as follows:
A first aspect of this application provides an entity linking method, including:
obtaining an entity mention corresponding to an input text, candidate entities of the entity mention, and entity descriptions of the candidate entities;
constructing a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text;
using a first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determining a linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive samples and negative samples built from entity-mention training samples of a training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
Optionally, obtaining the entity mention corresponding to the input text includes:
using a second model to determine the entity-mention position of the input text, and determining the entity mention corresponding to the input text according to the entity-mention position.
Optionally, the second model includes a BERT neural network and a CRF neural network;
correspondingly, using the second model to determine the entity-mention position of the input text includes:
processing the word vectors of the input text through the BERT neural network and then the CRF neural network to obtain BIO tags that characterize the entity-mention position.
Optionally, obtaining the candidate entities of the entity mention and the entity descriptions of the candidate entities includes:
using a third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in a knowledge-base entity list, and determining the entities of the alias classes whose match degree is greater than a first threshold as the candidate entities;
reading the entity descriptions of the candidate entities from the entity list.
Optionally, the entity linking method further includes:
obtaining the training text;
using the second model to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and determining, through the third model, the candidate-entity training samples corresponding to the entity-mention training samples;
determining correct entities, incorrect entities and the corresponding entity-description training samples from the candidate-entity training samples;
determining the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity as a first positive sample sequence and the sequence formed by the correct entity and the training text as a second positive sample sequence, and determining the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity as a first negative sample sequence and the sequence formed by the incorrect entity and the training text as a second negative sample sequence;
training, through contrastive learning, the pre-trained model that uses the contrastive loss function with the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to obtain the first model.
Optionally, training, through contrastive learning, the pre-trained model that uses the contrastive loss function with the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to obtain the first model includes:
using the pre-trained model to calculate the representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively, calculating a loss value of the NCE_LOSS function from the similarities between the representation vectors, and then adjusting the relevant network parameters so that the loss value is less than a second threshold.
Optionally, the entity linking method further includes:
integrating the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain a corresponding end-to-end integrated model;
during model training, training the second model, which uses a cross-entropy loss function, and training the first model by using the output of the trained second model as the input of the first model;
when performing entity linking, inputting the input text into the end-to-end integrated model so that the corresponding similarity is output after processing by the second model, the third model and the first model in sequence.
A second aspect of this application provides an entity linking apparatus, including:
an acquisition module, configured to obtain an entity mention corresponding to an input text, candidate entities of the entity mention, and entity descriptions of the candidate entities;
a construction module, configured to construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text;
a calculation and determination module, configured to use a first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention and determine a linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive samples and negative samples built from entity-mention training samples of a training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
A third aspect of this application provides an electronic device, which includes a processor and a memory; the memory is used to store a computer program, and the computer program is loaded and executed by the processor to implement the aforementioned entity linking method.
A fourth aspect of this application provides a computer non-volatile readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are loaded and executed by a processor, the aforementioned entity linking method is implemented.
In this application, the entity mention corresponding to the input text, the candidate entities of the entity mention, and the entity descriptions of the candidate entities are obtained first; a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text are then constructed; finally, the first model is used to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and the linked entity of the entity mention is determined from the candidate entities according to the similarity, wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text. It can be seen that this application is applicable to the entity linking task of any open knowledge graph: when entity linking is performed on the input text, on the basis of initially extracting the entity mentions of the input text and determining the candidate entities, the entity descriptions corresponding to the candidate entities are further obtained and the entity description information is fused into the entity linking to obtain the corresponding fusion sequences; at the same time, the model is trained through positive/negative-sample contrastive learning to achieve entity disambiguation, and the trained model is used to calculate the similarity of the fusion sequences so as to filter out the correct entity among the candidate entities, that is, the linked entity, which improves the accuracy of entity linking and the performance of entity linking on open knowledge graphs.
In order to explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of this application; for a person of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Figure 1 is a flow chart of an entity linking method provided by this application;
Figure 2 is a flow chart of a specific first-model training method provided by this application;
Figure 3 is a flow chart of a specific entity linking method provided by this application;
Figure 4 is a flow chart of a specific entity linking method provided by this application;
Figure 5 is a structural diagram of a specific second model provided by this application;
Figure 6 is a logic diagram of a specific entity linking process provided by this application;
Figure 7 is a schematic structural diagram of an entity linking apparatus provided by this application;
Figure 8 is a structural diagram of an entity linking electronic device provided by this application.
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments of this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.
At present, entity linking methods that sequentially perform named entity recognition (MD, mention detection), candidate entity generation and entity disambiguation tend to produce poor final linking results; that is, if an error occurs in the named entity recognition of the first step, the subsequent candidate generation and disambiguation operations compound the error. In view of this technical defect, this application provides an entity linking solution that fuses entity description information into entity linking and, at the same time, trains the model through positive/negative-sample contrastive learning to achieve entity disambiguation, improving the accuracy of entity linking and the performance of entity linking on open knowledge graphs.
Figure 1 is a flow chart of an entity linking method provided by an embodiment of this application. Referring to Figure 1, the entity linking method includes:
S11: Obtain an entity mention corresponding to the input text, the candidate entities of the entity mention, and the entity descriptions of the candidate entities.
In this embodiment, for an input text to be entity-linked, the entity mention corresponding to the input text is obtained first; the entity mention is a person name, place name, etc. in the initially determined input text that may be an entity. Then the candidate entities of the entity mention are obtained; the candidate entities are aliases, synonyms, etc. of the entity mention that exist in the knowledge base. On this basis, the entity descriptions of the candidate entities are obtained so that the entity description information is fused into the entity linking process. An entity description contains information such as what the entity is and what characteristics it has.
It can be understood that the candidate entities contain the correct entity and incorrect entities, and the ultimate goal of entity linking is to filter the correct entity out of the candidate entities. For example, when the text "In which year did Li Na win the Australian Open?" is input, the mention of the entity "Li Na" is recognized first; the candidate entities initially obtained from the knowledge base include entities such as the tennis player Li Na, the singer Li Na and the gymnast Li Na; the ultimate goal is to use the context "...Australian Open champion" to link the name "Li Na" to the correct entity in the knowledge base, the tennis player Li Na.
S12: Construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text.
In this embodiment, after the entity mention, the candidate entities and the entity descriptions of the input text are obtained, the entity description information needs to be fused into the entity linking process. Specifically, a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text are constructed. It can be understood that the first fusion sequence and the second fusion sequence are generally represented as vectors, that is, the first fusion sequence is obtained by concatenating the vector of the entity mention with the vector of the entity description, and the second fusion sequence is obtained by concatenating the vector of the candidate entity with the vector of the input text.
S13: Use the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determine the linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of a training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
In this embodiment, the first model is first used to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and the linked entity of the entity mention is then determined from the candidate entities according to the similarity. The first model is the entity disambiguation model and is mainly used to calculate the similarity between the first fusion sequence and the second fusion sequence. The first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text; the training process is shown in Figure 2, and the specific method is as follows:
S131: Obtain the training text.
S132: Use the second model to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and determine, through the third model, the candidate-entity training samples corresponding to the entity-mention training samples.
S133: Determine correct entities, incorrect entities and the corresponding entity-description training samples from the candidate-entity training samples.
S134: Determine the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity as the first positive sample sequence and the sequence formed by the correct entity and the training text as the second positive sample sequence, and determine the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity as the first negative sample sequence and the sequence formed by the incorrect entity and the training text as the second negative sample sequence.
In this embodiment, training samples, including positive samples and negative samples for contrastive-learning training, need to be constructed. The training text is obtained first; the second model is then used to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and the candidate-entity training samples corresponding to the entity-mention training samples are determined through the third model. Next, correct entities, incorrect entities and the corresponding entity-description training samples are determined from the candidate-entity training samples. The second and third models are the same as the models in the preceding steps. Suppose the training text is S and its vector representation is sent = {x1, x2, ..., xn}; the entity-mention training samples are denoted EM = {em1, em2, ..., emi}, where i indicates that there are i entity mentions in the training sample; the candidate-entity training samples are denoted accordingly (30 candidate entities are kept for each entity mention), and each candidate entity has a corresponding entity-description training sample.
On this basis, the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity is determined as the first positive sample sequence and the sequence formed by the correct entity and the training text as the second positive sample sequence, and the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity is determined as the first negative sample sequence and the sequence formed by the incorrect entity and the training text as the second negative sample sequence. Suppose that, for an entity mention emi, the correct entity in the candidate-entity training samples and its entity-description training sample are given; concatenating emi with that entity description gives the first positive sample sequence, and concatenating the correct entity with sent gives the second positive sample sequence. In the same way, for the entity mention emi, concatenating emi with the entity-description training sample of an incorrect entity gives the first negative sample sequence, and concatenating the incorrect entity with sent gives the second negative sample sequence, where p ≠ r (the index of the incorrect candidate differs from that of the correct one).
S135: Use the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to train, through contrastive learning, the pre-trained model that uses the contrastive loss function to obtain the first model.
In this embodiment, the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence are used to train, through contrastive learning, the pre-trained model that uses the contrastive loss function, obtaining the first model. Specifically, the pre-trained model is used to calculate the representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively; the loss value of the NCE_LOSS function is calculated from the similarities between the representation vectors, and the relevant network parameters are then adjusted so that the loss value is less than a second threshold. That is, the obtained MD sequences and CS sequences are combined in pairs and fed into the same pre-trained model, and the output at the first position of the model is taken as the representation vector of the sequence, denoted f(MD) and f(CS). The similarity score is then calculated as the inner product of the two vectors:
score(f(MD), f(CS)) = exp(f(MD)^T · f(CS))
After the pairwise similarities are calculated with the above formula, the contrastive loss function L is further calculated as:
L = L1 + L2
The network parameters of the pre-trained network are adjusted using the loss value calculated by the above formula until L is less than the second threshold or three training rounds are completed. In this embodiment, the second threshold may be set to 0.01.
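The patent gives the bidirectional structure L = L1 + L2 but not the explicit form of L1 and L2. A standard InfoNCE-style reading, stated here only as an illustrative assumption consistent with the inner-product score above, would be:

```latex
\mathcal{L}_1 = -\log\frac{\exp\!\left(f(MD)^{\top} f(CS^{+})\right)}
{\exp\!\left(f(MD)^{\top} f(CS^{+})\right) + \sum_{j}\exp\!\left(f(MD)^{\top} f(CS_j^{-})\right)},
\qquad
\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2
```

where CS⁺ denotes the sequence built from the correct entity, CS⁻ⱼ the sequences built from incorrect entities, and L2 is the symmetric term with the roles of the MD and CS sequences exchanged.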
In this embodiment, after the first model has been trained, in the actual entity-linking operation the similarity score is calculated and output directly, and the candidate entity that obtains the highest score is taken as the linked entity of the entity mention.
It can be seen that the embodiment of this application first obtains the entity mention corresponding to the input text, the candidate entities of the entity mention and the entity descriptions of the candidate entities; then constructs a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text; and finally uses the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention and determines the linked entity of the entity mention from the candidate entities according to the similarity, wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text. The embodiments of this application are applicable to the entity linking task of any open knowledge graph: when entity linking is performed on the input text, on the basis of initially extracting the entity mentions of the input text and determining the candidate entities, the entity descriptions corresponding to the candidate entities are further obtained and the entity description information is fused into the entity linking to obtain the corresponding fusion sequences; at the same time, the model is trained through positive/negative-sample contrastive learning to achieve entity disambiguation, and the trained model is used to calculate the similarity of the fusion sequences so as to filter out the correct entity among the candidate entities, that is, the linked entity, which improves the accuracy of entity linking and the performance of entity linking on open knowledge graphs.
Figure 3 is a flow chart of a specific entity linking method provided by an embodiment of this application. Referring to Figure 3, the entity linking method includes:
S21: Integrate the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain a corresponding end-to-end integrated model.
In this embodiment, entity extraction, candidate entity acquisition and entity disambiguation are integrated into one model so as to perform end-to-end entity linking; that is, the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model are integrated into one model to obtain the corresponding end-to-end integrated model.
S22: When performing entity linking, input the input text into the end-to-end integrated model so that the corresponding similarity is output after processing by the second model, the third model and the first model in sequence, and determine the linked entity of the entity mention from the candidate entities according to the similarity.
In this embodiment, after the end-to-end integrated model is obtained, the input text is input directly into the end-to-end integrated model at entity-linking time, the corresponding similarity is output after processing by the second model, the third model and the first model in sequence, and the linked entity of the entity mention is determined from the candidate entities according to the similarity. The specific method, shown in Figure 4, includes the following steps:
S221: Use the second model to determine the entity-mention positions of the input text, and determine the entity mention corresponding to the input text according to the entity-mention positions.
In this embodiment, obtaining the entity mention is a process of using the second model to determine the entity-mention positions of the input text and determining the entity mention corresponding to the input text according to the entity-mention positions. Specifically, the second model includes a BERT neural network and a CRF neural network. On this basis, the word vectors of the input text are processed through the BERT neural network and then the CRF neural network to obtain the BIO tags that characterize the entity-mention positions; the model is shown schematically in Figure 5. The input text in which entities are to be recognized is converted into word vectors and input into the pre-trained BERT neural network; the output of the BERT neural network is then passed through the CRF neural network, which finally outputs the BIO tag probabilities, and the entity-mention positions are obtained from the BIO tag of each position. Of course, other models capable of entity extraction can also be used in the entity linking method of this application, which is not limited by the embodiments of this application.
S222: Use the third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list, and determine the entities of the alias classes whose match degree is greater than the first threshold as candidate entities.
S223: Read the entity descriptions of the candidate entities from the entity list.
In this embodiment, the candidate entities are obtained mainly by a text-similarity matching method: the third model is used to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list, and the entities of the alias classes whose match degree is greater than the first threshold are determined as candidate entities. The third model may be a BM25 algorithm model; other text-similarity algorithms can achieve the same technical effect. The knowledge-base entity list is the Wiki entity list, that is, the Falcon Candidates vocabulary, which expands each entity label in Wikipedia into many aliases. The degree of match between each entity mention and each class of alias in the Wiki entity list is calculated: each entity mention serves as a query, the aliases of each entity in the entity list form a document, and the match degree between the query and the document is calculated with the BM25 algorithm. Finally, the entities are sorted by the calculated match degree, and the 30 entities that best match each entity mention form the candidate entity set. At the same time, for the top-30 candidate entities of each entity mention, the first paragraph of the corresponding Wikipedia explanation is taken as supplementary information, that is, the entity description.
S224: Construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text.
In this embodiment, for the specific process of step S224, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
S225: Use the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention.
In this embodiment, for the specific processes of steps S224 and S225, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here. It should be noted that the first model in this embodiment may be obtained by contrastive-learning training of a pre-trained RoBerta model and can be regarded as a higher-order model of the RoBerta model. The logic of the above steps is shown schematically in Figure 6.
S23: During model training, train the second model, which uses a cross-entropy loss function, and train the first model by using the output of the trained second model as the input of the first model.
In this embodiment, in order to maximize the accuracy of the entity-linking results, after the end-to-end integrated model is obtained, during model training the second model, which uses a cross-entropy loss function, is trained, and the first model is trained by using the output of the trained second model as the input of the first model. Compared with jointly training only two of the above subtasks or a single subtask, this embodiment combines the above three tasks to obtain an integrated entity linking model and adopts a contrastive learning method on the basis of fusing entity description information, improving the performance of entity linking.
Referring to Figure 7, an embodiment of this application also discloses an entity linking apparatus, which includes:
an acquisition module 11, configured to obtain an entity mention corresponding to the input text, the candidate entities of the entity mention, and the entity descriptions of the candidate entities;
a construction module 12, configured to construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text;
a calculation and determination module 13, configured to use the first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention and determine the linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive and negative samples built from entity-mention training samples of the training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
It can be seen that the embodiment of this application first obtains the entity mention corresponding to the input text, the candidate entities of the entity mention and the entity descriptions of the candidate entities; then constructs a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text; and finally uses the first model to calculate the similarity between the two fusion sequences of the entity mention and determines the linked entity of the entity mention from the candidate entities according to the similarity, the first model being trained through positive/negative-sample contrastive learning as described above. The embodiments of this application are applicable to the entity linking task of any open knowledge graph: when entity linking is performed on the input text, on the basis of initially extracting the entity mentions of the input text and determining the candidate entities, the entity descriptions corresponding to the candidate entities are further obtained and the entity description information is fused into the entity linking to obtain the corresponding fusion sequences; at the same time, the model is trained through positive/negative-sample contrastive learning to achieve entity disambiguation, and the trained model is used to calculate the similarity of the fusion sequences so as to filter out the correct entity among the candidate entities, that is, the linked entity, which improves the accuracy of entity linking and the performance of entity linking on open knowledge graphs.
In some specific embodiments, the acquisition module 11 specifically includes:
an extraction unit, configured to use the second model to determine the entity-mention positions of the input text and determine the entity mention corresponding to the input text according to the entity-mention positions;
a matching unit, configured to use the third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list, and determine the entities of the alias classes whose match degree is greater than the first threshold as candidate entities;
a reading unit, configured to read the entity descriptions of the candidate entities from the entity list.
In some specific embodiments, the entity linking apparatus further includes:
a sample acquisition module, configured to obtain the training text, use the second model to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and determine, through the third model, the candidate-entity training samples corresponding to the entity-mention training samples;
a determination module, configured to determine correct entities, incorrect entities and the corresponding entity-description training samples from the candidate-entity training samples;
a positive/negative sample sequence construction module, configured to determine the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity as the first positive sample sequence and the sequence formed by the correct entity and the training text as the second positive sample sequence, and to determine the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity as the first negative sample sequence and the sequence formed by the incorrect entity and the training text as the second negative sample sequence;
a model training module, configured to train, through contrastive learning, the pre-trained model that uses the contrastive loss function with the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to obtain the first model.
In some specific embodiments, the model training module is specifically configured to use the pre-trained model to calculate the representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively, calculate the loss value of the NCE_LOSS function from the similarities between the representation vectors, and then adjust the relevant network parameters so that the loss value is less than the second threshold.
In some specific embodiments, the entity linking apparatus further includes:
a model integration module, configured to integrate the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain the corresponding end-to-end integrated model;
an integrated training module, configured to, during model training, train the second model, which uses a cross-entropy loss function, and train the first model by using the output of the trained second model as the input of the first model;
an integrated calculation module, configured to, when performing entity linking, input the input text into the end-to-end integrated model so that the corresponding similarity is output after processing by the second model, the third model and the first model in sequence.
Further, an embodiment of this application also provides an electronic device. Figure 8 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content in the figure should not be regarded as any limitation on the scope of this application.
Figure 8 is a schematic structural diagram of an electronic device 20 provided by an embodiment of this application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25 and a communication bus 26. The memory 22 is used to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the entity linking method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is used to provide the working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of this application, which is not specifically limited here; the input/output interface 25 is used to obtain external input data or output data to the outside, and its specific interface type may be selected according to specific application needs, which is not specifically limited here.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.; the resources stored on it may include an operating system 221, a computer program 222, data 223 and the like, and the storage may be temporary or permanent.
The operating system 221 is used to manage and control the hardware devices and the computer program 222 on the electronic device 20 so that the processor 21 can perform computation on and processing of the massive data 223 in the memory 22; it may be Windows Server, Netware, Unix, Linux, etc. In addition to a computer program that can be used to complete the entity linking method performed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to complete other specific tasks. The data 223 may include data such as text information collected by the electronic device 20.
Further, an embodiment of this application also discloses a non-volatile readable storage medium in which a computer program is stored; when the computer program is loaded and executed by a processor, the steps of the entity linking method disclosed in any of the foregoing embodiments are implemented.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar between the embodiments, reference may be made to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for the relevant parts reference may be made to the description of the method.
Finally, it should also be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The entity linking method, apparatus, device and non-volatile readable storage medium provided by this application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. At the same time, for a person of ordinary skill in the art, there will be changes in the specific implementation and the scope of application according to the idea of this application. In summary, the content of this specification should not be construed as a limitation on this application.
Claims (20)
- An entity linking method, characterized by comprising: obtaining an entity mention corresponding to an input text, candidate entities of the entity mention, and entity descriptions of the candidate entities; constructing a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text; and using a first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention, and determining a linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive samples and negative samples built from entity-mention training samples of a training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
- The entity linking method according to claim 1, characterized in that obtaining the entity mention corresponding to the input text comprises: using a second model to determine the entity-mention position of the input text, and determining the entity mention corresponding to the input text according to the entity-mention position.
- The entity linking method according to claim 2, characterized in that the second model comprises a BERT neural network and a CRF neural network; correspondingly, using the second model to determine the entity-mention position of the input text comprises: processing the word vectors of the input text through the BERT neural network and then the CRF neural network to obtain BIO tags characterizing the entity-mention position.
- The entity linking method according to claim 1, characterized in that obtaining the candidate entities of the entity mention and the entity descriptions of the candidate entities comprises: using a third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in a knowledge-base entity list, and determining the entities of the alias classes whose match degree is greater than a first threshold as the candidate entities; and reading the entity descriptions of the candidate entities from the entity list.
- The entity linking method according to claim 1, characterized by further comprising: obtaining the training text; using a second model to perform entity extraction on the training text to obtain the entity-mention training samples corresponding to the training text, and determining, through a third model, candidate-entity training samples corresponding to the entity-mention training samples; determining correct entities, incorrect entities and the corresponding entity-description training samples from the candidate-entity training samples; determining the sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity as a first positive sample sequence and the sequence formed by the correct entity and the training text as a second positive sample sequence, and determining the sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity as a first negative sample sequence and the sequence formed by the incorrect entity and the training text as a second negative sample sequence; and training, through contrastive learning, the pre-trained model that uses the contrastive loss function with the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to obtain the first model.
- The entity linking method according to claim 5, characterized in that training, through contrastive learning, the pre-trained model that uses the contrastive loss function with the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence to obtain the first model comprises: using the pre-trained model to calculate representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively, calculating a loss value of an NCE_LOSS function from the similarities between the representation vectors, and then adjusting relevant network parameters so that the loss value is less than a second threshold.
- The entity linking method according to any one of claims 1 to 6, characterized by further comprising: integrating the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain a corresponding end-to-end integrated model; during model training, training the second model, which uses a cross-entropy loss function, and training the first model by using the output of the trained second model as the input of the first model; and, when performing entity linking, inputting the input text into the end-to-end integrated model so that the corresponding similarity is output after processing by the second model, the third model and the first model in sequence.
- The entity linking method according to claim 3, characterized in that processing the word vectors of the input text through the BERT neural network and then the CRF neural network to obtain the BIO tags characterizing the entity-mention position comprises: converting the input text into word vectors; inputting the word vectors into the pre-trained BERT neural network, and then passing the output of the BERT neural network through the CRF neural network so as to output tag probabilities for the BIO tags, and obtaining the BIO tags characterizing the entity-mention position from the maximum of the tag probabilities.
- The entity linking method according to claim 4, characterized in that using the third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list and determining the entities of the alias classes whose match degree is greater than the first threshold as the candidate entities comprises: using the third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list; and constructing a candidate entity set from the entities of the alias classes whose match degree is greater than the first threshold, and determining the entities in the candidate entity set as the candidate entities.
- The entity linking method according to claim 9, characterized by further comprising: sorting the entities of the alias classes corresponding to the match degrees according to the match degrees.
- The entity linking method according to claim 9, characterized in that the third model is a BM25 algorithm model, the knowledge-base entity list is a search-engine entity list, and the search-engine entity list is used to expand each entity label in a search-engine encyclopedia into aliases; using the third model to calculate the degree of match between the entity mention and the combined text formed by each class of alias in the knowledge-base entity list comprises: taking each entity mention as a query, forming a document from the aliases of each entity in the search-engine entity list, and using the BM25 algorithm model to calculate the degree of match between each query and the document.
- The entity linking method according to claim 9, characterized in that reading the entity descriptions of the candidate entities from the entity list comprises: reading the entity descriptions of the candidate entities from the candidate entity set according to a preset number.
- The entity linking method according to claim 6, characterized in that using the pre-trained model to calculate the representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively, calculating the loss value of the NCE_LOSS function from the similarities between the representation vectors and then adjusting the relevant network parameters so that the loss value is less than the second threshold comprises: using the pre-trained model to calculate the representation vectors of the first positive sample sequence, the second positive sample sequence, the first negative sample sequence and the second negative sample sequence respectively, and calculating the inner products between the representation vectors; and calculating the similarities between the representation vectors from the inner products, calculating the loss value of the NCE_LOSS function according to the similarities, and then adjusting the relevant network parameters so that the loss value is less than the second threshold.
- The entity linking method according to claim 5, 6 or 7, characterized in that the first model is obtained by contrastive-learning training of a pre-trained RoBerta model, and the first model is a higher-order model of the RoBerta model.
- The entity linking method according to claim 7, characterized in that integrating the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain the corresponding end-to-end integrated model comprises: performing end-to-end entity linking by integrating the entity extraction operation, the candidate entity acquisition operation and the entity disambiguation operation into one model, thereby integrating the second model used to obtain the entity mention, the third model used to obtain the candidate entities, and the first model into one model to obtain the corresponding end-to-end integrated model.
- The entity linking method according to claim 1, characterized in that the entity mention has a corresponding entity-mention vector and the entity description has a corresponding entity-description vector, and the first fusion sequence containing the entity mention and the entity description is constructed in the following manner: generating the first fusion sequence from the entity-mention vector and the entity-description vector.
- The entity linking method according to claim 1, characterized in that the candidate entity has a corresponding candidate-entity vector and the input text has a corresponding input-text vector, and the second fusion sequence containing the candidate entity and the input text is constructed in the following manner: generating the second fusion sequence from the candidate-entity vector and the input-text vector.
- An entity linking apparatus, characterized by comprising: an acquisition module, configured to obtain an entity mention corresponding to an input text, candidate entities of the entity mention, and entity descriptions of the candidate entities; a construction module, configured to construct a first fusion sequence containing the entity mention and the entity description and a second fusion sequence containing the candidate entity and the input text; and a calculation and determination module, configured to use a first model to calculate the similarity between the first fusion sequence and the second fusion sequence of the entity mention and determine a linked entity of the entity mention from the candidate entities according to the similarity; wherein the first model is obtained by training, through contrastive learning, a pre-trained model that uses a contrastive loss function with positive samples and negative samples built from entity-mention training samples of a training text; a positive sample is a sequence formed by the entity-mention training sample and the entity-description training sample of the correct entity, together with a sequence formed by the correct entity and the training text; a negative sample is a sequence formed by the entity-mention training sample and the entity-description training sample of an incorrect entity, together with a sequence formed by the incorrect entity and the training text.
- An electronic device, characterized in that the electronic device comprises a processor and a memory, wherein the memory is configured to store a computer program, and the computer program is loaded and executed by the processor to implement the entity linking method according to any one of claims 1 to 17.
- A computer non-volatile readable storage medium, characterized by being configured to store computer-executable instructions which, when loaded and executed by a processor, implement the entity linking method according to any one of claims 1 to 17.
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210466937.2A | 2022-04-29 | 2022-04-29 | 一种实体链接方法、装置、设备及存储介质 |
| CN202210466937.2 | 2022-04-29 | | |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2023207096A1 (zh) | 2023-11-02 |

Family

ID=82568611

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/135991 (WO2023207096A1) | 一种实体链接方法、装置、设备及非易失性可读存储介质 | 2022-04-29 | 2022-12-01 |

Country Status (2)

| Country | Link |
|---|---|
| CN (1) | CN114841164A (zh) |
| WO (1) | WO2023207096A1 (zh) |
Families Citing this family (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114841164A (zh) | 2022-04-29 | 2022-08-02 | 浪潮电子信息产业股份有限公司 | 一种实体链接方法、装置、设备及存储介质 |
| CN115203438B (zh) | 2022-09-09 | 2023-02-03 | 北京澜舟科技有限公司 | 一种实体链接方法及存储介质 |
| CN115859987B (zh) | 2023-01-19 | 2023-06-16 | 阿里健康科技(中国)有限公司 | 实体提及识别模块及其链接方法、设备和介质 |
| CN116561339A (zh) | 2023-05-10 | 2023-08-08 | 之江实验室 | 知识图谱实体链接方法、装置、计算机设备及存储介质 |
- 2022-04-29: Chinese application CN202210466937.2A filed; published as CN114841164A (status: pending)
- 2022-12-01: PCT application PCT/CN2022/135991 filed; published as WO2023207096A1
Patent Citations (7)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180137404A1 (en) * | 2016-11-15 | 2018-05-17 | International Business Machines Corporation | Joint learning of local and global features for entity linking via neural networks |
| CN108280061A (zh) * | 2018-01-17 | 2018-07-13 | 北京百度网讯科技有限公司 | 基于歧义实体词的文本处理方法和装置 |
| CN114003732A (zh) * | 2021-07-13 | 2022-02-01 | 北京金山数字娱乐科技有限公司 | 候选实体生成模型训练方法及装置 |
| CN113626613A (zh) * | 2021-08-18 | 2021-11-09 | 中山大学附属第一医院 | 基于融入知识图谱子图信息及实体信息的实体链接方法 |
| CN113779225A (zh) * | 2021-09-17 | 2021-12-10 | 工银科技有限公司 | 实体链接模型的训练方法、实体链接方法及装置 |
| CN114239583A (zh) * | 2021-12-15 | 2022-03-25 | 北京百度网讯科技有限公司 | 实体链指模型的训练及实体链指方法、装置、设备及介质 |
| CN114841164A (zh) * | 2022-04-29 | 2022-08-02 | 浪潮电子信息产业股份有限公司 | 一种实体链接方法、装置、设备及存储介质 |

Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117314909A (zh) | 2023-11-29 | 2023-12-29 | 无棣源通电子科技有限公司 | 基于人工智能的电路板缺陷检测方法、装置、设备及介质 |
| CN117314909B (zh) | 2023-11-29 | 2024-02-09 | 无棣源通电子科技有限公司 | 基于人工智能的电路板缺陷检测方法、装置、设备及介质 |
| CN118467867A (zh) | 2024-07-12 | 2024-08-09 | 杭州恒生聚源信息技术有限公司 | 实体链接处理方法、设备、存储介质及程序产品 |
Also Published As

| Publication number | Publication date |
|---|---|
| CN114841164A (zh) | 2022-08-02 |
Legal Events

| Code | Title | Description |
|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22939910; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |