
CN111523313A - Model training and named entity recognition method and device - Google Patents

Model training and named entity recognition method and device Download PDF

Info

Publication number
CN111523313A
Authority
CN
China
Prior art keywords
determining
vector
participle
text segment
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010631307.7A
Other languages
Chinese (zh)
Other versions
CN111523313B (en)
Inventor
李扬名
李小龙
姚开盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010631307.7A priority Critical patent/CN111523313B/en
Publication of CN111523313A publication Critical patent/CN111523313A/en
Application granted granted Critical
Publication of CN111523313B publication Critical patent/CN111523313B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of this specification provide a model training and named entity recognition method and device. During model training, a first named entity in a first sample sequence is replaced with a first preset character to obtain a second sample sequence, and a text segment containing the first preset character is determined from the second sample sequence. A first recurrent neural network recursively determines hidden vectors for the participles in the second sample sequence, from which a characterization vector of the text segment is determined. A variational auto-encoder constructs a Gaussian distribution based on the characterization vector and determines a global hidden vector for the text segment. The first recurrent neural network, with the global hidden vector as its initial hidden vector, recursively determines decoding hidden vectors for the participles in the text segment and determines their predicted values. A prediction loss value is determined based on the difference between the participles in the text segment and their predicted values, together with a distribution difference, and the first recurrent neural network and the variational auto-encoder are updated in the direction of reducing the prediction loss value.

Description

Model training and named entity recognition method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of natural language processing technologies, and in particular, to a method and an apparatus for model training and named entity recognition.
Background
In the field of natural language processing, classifying the named entities in a text sequence is an important research direction. Named entities have noun properties in part of speech and include names of people, names of organizations, names of places, and all other entity classes identified by names. More broadly, named entities also include categories such as numbers, dates, currencies, and addresses. Accurately identifying the category of a named entity improves the accuracy and effectiveness of natural language processing.
Typically, models used to identify named entities are trained using a training set, and after training is completed, the models are tested using a test set. One of the challenges in named entity recognition is the recognition of rare entities: out-of-set words and low-frequency words. An out-of-set word is a named entity that appears in the test set but does not appear in the training set. A low-frequency word is a named entity that appears in the test set but appears with low frequency in the training set. This sparsity of the training data poses a great challenge to model training.
Therefore, an improved scheme is desirable that can train a more effective and accurate model, so that the model can better identify rare entities when it encounters them.
Disclosure of Invention
One or more embodiments of this specification describe model training and named entity recognition methods and devices for training a model with better effectiveness and higher accuracy, so that the model can recognize rare entities better when facing them. The specific technical scheme is as follows.
In a first aspect, a model training method for identifying named entities is provided, which is executed by a computer and comprises the following steps:
obtaining a first sample sequence containing a plurality of participles, wherein the participles comprise named entities and non-named entities;
replacing a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determining a text segment containing the first preset character from the second sample sequence;
recursively determining hidden vectors of a plurality of participles in the second sample sequence by adopting a first recurrent neural network and taking a preset hidden vector as an initial hidden vector; determining a token vector of the text segment based on hidden vectors of a plurality of participles in the second sample sequence;
constructing, by a variational auto-encoder, a Gaussian distribution based on the characterization vectors, determining a global latent vector for the text segment based on the Gaussian distribution;
recursively determining decoding hidden vectors of the participles in the text segment by adopting the first recurrent neural network and taking the global hidden vector as an initial hidden vector, and determining a predicted value of the participles in the text segment based on the decoding hidden vectors;
determining a prediction loss value based on the difference of the words in the text segment and the predicted values thereof and the distribution difference determined based on the Gaussian distribution, and updating the first recurrent neural network and the variational self-encoder in the direction of reducing the prediction loss value.
In one embodiment, the step of replacing the first named entity in the first sample sequence with a first predetermined character includes:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
In one embodiment, the step of determining the text segment containing the first preset character from the second sample sequence includes:
determining, as a text segment, a sequence in the second sample sequence starting from the first preset character and ending with the first named entity after the first preset character; or determining, as a text segment, a sequence in the second sample sequence ending with the first preset character and beginning with the first named entity before the first preset character.
In one embodiment, the step of determining a token vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence includes:
determining an initial implicit vector of a head participle and an initial implicit vector of a tail participle of the text segment from the implicit vectors of the participles of the second sample sequence, and determining a characterization vector of the text segment based on a difference value of the initial implicit vector of the tail participle and the initial implicit vector of the head participle.
In one embodiment, the step of constructing a gaussian distribution based on the characterization vectors, and determining a global hidden vector for the text segment based on the gaussian distribution includes:
determining, by a variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining a global latent vector for the text segment based on the mean and the variance of the Gaussian distribution.
In one embodiment, the step of recursively determining decoded hidden vectors for the participles in the text segment comprises:
determining, by the first recurrent neural network, for each intermediate participle other than the first participle and the last participle in the text segment, a decoding hidden vector of the intermediate participle based on the decoding hidden vector of the previous participle, wherein the decoding hidden vector of the participle previous to the first intermediate participle is the global hidden vector.
In one embodiment, the first recurrent neural network comprises a bidirectional recurrent neural network; said recursively determining hidden vectors for a plurality of participles in said second sequence of samples; determining a token vector for the text segment based on the hidden vectors of the plurality of participles in the second sample sequence, including:
a first recursive neural network is adopted, a preset implicit vector is used as an initial implicit vector, first implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the forward sequence of the sequence, and second implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the backward sequence of the sequence; determining a first token vector of the text segment based on a plurality of the first hidden vectors, and determining a second token vector of the text segment based on a plurality of the second hidden vectors;
the step of constructing a gaussian distribution based on the characterization vectors, and determining a global hidden vector for the text segment based on the gaussian distribution, comprises:
constructing, by a variational self-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global latent vector for the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, determining a second global latent vector for the text segment based on the second Gaussian distribution;
the step of recursively determining a decoding hidden vector of a participle in the text segment and determining a predicted value of the participle in the text segment based on the decoding hidden vector comprises:
the first recursive neural network is adopted, the global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the forward sequence of the sequence, and a second decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the backward sequence of the sequence; determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector; the middle participles are participles except the first participle and the tail participle in the text segment;
the step of determining a predicted loss value comprises:
determining a first loss value based on the difference of each participle in the text segment and a first predicted value thereof and a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference of each participle in the text segment and a second predicted value thereof and a second distribution difference determined based on the second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
In one embodiment, the first recurrent neural network comprises a recurrent neural network RNN or a long-short term memory LSTM.
In a second aspect, an embodiment provides a method for identifying a named entity using a model, which is performed by a computer and includes:
acquiring a first word segmentation sequence to be identified, wherein the first word segmentation sequence comprises a plurality of word segmentations, and the plurality of word segmentations comprise named entities and non-named entities;
inputting the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of participles in the first word segmentation sequence, wherein the first recurrent neural network is obtained by training with the method of the first aspect;
determining the distribution probability of each participle of the first participle sequence on a plurality of preset labels based on the implicit vector of each participle of the first participle sequence;
and determining a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
In a third aspect, an embodiment provides a model training apparatus for identifying a named entity, deployed in a computer, including:
a first obtaining module configured to obtain a first sample sequence including a plurality of participles, the plurality of participles including a named entity and a non-named entity;
a first replacing module, configured to replace a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determine a text segment containing the first preset character from the second sample sequence;
a first determining module, configured to recursively determine hidden vectors of a plurality of participles in the second sample sequence by using a first recurrent neural network and taking a preset hidden vector as an initial hidden vector; determining a token vector of the text segment based on hidden vectors of a plurality of participles in the second sample sequence;
a first construction module configured to construct, by a variational auto-encoder, a gaussian distribution based on the characterization vector, determine a global latent vector for the text segment based on the gaussian distribution;
a second determining module, configured to recursively determine, by using the first recurrent neural network, a decoding hidden vector of a participle in the text segment with the global hidden vector as an initial hidden vector, and determine a predicted value of the participle in the text segment based on the decoding hidden vector;
a first updating module configured to determine a prediction loss value based on a difference between a word in the text segment and a predicted value thereof and a distribution difference determined based on the gaussian distribution, and update the first recurrent neural network and the variational self-encoder in a direction of reducing the prediction loss value.
In one embodiment, the first replacing module, when replacing the first named entity in the first sample sequence with a first preset character, includes:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
In one embodiment, the first replacement module, when determining the text segment containing the first preset character from the second sample sequence, includes:
determining, as a text segment, a sequence in the second sample sequence starting from the first preset character and ending with the first named entity after the first preset character; or determining, as a text segment, a sequence in the second sample sequence ending with the first preset character and beginning with the first named entity before the first preset character.
In one embodiment, the determining, by the first determining module, a feature vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence includes:
determining an initial implicit vector of a head participle and an initial implicit vector of a tail participle of the text segment from the implicit vectors of the participles of the second sample sequence, and determining a characterization vector of the text segment based on a difference value of the initial implicit vector of the tail participle and the initial implicit vector of the head participle.
In an embodiment, the first building block is specifically configured to:
determining, by a variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining a global latent vector for the text segment based on the mean and the variance of the Gaussian distribution.
In one embodiment, the second determining module, when recursively determining the decoding hidden vectors of the participles in the text segment, includes:
determining, by the first recurrent neural network, for each intermediate participle other than the first participle and the last participle in the text segment, a decoding hidden vector of the intermediate participle based on the decoding hidden vector of the previous participle, wherein the decoding hidden vector of the participle previous to the first intermediate participle is the global hidden vector.
In one embodiment, the first recurrent neural network comprises a bidirectional recurrent neural network; the first determining module is specifically configured to:
a first recursive neural network is adopted, a preset implicit vector is used as an initial implicit vector, first implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the forward sequence of the sequence, and second implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the backward sequence of the sequence; determining a first token vector of the text segment based on a plurality of the first hidden vectors, and determining a second token vector of the text segment based on a plurality of the second hidden vectors;
the first building module is specifically configured to:
constructing, by a variational self-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global latent vector for the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, determining a second global latent vector for the text segment based on the second Gaussian distribution;
the second determining module is specifically configured to:
the first recursive neural network is adopted, the global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the forward sequence of the sequence, and a second decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the backward sequence of the sequence; determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector; the middle participles are participles except the first participle and the tail participle in the text segment;
the first updating module, when determining the predicted loss value, includes:
determining a first loss value based on the difference of each participle in the text segment and a first predicted value thereof and a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference of each participle in the text segment and a second predicted value thereof and a second distribution difference determined based on the second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
In one embodiment, the first recurrent neural network comprises a recurrent neural network RNN or a long-short term memory LSTM.
In a fourth aspect, an embodiment provides an apparatus for named entity identification using a model, deployed in a computer, comprising:
the second acquisition module is configured to acquire a first segmentation sequence to be identified, wherein the first segmentation sequence comprises a plurality of segmentations, and the plurality of segmentations comprise named entities and non-named entities;
a first input module, configured to input the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of participles in the first word segmentation sequence, wherein the first recurrent neural network is obtained by training with the method of the first aspect;
a third determining module, configured to determine, based on the hidden vector of each participle of the first participle sequence, a distribution probability of each participle of the first participle sequence on a plurality of preset labels;
and the fourth determining module is configured to determine a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
In a fifth aspect, embodiments provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of the first to second aspects.
In a sixth aspect, an embodiment provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method of any one of the first aspect to the second aspect.
With the method and device provided by the embodiments of this specification, a first named entity in a first sample sequence can be assumed to be a rare entity and masked by replacing it with a first preset character. The text segment is then encoded by a first recurrent neural network, a global hidden vector of the text segment is reconstructed by a variational auto-encoder, the decoding hidden vectors of the participles in the text segment are reconstructed based on the global hidden vector (that is, the text segment is decoded based on the global hidden vector), and the predicted values of the participles in the text segment are determined based on the decoding hidden vectors. When the first recurrent neural network and the variational auto-encoder are well trained, even if a named entity is masked, the first recurrent neural network can still represent each participle well with a hidden vector, based on the context of the masked named entity. When the hidden vectors of the participles determined by the model have stronger representation capability, named entity recognition based on them is more effective and more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a schematic flowchart of a model training method for identifying a named entity according to an embodiment;
FIG. 3 is an exemplary diagram of a text passage;
FIG. 4 is an exemplary diagram of a training implementation provided by an embodiment;
FIG. 5 is a flowchart illustrating a method for identifying a named entity using a model, according to an embodiment;
FIG. 6 is a schematic block diagram of a model training apparatus for identifying named entities, provided by an embodiment;
FIG. 7 is a schematic block diagram of an apparatus for named entity identification using a model, according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. A word segmentation sequence containing a plurality of participles, $X = \{x_1, x_2, \dots, x_N\}$, is input into a first recurrent neural network, and the first recurrent neural network outputs a hidden vector $h_i$ for each participle. Based on each hidden vector, the distribution probability of each participle over the classifications can be determined, and based on these distribution probabilities the classification result of each participle is obtained, i.e., the classification label each participle corresponds to. A classification may be represented by a label. SOS is the start symbol of the word segmentation sequence and EOS is its end symbol.
Named entities (entities), which may also be referred to as entity words, have noun properties in part of speech, including names of people, names of organizations, names of places, and all other entity categories identified by names; more broadly, named entities also include categories such as numbers, dates, currencies, and addresses.
For a word segmentation sequence containing a plurality of participles, each participle can be labeled in advance according to defined labels. Table 1 below gives the meaning of each label defined in one example.
TABLE 1

Label   Meaning               Label   Meaning               Label   Meaning
n       Common noun           f       Noun of orientation   s       Term of wording
nw      Name of work          PER     Name of a person      LOC     Place name
ORG     Organization name     TIME    Time of day           O       Others
In table 1, "O" represents other meaning, referring to other words than the entity noun, such as verb, preposition, adjective, adverb, conjunctive, and the like. Multiple tags, from n to TIME, are a refined classification of entity nouns. The above classifications are merely examples provided for ease of understanding and are not intended to limit the present application.
The word segmentation sequence may be a sequence obtained after segmenting a text sequence. For example, when segmenting an English text sequence, each word or symbol is one participle; when segmenting a Chinese text sequence, the segmentation can be performed based on a preset segmentation dictionary. For example, for the English text sequence "List flights to Indianapolis with fares on Monday morning, please", each word and the comma can each be used as a participle. After word segmentation of the Chinese text sequence meaning "please list flights to Indianapolis on Monday morning and provide fares", a participle sequence such as "please - list - Monday morning - to Indianapolis - flights - , - and - provide fares" (following Chinese word order) can be obtained. This specification does not limit the specific form of the segmentation sequence.
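As an illustration of this kind of segmentation, the following Python sketch treats each word and punctuation mark of an English sentence as one participle; the function name and regular expression are illustrative assumptions, not part of the patent.

```python
import re

def segment_english(text):
    # Each word or punctuation symbol becomes one participle.
    return re.findall(r"\w+|[^\w\s]", text)

print(segment_english("List flights to Indianapolis with fares on Monday morning, please"))
# ['List', 'flights', 'to', 'Indianapolis', 'with', 'fares', 'on',
#  'Monday', 'morning', ',', 'please']
```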
In order to determine more effectively and accurately the classification corresponding to each participle in a segmentation sequence, the first recurrent neural network can be trained with a training set. The training set may include a large number of sample sequences and the label corresponding to each participle in those sample sequences. When the training of the first recurrent neural network is complete, it may be tested using a test set to determine model performance. The test set also contains a large number of word segmentation sequences and the label of each participle. Due to the diversity of natural language, the test set may contain rare entities, such as out-of-set words and low-frequency words. An out-of-set word is a named entity that appears in the test set but does not appear in the training set. A low-frequency word is a named entity that appears in the test set but appears with low frequency in the training set. A model trained with such a training set often cannot classify these rare entities correctly and effectively.
In order to train a model with better effectiveness and higher accuracy, so that it can better classify and recognize rare entities when it faces them, the embodiments of this specification provide a model training method. In the method, for each sample sequence in the training set (for example, a first sample sequence), a certain named entity in the first sample sequence (for example, a first named entity) is replaced with a certain preset character (for example, a first preset character) to obtain a second sample sequence. A text segment containing the first preset character is determined from the second sample sequence, and a characterization vector of the text segment is determined with a first recurrent neural network. A Gaussian distribution is then constructed based on the characterization vector by a variational auto-encoder, and a global hidden vector of the text segment is determined based on the Gaussian distribution. With the global hidden vector as the initial hidden vector, the decoding hidden vectors of the participles in the text segment are determined recursively, and the predicted values of the participles in the text segment are determined based on the decoding hidden vectors. A prediction loss value is determined based on the difference between the participles in the text segment and their predicted values, together with the distribution difference determined based on the Gaussian distribution, and the first recurrent neural network and the variational auto-encoder are updated in the direction of reducing the prediction loss value. When the first recurrent neural network and the variational auto-encoder are well trained, even if a named entity is masked with the preset character, the first recurrent neural network can represent each participle well with a hidden vector based on the context of the masked named entity. When the hidden vectors of the participles determined by the model have stronger representation capability, named entity recognition based on them is more effective and more accurate.
The following describes the examples of the present specification in detail.
Fig. 2 is a flowchart illustrating a model training method for recognizing a named entity according to an embodiment. The method may be performed by a computer. The computer may be implemented by any device, platform, or cluster of devices having computing, processing capabilities. The method includes the following steps S210 to S260.
Step S210, a first sample sequence containing a plurality of participles is obtained; the participles include named entities and non-named entities. The first sample sequence may be any sample sequence obtained from a training set, where the training set includes a label for each participle of the first sample sequence; the label may indicate, for example, whether the participle belongs to a named entity and to which kind of named entity it belongs. The sample sequence has the same structure as the word segmentation sequence described above: the first sample sequence contains a plurality of participles and is obtained by segmenting a text sequence. For example, the first sample sequence may be the English text sequence "List flights to Indianapolis with fares on Monday morning, please", in which "List", "flights", "to", "with", "fares", "on", and "please" belong to non-named entities and "Indianapolis", "Monday", and "morning" belong to named entities. The first sample sequence may be denoted $X = \{x_1, x_2, \dots, x_N\}$, where $x_N$ is the Nth participle in the first sample sequence X and N is an integer.
Step S220, replacing the first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determining a text fragment containing the first preset character from the second sample sequence.
In this step, a first number of named entities may be randomly determined from the at least one named entity in the first sample sequence to serve as first named entities, and the first named entities are replaced with the first preset character. The first number may be a preset number smaller than the total number N of participles in the first sample sequence, for example 1 or 2. The first preset character may be "[UNK]", for example.
In one embodiment, a corresponding random number $p_i$ may be generated for each participle in the first sample sequence, where $p_i$ takes a value between 0 and 1. When the random number $p_i$ is greater than a preset threshold $p$ and the participle corresponding to $p_i$ is a named entity, that named entity serves as a first named entity. Expressed as a formula:

$$\tilde{x}_i = \begin{cases} \mathrm{[UNK]}, & \text{if } p_i > p \text{ and } y_i \neq \mathrm{O} \\ x_i, & \text{otherwise} \end{cases} \qquad (1)$$

where $y_i$ denotes the label of participle $x_i$, and $y_i \neq \mathrm{O}$ means that $x_i$ is not of the "other" class, i.e., $x_i$ is of a named entity class. After each participle in the first sample sequence is processed according to formula (1), a second sample sequence $\tilde{X} = \{\tilde{x}_1, \tilde{x}_2, \dots, \tilde{x}_N\}$ is obtained.
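The following is a minimal sketch of the replacement in formula (1), assuming the labels of Table 1 (with "O" marking non-entities) and an illustrative threshold value; all names are assumptions.

```python
import random

UNK = "[UNK]"  # the first preset character

def mask_entities(tokens, labels, p=0.85):
    # Replace participle x_i with [UNK] when its random number p_i exceeds the
    # threshold p and its label y_i is a named-entity class (i.e. not "O").
    masked = []
    for x, y in zip(tokens, labels):
        p_i = random.random()
        masked.append(UNK if p_i > p and y != "O" else x)
    return masked  # the second sample sequence
```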
In step S220, when the text segment containing the first preset character is determined from the second sample sequence, a sequence starting from the first preset character and ending with a first named entity after the first preset character in the second sample sequence may be determined as the text segment. Or determining a sequence which ends with a first preset character and begins with a first named entity before the first preset character in the second sample sequence as a text segment.
For example, FIG. 3 is an exemplary diagram of a text segment. In the first sample sequence, "Indianapolis" is replaced with "[UNK]"; the sequence from sequence number j to sequence number k may be determined as the text segment, where k is the position of the first named entity after j. The second row in FIG. 3 shows the label of each participle in the first sample sequence; the label meanings can be seen in Table 1.
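The segment boundaries j and k can then be located as in the sketch below, which implements the first option (the segment starts at the preset character and ends at the first named entity after it); the helper name and "O"-style labels are assumptions.

```python
def find_text_segment(masked_tokens, labels, unk="[UNK]"):
    # j: position of the first preset character; k: first named entity after j.
    j = masked_tokens.index(unk)
    k = next(i for i in range(j + 1, len(labels)) if labels[i] != "O")
    return j, k  # the text segment is masked_tokens[j:k+1]
```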
Step S230, using the first recurrent neural network, using the preset hidden vector as an initial hidden vector, recursively determining hidden vectors of a plurality of participles in the second sample sequence, and determining a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence. In this step, a preset implicit vector is input into the first recurrent neural network as an initial implicit vector, and the first recurrent neural network outputs a determined implicit vector of a participle in the second sample sequence.
The first Recurrent neural network may include a Recurrent Neural Network (RNN) or a Long Short-Term Memory (LSTM).
When determining the hidden vectors of the plurality of participles in the second sample sequence, for the first participle in the second sample sequence, its hidden vector is determined based on the initial hidden vector; for each participle after the first, its hidden vector is determined based on the hidden vector of the previous participle. That is, the hidden vector of each participle is determined recursively, so that the hidden vector of a later participle contains information from every preceding participle. The hidden vector is a vector representing the features of a participle. When determining the hidden vector of a participle based on the initial hidden vector or the hidden vector of the previous participle, an f-function in the first recurrent neural network can be adopted, where the f-function contains parameters to be updated. For example, the hidden vector of each participle may be determined using the following formula (2):
$$h_i = f(h_{i-1}, \tilde{x}_i) \qquad (2)$$

where $h_i$ is the hidden vector of the ith participle of the second sample sequence, $h_{i-1}$ is the hidden vector of the (i-1)th participle, and $h_0$ is the initial hidden vector. In this step, the initial hidden vector may be a preset hidden vector, which may be a randomly generated hidden vector or some fixedly set hidden vector.
The process of recursively determining the hidden vectors of the multiple participles in the second sample sequence by using the first recurrent neural network and using the preset hidden vector as the initial hidden vector can be understood as a process of encoding the participles by using the first recurrent neural network.
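A sketch of this encoding step in PyTorch follows; the choice of an LSTM for the f-function and all dimensions are assumptions.

```python
import torch.nn as nn

class SequenceEncoder(nn.Module):
    # Recursively computes h_i = f(h_{i-1}, x~_i) over the second sample sequence.
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids, h0=None):
        # token_ids: (batch, seq_len); h0: optional preset initial state.
        out, _ = self.rnn(self.embed(token_ids), h0)
        return out  # out[:, i] is the hidden vector of the (i+1)th participle
```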
In step S230, when determining the token vector of the text segment based on the hidden vectors of the multiple participles in the second sample sequence, an initial hidden vector of a head participle and an initial hidden vector of a tail participle of the text segment may be determined from the hidden vectors of the multiple participles in the second sample sequence, and the token vector of the text segment may be determined based on a difference between the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
The text segment is a segment intercepted from the second sample sequence, so that the implicit vector of the first word of the text segment can be determined from the second sample sequence and is used as the initial implicit vector of the first word; and determining the implicit vector of the tail word of the text segment from the second sample sequence as the initial implicit vector of the tail word. For example, referring to the example of fig. 3, the hidden vector of the jth participle and the hidden vector of the kth participle in the second sample sequence are respectively used as the initial hidden vector of the head participle and the initial hidden vector of the tail participle of the text segment.
When determining the characterization vector of the text segment, the difference between the initial hidden vector of the tail participle and the initial hidden vector of the head participle can be used directly as the characterization vector; alternatively, the result of applying a predetermined transformation to the difference can be used as the characterization vector. For example, the predetermined transformation may include multiplying the difference by a certain coefficient.
In one example, the token vector of a text segment can be determined using the following equation (3):
$$s_{jk} = h_k - h_j \qquad (3)$$

where $s_{jk}$ represents the characterization vector of the text segment from j to k, $h_k$ is the initial hidden vector of the tail participle, and $h_j$ is the initial hidden vector of the head participle; j and k are both integers.
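Given the encoder outputs, formula (3) is a single subtraction; the slicing convention below is an assumption.

```python
def characterization_vector(hidden, j, k):
    # s_jk = h_k - h_j: tail initial hidden vector minus head initial hidden vector.
    return hidden[:, k] - hidden[:, j]
```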
Step S240, a Gaussian distribution is constructed based on the characterization vector by a variational auto-encoder, and a global hidden vector for the text segment is determined based on the Gaussian distribution. A Variational Auto-Encoder (VAE) may be understood as a device in which a continuous Gaussian distribution exists: when a set of mean and variance is input into the variational auto-encoder, it determines the Gaussian distribution corresponding to that mean and variance and obtains a corresponding vector by sampling from the Gaussian distribution.
In this embodiment, the variational auto-encoder determines the mean and variance of the Gaussian distribution based on the characterization vector, and determines the global latent vector for the text segment based on that mean and variance. Specifically, the characterization vector is input into the VAE, and the VAE outputs a corresponding mean and variance according to its parameters. Once the mean and variance are determined, i.e., the corresponding Gaussian distribution is determined, the VAE can sample from that Gaussian distribution, and the vector obtained by sampling is used as the global hidden vector of the text segment.
For example, the VAE may determine the mean and variance by the following formula (4):

$$\mu = W_{\mu}\, s_{jk} + b_{\mu}, \qquad \log \sigma = W_{\sigma}\, s_{jk} + b_{\sigma} \qquad (4)$$

where $s_{jk}$ is the characterization vector of the text segment from j to k, $\mu$ is the mean, $\sigma$ is the variance (determined via its logarithm $\log \sigma$), and $W_{\mu}$, $b_{\mu}$, $W_{\sigma}$, and $b_{\sigma}$ are parameters to be updated in the VAE; these parameters may take initial values in the initial iteration and be continuously updated in subsequent iterations. Based on the mean $\mu$ and variance $\sigma$ of the text segment, the VAE can determine the global latent vector $z$ for the text segment.
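A sketch of this VAE step with the usual reparameterization trick; the affine layers stand in for $W_{\mu}$, $b_{\mu}$, $W_{\sigma}$, $b_{\sigma}$ of formula (4), and keeping the latent dimension equal to the hidden dimension (so that z can later serve as an initial hidden vector) is an assumption.

```python
import torch
import torch.nn as nn

class SegmentVAE(nn.Module):
    # Builds the Gaussian of formula (4) and samples the global latent vector z.
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.to_mu = nn.Linear(hidden_dim, hidden_dim)         # W_mu, b_mu
        self.to_log_sigma = nn.Linear(hidden_dim, hidden_dim)  # W_sigma, b_sigma

    def forward(self, s_jk):
        mu = self.to_mu(s_jk)
        log_sigma = self.to_log_sigma(s_jk)
        z = mu + torch.exp(log_sigma) * torch.randn_like(mu)  # z ~ N(mu, sigma^2)
        return z, mu, log_sigma
```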
Step S250, the first recurrent neural network is adopted, the global hidden vector is used as the initial hidden vector, the decoding hidden vectors of the participles in the text segment are determined recursively, and the predicted values of the participles in the text segment are determined based on the decoding hidden vectors. In this step, the global hidden vector is input into the first recurrent neural network as the initial hidden vector, and the first recurrent neural network outputs the decoding hidden vectors of the participles in the text segment. Determining the decoding hidden vectors of the participles based on the global hidden vector may be understood as reconstructing the participles in the text segment based on the global hidden vector, which corresponds to a process of decoding; specifically, the intermediate participles of the text segment may be decoded and reconstructed.
In one embodiment, for each intermediate participle in the text segment except for the first participle and the last participle, a decoding hidden vector of the intermediate participle may be determined based on a decoding hidden vector of a previous participle through the first recurrent neural network, where the decoding hidden vector of the previous participle of the first intermediate participle is a global hidden vector. The number of intermediate participles may be one or more. This process can be expressed using the following equation (5):
$$d_m = f(d_{m-1}, \tilde{x}_{m-1}) \qquad (5)$$

where $d_m$ is the decoding hidden vector of the mth intermediate participle $\tilde{x}_m$ in the text segment, $d_{m-1}$ is the decoding hidden vector of the (m-1)th participle, and $d_j$ takes the value of the global latent vector $z$. m takes values in $[j+1, j+2, \dots, k-1]$, where the text segment is formed by the participles from j to k.
Determining the predicted value of a participle in the text segment based on its decoding hidden vector can be understood as determining the probability that the participle is each word in a preset dictionary, i.e., predicting which word the participle is. Specifically, the predicted value of a participle in the text segment can be calculated with a softmax function, i.e., with formula (6):
$$\hat{y}_m = \mathrm{softmax}(W d_m) \qquad (6)$$

where $\hat{y}_m$ is the predicted value of the mth intermediate participle in the text segment and may be a probability distribution; $d_m$ is the decoding hidden vector of the mth intermediate participle; and $W$ is a parameter to be updated in the iterative process.
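Formula (6) then amounts to a linear projection of each decoding hidden vector onto the preset dictionary followed by softmax; the module below is a sketch in which the projection layer plays the role of the parameter W.

```python
import torch.nn as nn

class WordPredictor(nn.Module):
    # y^_m = softmax(W d_m): probability of each dictionary word for participle m.
    def __init__(self, hidden_dim, dict_size):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, dict_size)  # the parameter W (plus bias)

    def forward(self, dec_hidden):
        return self.proj(dec_hidden)  # logits; apply softmax(-1) for probabilities
```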
Step S260, a prediction loss value is determined based on the difference between the participles in the text segment and their predicted values and the distribution difference determined based on the Gaussian distribution, and the first recurrent neural network and the variational auto-encoder are updated in the direction of reducing the prediction loss value, i.e., their parameters are updated. In addition to the parameters of the first recurrent neural network and the variational auto-encoder, the other parameters to be updated mentioned in the above steps may also be updated.
The participles in the text segment serve as the standard (ground-truth) values. The standard values can be represented by probability distributions, and the predicted values are also probability distributions, so the difference between the two can be calculated with cross entropy. The participles in the text segment here may be the intermediate participles; when there are multiple intermediate participles, the cross entropies of the individual participles may be summed and the prediction loss value calculated based on the sum.
Since the parameters of the VAE are also learned during model training, a corresponding KL (Kullback-Leibler) divergence may be determined based on the Gaussian distribution and used as the distribution difference; it represents the difference between the VAE's initial (prior) distribution and the predicted Gaussian distribution. In determining the prediction loss value, a sum may be taken of the difference between the participles in the text segment and their predicted values and the distribution difference determined based on the Gaussian distribution. For example, the prediction loss value can be expressed by the following formula (7):
$$\mathcal{L}_{\mathrm{ELBO}} = -\sum_{m=j+1}^{k-1} \mathrm{CE}\left(\tilde{x}_m, \hat{y}_m\right) - D_{\mathrm{KL}}\left(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\right) \qquad (7)$$

where $\mathrm{CE}(\tilde{x}_m, \hat{y}_m)$ is the cross entropy calculated based on the difference between a participle in the text segment and its predicted value, $D_{\mathrm{KL}}(\cdot \| \cdot)$ is the KL divergence determined based on the Gaussian distribution, and $\mathcal{L}_{\mathrm{ELBO}}$ is the negative value of the determined prediction loss value; training therefore reduces the prediction loss by increasing $\mathcal{L}_{\mathrm{ELBO}}$.
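A sketch of the prediction loss of formula (7): the summed cross entropy of the intermediate participles plus the closed-form KL divergence of $\mathcal{N}(\mu, \sigma^2)$ from $\mathcal{N}(0, I)$; the reduction and sign conventions are assumptions.

```python
import torch
import torch.nn.functional as F

def prediction_loss(logits, target_ids, mu, log_sigma):
    # Cross entropy between each intermediate participle and its predicted value.
    ce = F.cross_entropy(logits.flatten(0, 1), target_ids.flatten(), reduction="sum")
    # KL( N(mu, sigma^2) || N(0, I) ) in closed form.
    kl = -0.5 * torch.sum(1 + 2 * log_sigma - mu.pow(2) - (2 * log_sigma).exp())
    return ce + kl  # minimized; its negative is the quantity in formula (7)
```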
Steps S210 to S260 constitute one iteration. This iterative process may be repeated multiple times until it converges. The convergence condition may include the number of iterations exceeding a preset threshold, the prediction loss value falling below a certain threshold, and so on. The iterative process of steps S210 to S260 is described for a single first sample sequence; in another embodiment, a plurality of first sample sequences may be processed according to steps S210 to S260, a total prediction loss value corresponding to the plurality of first sample sequences determined, and the first recurrent neural network and the variational auto-encoder updated in the direction of reducing the total prediction loss value. This reduces the number of times the model parameters are updated and improves training efficiency.
In another embodiment of this specification, the first recurrent neural network may be a bidirectional recurrent neural network, such as a bidirectional RNN or a bidirectional LSTM. In this embodiment, the processes represented by formulas (2) to (7) are executed once in the forward order of the sequence and once in the backward order, the two resulting prediction loss values are summed, and the first recurrent neural network and the variational auto-encoder are updated in the direction of reducing the summed prediction loss value. The sample features extracted by bidirectional training are richer, so the training on each sample sequence is more thorough and the model is trained more effectively and accurately. Specific embodiments are as follows.
In step S230, recursively determining hidden vectors of the plurality of participles in the second sample sequence, and determining a token vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence, the steps include:
a first recursive neural network is adopted, a preset implicit vector is used as an initial implicit vector, first implicit vectors of a plurality of participles in a second sample sequence are determined recursively according to the forward sequence of the sequence, and second implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the backward sequence of the sequence; a first token vector of the text segment is determined based on the plurality of first hidden vectors, and a second token vector of the text segment is determined based on the plurality of second hidden vectors.
In step S240, a gaussian distribution is constructed based on the characterization vectors, and a global hidden vector for the text segment is determined based on the gaussian distribution, including:
the method comprises the steps of constructing a first Gaussian distribution based on a first characterization vector through a variation self-encoder, determining a first global hidden vector aiming at a text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on a second characterization vector, and determining a second global hidden vector aiming at the text segment based on the second Gaussian distribution.
In step S250, the step of recursively determining a decoding hidden vector of the participle in the text segment, and determining a predicted value of the participle in the text segment based on the decoding hidden vector comprises:
a first recursive neural network is adopted, a global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in a text segment is recursively determined according to the forward sequence of a sequence, and a second decoding hidden vector of each intermediate participle in the text segment is recursively determined according to the backward sequence of the sequence; and determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector. The middle participles are participles except the first participle and the tail participle in the text segment.
In step S260, the step of determining the predicted loss value may include:
determining a first loss value based on the difference of each participle in the text segment and the first predicted value thereof and the first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference between each participle in the text segment and a second predicted value of the participle and a second distribution difference determined based on a second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
For the second sample sequence shown in fig. 3, the forward order of the sequence is the process from "List" to "please", and the backward order of the sequence is the process from "please" to "List". For the text fragment in FIG. 3, the forward order of the sequence is the process from "Indianapolis" to "Monday" and the backward order of the sequence is the process from "Monday" to "Indianapolis". The implementation of the forward training process and the backward training process can be performed according to the execution processes represented by the above formulas (2) - (7), and will not be described herein again.
For the first recurrent neural network, parameters in the forward process and the backward process are shared, and the forward process and the backward process are different in the sequence order.
The training target of the bidirectional training process may be represented as

$$\min_{\theta, \phi} \; L(X; \theta, \phi) = \overrightarrow{L} + \overleftarrow{L} \qquad (8)$$

where $\overrightarrow{L}$ is the first loss value determined by the forward training process, $\overleftarrow{L}$ is the second loss value determined by the backward training process, $L$ represents the prediction loss value, $\theta$ represents the parameters to be updated of the first recurrent neural network, $\phi$ represents the other parameters to be updated, and X is the first sample sequence.
Fig. 4 is an exemplary diagram of a training execution process according to an embodiment. In this embodiment, the text segment is "Indianapolis with fares on Monday", where "Indianapolis" is masked as the first preset symbol "[UNK]". Local context reconstruction is performed on the text segment: a mean $\mu$ and a logarithmic variance $\log \sigma$ are determined based on the characterization vector of the text segment; the global hidden vector $z$ of the text segment is determined based on the Gaussian distribution corresponding to $\mu$ and $\log \sigma$; and, taking $z$ as the initial hidden vector for the intermediate participles "with fares on", the decoding hidden vector of each intermediate participle, $d_{j+1}, \dots, d_{k-1}$, is determined separately. Based on the decoding hidden vectors, participle prediction, loss value determination, and other processes can be performed.
After the training of the first recurrent neural network is completed based on the above embodiments, named entities in a participle sequence can be identified using the first recurrent neural network. Even if the participle sequence contains out-of-vocabulary words that did not appear in the training set, or low-frequency words, the first recurrent neural network can determine a highly expressive hidden vector for each participle based on the participle sequence; when the hidden vectors represent the sequence features well, the labels of the participles can be determined more accurately.
Fig. 5 is a flowchart illustrating a method for identifying a named entity using a model according to an embodiment. The method is performed by a computer, which may be implemented by any device, platform, or device cluster having computing and processing capabilities. The method includes the following steps S510 to S540.
Step S510, a first word segmentation sequence to be identified is obtained, where the first word segmentation sequence includes a plurality of participles, and the plurality of participles include named entities and non-named entities. The first word segmentation sequence may be any word segmentation sequence in the test set, or a word segmentation sequence obtained in another manner.
Step S520, inputting the first word segmentation sequence into the trained first recurrent neural network to obtain hidden vectors of a plurality of participles in the first word segmentation sequence. The first recurrent neural network is trained using the method shown in Fig. 2.
The first recurrent neural network may also employ a bidirectional recurrent neural network, such as a bidirectional RNN or a bidirectional LSTM. In this case, the first word segmentation sequence may be input into the trained first recurrent neural network to obtain forward hidden vectors of the participles, determined in the forward order of the sequence, and backward hidden vectors of the participles, determined in the backward order of the sequence; for each participle, the forward hidden vector and the backward hidden vector are vector-spliced to obtain the hidden vector of the participle.
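As a sketch of this step with PyTorch's built-in bidirectional LSTM, where the layer sizes and the toy input are assumptions for the example:

```python
# A minimal sketch of bidirectional encoding: for each participle, PyTorch
# concatenates the forward and backward hidden vectors itself.
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 10000, 128, 256
embedding = nn.Embedding(vocab_size, emb_dim)
encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

token_ids = torch.tensor([[12, 845, 3, 907, 55]])  # a toy participle sequence
hidden, _ = encoder(embedding(token_ids))
# hidden: [1, seq_len, 2 * hidden_dim] -- for each participle, the forward
# hidden vector and the backward hidden vector, vector-spliced.
print(hidden.shape)
```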
Step S530, determining a distribution probability of each participle of the first participle sequence on a plurality of preset labels based on the hidden vector of each participle of the first participle sequence. In this step, a Conditional Random Field (CRF) may be used to determine the distribution probability of each participle of the first participle sequence on the plurality of preset labels. The parameters in the CRF may be trained in advance according to a training set. Specifically, the hidden vector of each participle of the first participle sequence may be input into the CRF to obtain the distribution probability of each participle on the plurality of preset labels. The plurality of preset labels may be, for example, the labels shown in Table 1.
During training of the CRF, the parameters in the first recurrent neural network may be kept unchanged; a loss value is obtained from the labels in the training set, and the parameters of the CRF are adjusted, for example, in the direction of reducing the loss value.
Step S540, determining the preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence. Specifically, the preset label corresponding to the maximum probability value among the distribution probabilities may be determined as the preset label of the participle, that is, as the classification result.
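The inference side of steps S530-S540 can be illustrated with a small Viterbi decoder over per-participle label scores, which is what a linear-chain CRF performs at decoding time. This is a simplified, single-sequence sketch; the emission and transition tensors are assumed inputs (for instance, a linear projection of the hidden vectors and the CRF's trained transition parameters), not the patent's concrete parameterization:

```python
# A minimal Viterbi decode: find the highest-scoring label path given
# per-participle emission scores and label-to-label transition scores.
import torch

def viterbi_decode(emissions, transitions):
    """emissions: [seq_len, num_tags] per-participle label scores;
    transitions: [num_tags, num_tags], transitions[i, j] = score of tag i -> j."""
    seq_len, num_tags = emissions.shape
    score = emissions[0]                       # best score ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # score of moving from every previous tag to every current tag
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)    # best previous tag per current tag
        backpointers.append(best_prev)
    # Follow the backpointers from the best final tag.
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return list(reversed(path))
```

An off-the-shelf CRF layer would expose equivalent decoding; here the transition matrix plays the role of the CRF parameters trained in advance.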
When the first recurrent neural network is trained using the embodiment shown in Fig. 2, even if an unseen named entity is encountered in the participle sequence, the first recurrent neural network can represent each participle well with a hidden vector based on the context of the named entity. When the hidden vector of each participle determined by the model has a stronger representation capability, named entity recognition based on the hidden vectors of the participles can be more effective and more accurate.
The foregoing describes certain embodiments of the present specification, and other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown or in sequential order to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
FIG. 6 is a schematic block diagram of a model training apparatus for identifying a named entity according to an embodiment. The apparatus 600 is deployed in a computer. This embodiment corresponds to the embodiment of the method shown in fig. 2. The apparatus 600 comprises:
a first obtaining module 610 configured to obtain a first sample sequence including a plurality of participles, the plurality of participles including a named entity and a non-named entity;
a first replacing module 620, configured to replace a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determine a text segment containing the first preset character from the second sample sequence;
a first determining module 630, configured to recursively determine hidden vectors of a plurality of participles in the second sample sequence by using a first recurrent neural network and taking a preset hidden vector as an initial hidden vector; and determine a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence;
a first constructing module 640 configured to construct, by a variational auto-encoder, a Gaussian distribution based on the characterization vector, and determine a global hidden vector of the text segment based on the Gaussian distribution;
a second determining module 650, configured to recursively determine, by using the first recurrent neural network, a decoding hidden vector of a participle in the text segment with the global hidden vector as an initial hidden vector, and determine a predicted value of the participle in the text segment based on the decoding hidden vector;
a first updating module 660 configured to determine a predicted loss value based on the differences between the participles in the text segment and their predicted values and on the distribution difference determined based on the Gaussian distribution, and to update the first recurrent neural network and the variational auto-encoder in the direction of reducing the predicted loss value.
In one embodiment, the first replacing module 620, when replacing the first named entity in the first sample sequence with the first preset character, includes:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
In one embodiment, the first replacing module 620, when determining the text segment containing the first preset character from the second sample sequence, includes:
determining, as a text segment, a sequence in the second sample sequence starting from the first preset character and ending with the first named entity after the first preset character; or determining, as a text segment, a sequence in the second sample sequence ending with the first preset character and beginning with the first named entity before the first preset character.
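An illustrative sketch of these two modules' behavior, under the assumption that entity positions are known and sorted; the helper name, the parallel position list, and the fallback to the sequence end are choices made for the example (only the forward variant of the segment, from the preset character to the next named entity, is shown):

```python
# A minimal sketch: mask randomly chosen named entities with [UNK] and
# extract text segments running from each [UNK] to the next named entity.
import random

UNK = "[UNK]"

def mask_and_extract(tokens, entity_positions, first_number=1):
    """tokens: list of participles; entity_positions: sorted indices of
    named entities; first_number: how many entities to mask."""
    masked = list(tokens)
    chosen = random.sample(entity_positions,
                           k=min(first_number, len(entity_positions)))
    for pos in chosen:
        masked[pos] = UNK
    segments = []
    for pos in sorted(chosen):
        # first remaining named entity after the preset character; if none,
        # fall back to the end of the sequence (an assumption of this sketch)
        end = next((i for i in entity_positions
                    if i > pos and i not in chosen), len(masked) - 1)
        segments.append(masked[pos:end + 1])
    return masked, segments
```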
In one embodiment, the determining, by the first determining module 630, of the characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence includes:
determining an initial hidden vector of the head participle and an initial hidden vector of the tail participle of the text segment from the hidden vectors of the participles of the second sample sequence, and determining the characterization vector of the text segment based on the difference between the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
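This reduces to a vector difference. A one-line sketch, assuming `hidden` holds the initial hidden vectors produced by the first recurrent neural network for the second sample sequence, and `start`/`end` index the head and tail participles of the segment:

```python
# Characterization vector as the difference of the boundary hidden vectors.
import torch

def segment_characterization(hidden: torch.Tensor, start: int, end: int) -> torch.Tensor:
    # hidden: [seq_len, hidden_dim]; tail minus head
    return hidden[end] - hidden[start]
```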
In an embodiment, the first building module 640 is specifically configured to:
determining, by the variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining the global hidden vector of the text segment based on the mean and the variance of the Gaussian distribution.
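A sketch of this module under the common variational auto-encoder parameterization, with two linear heads and the reparameterization trick; the layer sizes and class name are assumptions for the example:

```python
# A minimal VAE head: map the characterization vector to (mu, logvar) and
# sample the global hidden vector with the reparameterization trick.
import torch
import torch.nn as nn

class SegmentVAE(nn.Module):
    def __init__(self, rep_dim=512, latent_dim=256):
        super().__init__()
        self.to_mu = nn.Linear(rep_dim, latent_dim)
        self.to_logvar = nn.Linear(rep_dim, latent_dim)

    def forward(self, rep):
        mu, logvar = self.to_mu(rep), self.to_logvar(rep)
        # z = mu + sigma * eps, so gradients flow through mu and sigma.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps   # global hidden vector
        return z, mu, logvar

# usage: z, mu, logvar = SegmentVAE()(torch.randn(512))
```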
In one embodiment, the second determining module 650, when recursively determining the decoding hidden vectors of the participles in the text segment, includes:
determining, through the first recurrent neural network and for each intermediate participle in the text segment other than the first participle and the last participle, the decoding hidden vector of that intermediate participle based on the decoding hidden vector of the previous participle, where the decoding hidden vector of the participle preceding the first intermediate participle is the global hidden vector.
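A sketch of this recursion with an LSTM cell, showing only the forward direction; the sizes, module names, and input embeddings are assumptions for the example:

```python
# A minimal decoding loop: the global hidden vector initializes the
# recurrence, and each step consumes the previous decoding hidden vector.
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=128, hidden_size=256)
project = nn.Linear(256, 10000)                 # scores over the vocabulary

def decode_segment(z, segment_embeddings):
    """z: global hidden vector [256]; segment_embeddings: [num_steps, 128],
    one embedding per intermediate-participle step."""
    h, c = z.unsqueeze(0), torch.zeros(1, 256)  # global hidden vector as h_0
    logits = []
    for x in segment_embeddings:                # one step per intermediate participle
        h, c = cell(x.unsqueeze(0), (h, c))     # previous decoding hidden vector feeds in
        logits.append(project(h).squeeze(0))    # predicted-value distribution
    return torch.stack(logits)
```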
In one embodiment, the first recurrent neural network comprises a bidirectional recurrent neural network; the first determining module 630 is specifically configured to:
adopt the first recurrent neural network and take the preset hidden vector as the initial hidden vector; recursively determine first hidden vectors of the plurality of participles in the second sample sequence in the forward order of the sequence, and recursively determine second hidden vectors of the plurality of participles in the second sample sequence in the backward order of the sequence; and determine a first characterization vector of the text segment based on the plurality of first hidden vectors, and a second characterization vector of the text segment based on the plurality of second hidden vectors;
the first building block 640 is specifically configured to:
construct, by the variational auto-encoder, a first Gaussian distribution based on the first characterization vector, determine a first global hidden vector of the text segment based on the first Gaussian distribution, construct a second Gaussian distribution based on the second characterization vector, and determine a second global hidden vector of the text segment based on the second Gaussian distribution;
the second determining module 650 is specifically configured to:
the first recursive neural network is adopted, the global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the forward sequence of the sequence, and a second decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the backward sequence of the sequence; determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector; the middle participles are participles except the first participle and the tail participle in the text segment;
the first updating module 660, when determining the predicted loss value, includes:
determining a first loss value based on the differences between the participles in the text segment and their first predicted values, and on a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the differences between the participles in the text segment and their second predicted values, and on a second distribution difference determined based on the second Gaussian distribution; and determining the predicted loss value based on the sum of the first loss value and the second loss value.
In one embodiment, the first recurrent neural network comprises a recurrent neural network RNN or a long-short term memory LSTM.
FIG. 7 is a schematic block diagram of an apparatus for named entity identification using a model, according to an embodiment. The apparatus 700 is deployed in a computer. This embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 5. The apparatus 700 comprises:
a second obtaining module 710 configured to obtain a first segmentation sequence to be identified, where the first segmentation sequence includes a plurality of segmentations, and the plurality of segmentations include a named entity and a non-named entity;
a first input module 720, configured to input the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of participles in the first word segmentation sequence, where the first recurrent neural network is trained using the method shown in Fig. 2;
a third determining module 730, configured to determine, based on the hidden vector of each participle of the first participle sequence, a distribution probability of each participle of the first participle sequence on a plurality of preset labels;
the fourth determining module 740 is configured to determine a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
The above device embodiments correspond to the method embodiments, and specific descriptions may refer to descriptions of the method embodiments, which are not repeated herein. The device embodiment is obtained based on the corresponding method embodiment, has the same technical effect as the corresponding method embodiment, and for the specific description, reference may be made to the corresponding method embodiment.
Embodiments of the present specification also provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of fig. 1 to 5.
The present specification also provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any one of fig. 1 to 5.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the storage medium and the computing device embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to some descriptions of the method embodiments for relevant points.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments further describe the objects, technical solutions and advantages of the embodiments of the present invention in detail. It should be understood that the above description is only exemplary of the embodiments of the present invention, and is not intended to limit the scope of the present invention, and any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (20)

1. A model training method for identifying named entities, performed by a computer, comprising:
obtaining a first sample sequence containing a plurality of participles, wherein the participles comprise named entities and non-named entities;
replacing a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determining a text segment containing the first preset character from the second sample sequence;
recursively determining hidden vectors of a plurality of participles in the second sample sequence by adopting a first recurrent neural network and taking a preset hidden vector as an initial hidden vector; and determining a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence;
constructing, by a variational auto-encoder, a Gaussian distribution based on the characterization vector, and determining a global hidden vector of the text segment based on the Gaussian distribution;
recursively determining decoding hidden vectors of the participles in the text segment by adopting the first recurrent neural network and taking the global hidden vector as an initial hidden vector, and determining a predicted value of the participles in the text segment based on the decoding hidden vectors;
determining a predicted loss value based on the differences between the participles in the text segment and their predicted values and on the distribution difference determined based on the Gaussian distribution, and updating the first recurrent neural network and the variational auto-encoder in the direction of reducing the predicted loss value.
2. The method of claim 1, the step of replacing the first named entity in the first sequence of samples with a first preset character, comprising:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
3. The method of claim 1, the step of determining a text segment containing the first preset character from the second sample sequence comprising:
determining, as a text segment, a sequence in the second sample sequence starting from the first preset character and ending with the first named entity after the first preset character; or determining, as a text segment, a sequence in the second sample sequence ending with the first preset character and beginning with the first named entity before the first preset character.
4. The method of claim 3, the step of determining a characterization vector for the text segment based on hidden vectors for a plurality of participles in the second sample sequence, comprising:
determining an initial hidden vector of the head participle and an initial hidden vector of the tail participle of the text segment from the hidden vectors of the participles of the second sample sequence, and determining the characterization vector of the text segment based on the difference between the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
5. The method of claim 1, the step of constructing a Gaussian distribution based on the characterization vector, and determining a global hidden vector of the text segment based on the Gaussian distribution, comprising:
determining, by the variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining the global hidden vector of the text segment based on the mean and the variance of the Gaussian distribution.
6. The method of claim 1, the step of recursively determining decoded latent vectors for the participles in the text segment, comprising:
and determining a decoding hidden vector of the middle participle based on a decoding hidden vector of a previous participle aiming at each middle participle except the first participle and the last participle in the text segment through the first recurrent neural network, wherein the decoding hidden vector of the previous participle of the first middle participle is the global hidden vector.
7. The method of claim 1, the first recurrent neural network comprising a bidirectional recurrent neural network; the step of recursively determining hidden vectors of a plurality of participles in the second sample sequence and determining a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence, comprising:
adopting the first recurrent neural network and taking the preset hidden vector as the initial hidden vector, recursively determining first hidden vectors of the plurality of participles in the second sample sequence in the forward order of the sequence, and recursively determining second hidden vectors of the plurality of participles in the second sample sequence in the backward order of the sequence; and determining a first characterization vector of the text segment based on the plurality of first hidden vectors, and a second characterization vector of the text segment based on the plurality of second hidden vectors;
the step of constructing a gaussian distribution based on the characterization vectors, and determining a global hidden vector for the text segment based on the gaussian distribution, comprises:
constructing, by the variational auto-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global hidden vector of the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, and determining a second global hidden vector of the text segment based on the second Gaussian distribution;
the step of recursively determining a decoding hidden vector of a participle in the text segment and determining a predicted value of the participle in the text segment based on the decoding hidden vector comprises:
the first recursive neural network is adopted, the global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the forward sequence of the sequence, and a second decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the backward sequence of the sequence; determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector; the middle participles are participles except the first participle and the tail participle in the text segment;
the step of determining a predicted loss value comprises:
determining a first loss value based on the differences between the participles in the text segment and their first predicted values, and on a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the differences between the participles in the text segment and their second predicted values, and on a second distribution difference determined based on the second Gaussian distribution; and determining the predicted loss value based on the sum of the first loss value and the second loss value.
8. The method of claim 1, the first recurrent neural network comprising a Recurrent Neural Network (RNN) or a long-short term memory (LSTM).
9. A method for named entity recognition using a model, performed by a computer, comprising:
acquiring a first word segmentation sequence to be identified, wherein the first word segmentation sequence comprises a plurality of word segmentations, and the plurality of word segmentations comprise named entities and non-named entities;
inputting the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of participles in the first word segmentation sequence, the first recurrent neural network being trained by the method of claim 1;
determining the distribution probability of each participle of the first participle sequence on a plurality of preset labels based on the hidden vector of each participle of the first participle sequence;
and determining a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
10. A model training apparatus for identifying named entities, deployed in a computer, comprising:
a first obtaining module configured to obtain a first sample sequence including a plurality of participles, the plurality of participles including a named entity and a non-named entity;
a first replacing module, configured to replace a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determine a text segment containing the first preset character from the second sample sequence;
a first determining module, configured to recursively determine hidden vectors of a plurality of participles in the second sample sequence by using a first recurrent neural network and taking a preset hidden vector as an initial hidden vector; and determine a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence;
a first construction module configured to construct, by a variational auto-encoder, a Gaussian distribution based on the characterization vector, and determine a global hidden vector of the text segment based on the Gaussian distribution;
a second determining module, configured to recursively determine, by using the first recurrent neural network, a decoding hidden vector of a participle in the text segment with the global hidden vector as an initial hidden vector, and determine a predicted value of the participle in the text segment based on the decoding hidden vector;
a first updating module configured to determine a predicted loss value based on the differences between the participles in the text segment and their predicted values and on the distribution difference determined based on the Gaussian distribution, and to update the first recurrent neural network and the variational auto-encoder in the direction of reducing the predicted loss value.
11. The apparatus of claim 10, the first replacement module, when replacing the first named entity in the first sequence of samples with a first predetermined character, comprises:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
12. The apparatus of claim 10, the first replacement module, when determining the text segment containing the first preset character from the second sample sequence, comprises:
determining, as a text segment, a sequence in the second sample sequence starting from the first preset character and ending with the first named entity after the first preset character; or determining, as a text segment, a sequence in the second sample sequence ending with the first preset character and beginning with the first named entity before the first preset character.
13. The apparatus of claim 12, the first determining module, when determining the characterization vector for the text segment based on the hidden vectors for the plurality of participles in the second sample sequence, comprises:
determining an initial hidden vector of the head participle and an initial hidden vector of the tail participle of the text segment from the hidden vectors of the participles of the second sample sequence, and determining the characterization vector of the text segment based on the difference between the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
14. The apparatus of claim 10, the first building block being specifically configured to:
determining, by the variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining the global hidden vector of the text segment based on the mean and the variance of the Gaussian distribution.
15. The apparatus of claim 10, the second determining module, when recursively determining the decoding hidden vectors for the participles in the text segment, comprises:
determining, through the first recurrent neural network and for each intermediate participle in the text segment other than the first participle and the last participle, the decoding hidden vector of that intermediate participle based on the decoding hidden vector of the previous participle, where the decoding hidden vector of the participle preceding the first intermediate participle is the global hidden vector.
16. The apparatus of claim 10, the first recurrent neural network comprising a bidirectional recurrent neural network; the first determining module is specifically configured to:
adopt the first recurrent neural network and take the preset hidden vector as the initial hidden vector; recursively determine first hidden vectors of the plurality of participles in the second sample sequence in the forward order of the sequence, and recursively determine second hidden vectors of the plurality of participles in the second sample sequence in the backward order of the sequence; and determine a first characterization vector of the text segment based on the plurality of first hidden vectors, and a second characterization vector of the text segment based on the plurality of second hidden vectors;
the first building module is specifically configured to:
construct, by the variational auto-encoder, a first Gaussian distribution based on the first characterization vector, determine a first global hidden vector of the text segment based on the first Gaussian distribution, construct a second Gaussian distribution based on the second characterization vector, and determine a second global hidden vector of the text segment based on the second Gaussian distribution;
the second determining module is specifically configured to:
the first recursive neural network is adopted, the global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the forward sequence of the sequence, and a second decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the backward sequence of the sequence; determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector; the middle participles are participles except the first participle and the tail participle in the text segment;
the first updating module, when determining the predicted loss value, includes:
determining a first loss value based on the differences between the participles in the text segment and their first predicted values, and on a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the differences between the participles in the text segment and their second predicted values, and on a second distribution difference determined based on the second Gaussian distribution; and determining the predicted loss value based on the sum of the first loss value and the second loss value.
17. The apparatus of claim 10, the first recurrent neural network comprising a Recurrent Neural Network (RNN) or a long-short term memory (LSTM).
18. An apparatus for named entity recognition using a model, deployed in a computer, comprising:
the second acquisition module is configured to acquire a first segmentation sequence to be identified, wherein the first segmentation sequence comprises a plurality of segmentations, and the plurality of segmentations comprise named entities and non-named entities;
the first input module is configured to input the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of participles in the first word segmentation sequence, the first recurrent neural network being trained by the method of claim 1;
a third determining module, configured to determine, based on the hidden vector of each participle of the first participle sequence, a distribution probability of each participle of the first participle sequence on a plurality of preset labels;
and the fourth determining module is configured to determine a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
19. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-9.
20. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-9.
CN202010631307.7A 2020-07-03 2020-07-03 Model training and named entity recognition method and device Active CN111523313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010631307.7A CN111523313B (en) 2020-07-03 2020-07-03 Model training and named entity recognition method and device

Publications (2)

Publication Number Publication Date
CN111523313A 2020-08-11
CN111523313B CN111523313B (en) 2020-09-29

Family

ID=71911753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010631307.7A Active CN111523313B (en) 2020-07-03 2020-07-03 Model training and named entity recognition method and device

Country Status (1)

Country Link
CN (1) CN111523313B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276068A (en) * 2019-05-08 2019-09-24 清华大学 Law merit analysis method and device
CN110532570A (en) * 2019-09-10 2019-12-03 杭州橙鹰数据技术有限公司 A kind of method and apparatus of method and apparatus and model training that naming Entity recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JASON P.C. CHIU et al.: "Named Entity Recognition with Bidirectional LSTM-CNNs", Computation and Language *
FENG Yuntian et al.: "Named Entity Recognition Based on Deep Belief Networks", Computer Science *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199950A (en) * 2020-10-16 2021-01-08 支付宝(杭州)信息技术有限公司 Network training method and device for event detection

Also Published As

Publication number Publication date
CN111523313B (en) 2020-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40035494; Country of ref document: HK)