CN111523313B - Model training and named entity recognition method and device
- Publication number: CN111523313B
- Application number: CN202010631307.7A
- Authority: CN (China)
- Prior art keywords: participle, vector, determining, text segment, sequence
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F40/00—Handling natural language data; G06F40/20—Natural language analysis; G06F40/279—Recognition of textual entities
  - G06F40/284—Lexical analysis, e.g. tokenisation or collocates
  - G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
  - G06F40/295—Named entity recognition
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology
  - G06N3/044—Recurrent networks, e.g. Hopfield networks
  - G06N3/045—Combinations of networks
  - G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
  - G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The embodiments of this specification provide a model training and named entity recognition method and device. During model training, a first named entity in a first sample sequence is replaced with a first preset character to obtain a second sample sequence, and a text segment containing the first preset character is determined from the second sample sequence. Hidden vectors of the participles in the second sample sequence are determined recursively with a first recurrent neural network, and a characterization vector of the text segment is determined from them. A variational auto-encoder constructs a Gaussian distribution based on the characterization vector and determines a global hidden vector for the text segment. Using the first recurrent neural network with the global hidden vector as the initial hidden vector, decoding hidden vectors of the participles in the text segment are determined recursively, and predicted values of those participles are determined. A prediction loss value is determined based on the difference between each participle in the text segment and its predicted value, together with the distribution difference, and the first recurrent neural network and the variational auto-encoder are updated in the direction of reducing the prediction loss value.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of natural language processing technologies, and in particular, to a method and an apparatus for model training and named entity recognition.
Background
In the field of natural language processing technology, classifying named entities in text sequences is an important research direction. Named entities have noun properties in part of speech and include names of people, names of organizations, names of places, and all other entity classes identified by names. Broader named entities also include categories such as numbers, dates, currencies, and addresses. Accurately identifying the category of a named entity improves the accuracy and effectiveness of natural language processing.
Typically, models used to identify named entities are trained using a training set and, after training is completed, are tested using a test set. One of the challenges in named entity recognition is the recognition of rare entities: out-of-set words and low-frequency words. An out-of-set word is a named entity that appears in the test set but does not appear in the training set. A low-frequency word is a named entity that appears in the test set but appears with low frequency in the training set. This sparsity of the training data poses a great challenge to model training.
Therefore, an improved scheme is desirable that can train a more effective and accurate model, so that the model can better identify rare entities when it encounters them.
Disclosure of Invention
One or more embodiments of the specification describe model training and named entity recognition methods and devices, to train a model with better effectiveness and higher accuracy so that the model can better recognize rare entities when it encounters them. The specific technical scheme is as follows.
In a first aspect, a model training method for identifying named entities is provided, which is executed by a computer and comprises the following steps:
obtaining a first sample sequence containing a plurality of participles, wherein the participles comprise named entities and non-named entities;
replacing a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determining a text segment containing the first preset character from the second sample sequence;
recursively determining hidden vectors of a plurality of participles in the second sample sequence by adopting a first recurrent neural network and taking a preset hidden vector as an initial hidden vector; determining a token vector of the text segment based on hidden vectors of a plurality of participles in the second sample sequence;
constructing, by a variational auto-encoder, a Gaussian distribution based on the characterization vectors, determining a global latent vector for the text segment based on the Gaussian distribution;
recursively determining decoding hidden vectors of the participles in the text segment by adopting the first recurrent neural network and taking the global hidden vector as an initial hidden vector, and determining a predicted value of the participles in the text segment based on the decoding hidden vectors;
determining a prediction loss value based on the difference of the words in the text segment and the predicted values thereof and the distribution difference determined based on the Gaussian distribution, and updating the first recurrent neural network and the variational self-encoder in the direction of reducing the prediction loss value.
In one embodiment, the step of replacing the first named entity in the first sample sequence with a first predetermined character includes:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
In one embodiment, the step of determining the text segment containing the first preset character from the second sample sequence includes:
determining a sequence starting from the first preset character and ending with a first named entity behind the first preset character in the second sample sequence as a text segment; or determining a sequence which ends with the first preset character and begins with a first named entity before the first preset character in the second sample sequence as a text fragment.
In one embodiment, the step of determining a token vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence includes:
determining an initial implicit vector of a head participle and an initial implicit vector of a tail participle of the text segment from the implicit vectors of the participles of the second sample sequence, and determining a characterization vector of the text segment based on a difference value of the initial implicit vector of the tail participle and the initial implicit vector of the head participle.
In one embodiment, the step of constructing a gaussian distribution based on the characterization vectors, and determining a global hidden vector for the text segment based on the gaussian distribution includes:
determining, by a variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining a global latent vector for the text segment based on the mean and the variance of the Gaussian distribution.
In one embodiment, the step of recursively determining decoded hidden vectors for the participles in the text segment comprises:
and determining a decoding hidden vector of the middle participle based on a decoding hidden vector of a previous participle aiming at each middle participle except the first participle and the last participle in the text segment through the first recurrent neural network, wherein the decoding hidden vector of the previous participle of the first middle participle is the global hidden vector.
In one embodiment, the first recurrent neural network comprises a bidirectional recurrent neural network; the steps of recursively determining hidden vectors of a plurality of participles in the second sample sequence, and of determining a token vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence, include:
a first recursive neural network is adopted, a preset implicit vector is used as an initial implicit vector, first implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the forward sequence of the sequence, and second implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the backward sequence of the sequence; determining a first token vector of the text segment based on a plurality of the first hidden vectors, and determining a second token vector of the text segment based on a plurality of the second hidden vectors;
the step of constructing a gaussian distribution based on the characterization vectors, and determining a global hidden vector for the text segment based on the gaussian distribution, comprises:
constructing, by a variational self-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global latent vector for the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, determining a second global latent vector for the text segment based on the second Gaussian distribution;
the step of recursively determining a decoding hidden vector of a participle in the text segment and determining a predicted value of the participle in the text segment based on the decoding hidden vector comprises:
the first recursive neural network is adopted, the global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the forward sequence of the sequence, and a second decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the backward sequence of the sequence; determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector; the middle participles are participles except the first participle and the tail participle in the text segment;
the step of determining a predicted loss value comprises:
determining a first loss value based on the difference of each participle in the text segment and a first predicted value thereof and a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference of each participle in the text segment and a second predicted value thereof and a second distribution difference determined based on the second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
In one embodiment, the first recurrent neural network comprises a recurrent neural network RNN or a long-short term memory LSTM.
In a second aspect, an embodiment provides a method for identifying a named entity using a model, which is performed by a computer and includes:
acquiring a first word segmentation sequence to be identified, wherein the first word segmentation sequence comprises a plurality of word segmentations, and the plurality of word segmentations comprise named entities and non-named entities;
inputting the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of word segmentations in the first word segmentation sequence; the first recurrent neural network is obtained by training with the method of the first aspect;
determining the distribution probability of each participle of the first participle sequence on a plurality of preset labels based on the implicit vector of each participle of the first participle sequence;
and determining a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
In a third aspect, an embodiment provides a model training apparatus for identifying a named entity, deployed in a computer, including:
a first obtaining module configured to obtain a first sample sequence including a plurality of participles, the plurality of participles including a named entity and a non-named entity;
a first replacing module, configured to replace a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determine a text segment containing the first preset character from the second sample sequence;
a first determining module, configured to recursively determine hidden vectors of a plurality of participles in the second sample sequence by using a first recurrent neural network and taking a preset hidden vector as an initial hidden vector; determining a token vector of the text segment based on hidden vectors of a plurality of participles in the second sample sequence;
a first construction module configured to construct, by a variational auto-encoder, a gaussian distribution based on the characterization vector, determine a global latent vector for the text segment based on the gaussian distribution;
a second determining module, configured to recursively determine, by using the first recurrent neural network, a decoding hidden vector of a participle in the text segment with the global hidden vector as an initial hidden vector, and determine a predicted value of the participle in the text segment based on the decoding hidden vector;
a first updating module configured to determine a prediction loss value based on a difference between a word in the text segment and a predicted value thereof and a distribution difference determined based on the gaussian distribution, and update the first recurrent neural network and the variational self-encoder in a direction of reducing the prediction loss value.
In one embodiment, the first replacing module, when replacing the first named entity in the first sample sequence with a first preset character, includes:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
In one embodiment, the first replacement module, when determining the text segment containing the first preset character from the second sample sequence, includes:
determining a sequence starting from the first preset character and ending with a first named entity behind the first preset character in the second sample sequence as a text segment; or determining a sequence which ends with the first preset character and begins with a first named entity before the first preset character in the second sample sequence as a text fragment.
In one embodiment, the determining, by the first determining module, a feature vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence includes:
determining an initial implicit vector of a head participle and an initial implicit vector of a tail participle of the text segment from the implicit vectors of the participles of the second sample sequence, and determining a characterization vector of the text segment based on a difference value of the initial implicit vector of the tail participle and the initial implicit vector of the head participle.
In an embodiment, the first building block is specifically configured to:
determining, by a variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining a global latent vector for the text segment based on the mean and the variance of the Gaussian distribution.
In one embodiment, the second determining module, when recursively determining the decoding hidden vectors of the participles in the text segment, includes:
and determining a decoding hidden vector of the middle participle based on a decoding hidden vector of a previous participle aiming at each middle participle except the first participle and the last participle in the text segment through the first recurrent neural network, wherein the decoding hidden vector of the previous participle of the first middle participle is the global hidden vector.
In one embodiment, the first recurrent neural network comprises a bidirectional recurrent neural network; the first determining module is specifically configured to:
a first recursive neural network is adopted, a preset implicit vector is used as an initial implicit vector, first implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the forward sequence of the sequence, and second implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the backward sequence of the sequence; determining a first token vector of the text segment based on a plurality of the first hidden vectors, and determining a second token vector of the text segment based on a plurality of the second hidden vectors;
the first building module is specifically configured to:
constructing, by a variational self-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global latent vector for the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, determining a second global latent vector for the text segment based on the second Gaussian distribution;
the second determining module is specifically configured to:
the first recursive neural network is adopted, the global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the forward sequence of the sequence, and a second decoding hidden vector of each intermediate participle in the text segment is determined recursively according to the backward sequence of the sequence; determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector; the middle participles are participles except the first participle and the tail participle in the text segment;
the first updating module, when determining the predicted loss value, includes:
determining a first loss value based on the difference of each participle in the text segment and a first predicted value thereof and a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference of each participle in the text segment and a second predicted value thereof and a second distribution difference determined based on the second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
In one embodiment, the first recurrent neural network comprises a recurrent neural network RNN or a long-short term memory LSTM.
In a fourth aspect, an embodiment provides an apparatus for named entity identification using a model, deployed in a computer, comprising:
the second acquisition module is configured to acquire a first segmentation sequence to be identified, wherein the first segmentation sequence comprises a plurality of segmentations, and the plurality of segmentations comprise named entities and non-named entities;
the first input module is configured to input the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of word segmentations in the first word segmentation sequence; the first recurrent neural network is obtained by training with the method of the first aspect;
a third determining module, configured to determine, based on the hidden vector of each participle of the first participle sequence, a distribution probability of each participle of the first participle sequence on a plurality of preset labels;
and the fourth determining module is configured to determine a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
In a fifth aspect, embodiments provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of the first to second aspects.
In a sixth aspect, an embodiment provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method of any one of the first aspect to the second aspect.
In the method and device provided by the embodiments of the present specification, a first named entity in a first sample sequence may be assumed to be a rare entity. The first named entity is masked by replacing it with a first preset character; the text segment is then encoded by a first recurrent neural network, a global hidden vector of the text segment is reconstructed by a variational auto-encoder, and decoding hidden vectors of the participles in the text segment are reconstructed based on the global hidden vector. That is, the text segment is decoded based on the global hidden vector, and predicted values of the participles in the text segment are determined based on the decoding hidden vectors. When the first recurrent neural network and the variational auto-encoder are well trained, even if a named entity is masked, the first recurrent neural network can represent each participle well with a hidden vector based on the context of the masked named entity. When the hidden vector determined by the model for each participle has stronger representation capability, named entity recognition based on those hidden vectors becomes more effective and more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a schematic flowchart of a model training method for identifying a named entity according to an embodiment;
FIG. 3 is an exemplary diagram of a text passage;
FIG. 4 is an exemplary diagram of a training implementation provided by an embodiment;
FIG. 5 is a flowchart illustrating a method for identifying a named entity using a model, according to an embodiment;
FIG. 6 is a schematic block diagram of a model training apparatus for identifying named entities, provided by an embodiment;
FIG. 7 is a schematic block diagram of an apparatus for named entity identification using a model, according to an embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present specification. A word segmentation sequence containing a plurality of participles is input into a first recurrent neural network, and the first recurrent neural network outputs a hidden vector for each participle. Based on each hidden vector, the distribution probability of the participle over the classifications can be determined, and based on these distribution probabilities a classification result for each participle is obtained, i.e., the classification label to which each participle corresponds. A classification may be represented by a label. SOS is the start symbol of the word segmentation sequence, and EOS is the end symbol of the word segmentation sequence.
Named entities (entities), which may also be referred to as Entity words, have noun properties in part of speech, including names of people, names of organizations, names of places, and all other Entity categories identified by names, and more generally, named entities also include categories such as numbers, dates, currencies, addresses, and the like.
For a word segmentation sequence containing a plurality of participles, each participle can be labeled in advance according to a defined label. Table 1 below gives the meanings of the labels defined in one example.
TABLE 1

| Label | Meaning | Label | Meaning | Label | Meaning |
| --- | --- | --- | --- | --- | --- |
| n | Common noun | f | Orientation noun | s | Locative word |
| nw | Work name | PER | Person name | LOC | Place name |
| ORG | Organization name | TIME | Time | O | Other |
In table 1, "O" represents other meaning, referring to other words than the entity noun, such as verb, preposition, adjective, adverb, conjunctive, and the like. Multiple tags, from n to TIME, are a refined classification of entity nouns. The above classifications are merely examples provided for ease of understanding and are not intended to limit the present application.
The word segmentation sequence may be a sequence obtained after segmenting a text sequence. For example, when segmenting an English text sequence, each word or symbol is a participle; when segmenting a Chinese text sequence, the segmentation can be performed based on a preset segmentation dictionary. For example, for the English text sequence "List flights to Indianapolis with fares on Monday morning, please", each word and the comma can be used as a participle. After segmenting the Chinese text sequence meaning "please list flights to Indianapolis on Monday morning and provide fares", a participle sequence such as "please - list - Monday - morning - to Indianapolis - flights - , - and - provide fares" can be obtained. The present specification does not limit the specific form of the segmentation sequence.
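As a concrete illustration of the segmentation step described above, the following is a minimal sketch that splits an English sentence into word and punctuation participles. The regex tokenizer is an assumption made only for illustration; the embodiments do not prescribe any particular segmenter.

```python
import re

def segment_english(text: str) -> list[str]:
    # Keep words as tokens and punctuation such as commas as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = segment_english("List flights to Indianapolis with fares on Monday morning, please")
print(tokens)
# ['List', 'flights', 'to', 'Indianapolis', 'with', 'fares', 'on',
#  'Monday', 'morning', ',', 'please']
```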
In order to more effectively and accurately determine the classification corresponding to each participle in the participle sequence, the first recurrent neural network can be trained by adopting a training set. The training set may include a large number of sample sequences and labels corresponding to each participle in the sample sequences. When the training of the first recurrent neural network is complete, it may be tested using a test set to determine model performance. The test set also contains a large number of word sequences and labels for each word. Due to the diversity of natural languages, the test set may contain rare entities, such as out-of-set words and low frequency words. The out-of-set word refers to a named entity that appears in the test set but does not appear in the training set. The low frequency words refer to named entities that appear in the test set, but appear less frequently in the training set. Models trained using the above training set do not classify these rare entities well and correctly and efficiently.
In order to train a model with better effectiveness and higher accuracy, so that the model can better classify and recognize rare entities when it faces them, the embodiments of this specification provide a model training method. In the method, for each sample sequence in a training set, for example a first sample sequence, a certain named entity in the first sample sequence, for example a first named entity, is replaced with a certain preset character, for example a first preset character, to obtain a second sample sequence. A text segment containing the first preset character is determined from the second sample sequence, and a characterization vector of the text segment is determined with a first recurrent neural network. A variational auto-encoder then constructs a Gaussian distribution based on the characterization vector and determines a global hidden vector of the text segment based on the Gaussian distribution. Taking the global hidden vector as the initial hidden vector, decoding hidden vectors of the participles in the text segment are determined recursively, and predicted values of those participles are determined based on the decoding hidden vectors. A prediction loss value is determined based on the difference between each participle in the text segment and its predicted value, together with the distribution difference determined based on the Gaussian distribution, and the first recurrent neural network and the variational auto-encoder are updated in the direction of reducing the prediction loss value. When the first recurrent neural network and the variational auto-encoder are well trained, even if a named entity is masked with the preset character, the first recurrent neural network can represent each participle well with a hidden vector based on the context of the masked named entity. When the hidden vector determined by the model for each participle has stronger representation capability, named entity recognition based on those hidden vectors becomes more effective and more accurate.
The following describes the examples of the present specification in detail.
Fig. 2 is a flowchart illustrating a model training method for recognizing a named entity according to an embodiment. The method may be performed by a computer. The computer may be implemented by any device, platform, or cluster of devices having computing, processing capabilities. The method includes the following steps S210 to S260.
Step S210: obtain a first sample sequence containing a plurality of participles, the participles including named entities and non-named entities. The first sample sequence may be any sample sequence obtained from a training set; the training set includes a label for each participle of the first sample sequence, and the label may indicate, for example, whether the participle belongs to a named entity and to which kind of named entity it belongs. The sample sequence has the same structure as the word segmentation sequence described above, i.e., the first sample sequence contains a plurality of participles and is obtained by segmenting a text sequence. For example, the first sample sequence can be the English text sequence "List flights to Indianapolis with fares on Monday morning, please", in which "List, flights, to, with, fares, on, please" belong to non-named entities and "Indianapolis, Monday, morning" belong to named entities. The first sample sequence may be written as $X = (x_1, x_2, \ldots, x_N)$, where $x_N$ is the Nth participle in the first sample sequence $X$ and $N$ is an integer.
Step S220, replacing the first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determining a text fragment containing the first preset character from the second sample sequence.
In this step, a first number of named entities may be randomly determined from at least one named entity in the first sample sequence to serve as first named entities, and each first named entity is replaced with the first preset character. The first number may be a preset number smaller than the total number N of participles of the first sample sequence, for example 1 or 2. The first preset character may be "[UNK]", for example.
In one embodiment, a corresponding random number $p_i$ taking a value between 0 and 1 may be generated for each participle $x_i$ in the first sample sequence. When the random number $p_i$ is greater than a predetermined threshold $p$ and the participle corresponding to $p_i$ is a named entity, that named entity serves as a first named entity. Expressed as a formula:

$$\tilde{x}_i = \begin{cases} [\mathrm{UNK}], & \text{if } p_i > p \text{ and } y_i \neq \mathrm{O} \\ x_i, & \text{otherwise} \end{cases} \tag{1}$$

where $y_i$ represents the label of participle $x_i$, and $y_i \neq \mathrm{O}$ means the label is not the "other" class, i.e., $x_i$ belongs to a named entity class. After each participle in the first sample sequence is processed according to formula (1), a second sample sequence $\tilde{X} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)$ is obtained.
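A minimal sketch of the masking rule in formula (1) follows, assuming the label set of Table 1 where "O" marks non-entities; the token and label values, the threshold, and the helper name are illustrative assumptions rather than part of the embodiments.

```python
import random

UNK = "[UNK]"

def mask_entities(tokens, labels, p=0.5, seed=0):
    # Replace a token with [UNK] when its random draw exceeds the
    # threshold p and its label marks a named-entity class (label != "O").
    rng = random.Random(seed)
    return [UNK if lab != "O" and rng.random() > p else tok
            for tok, lab in zip(tokens, labels)]

tokens = ["List", "flights", "to", "Indianapolis", "on", "Monday"]
labels = ["O", "O", "O", "LOC", "O", "TIME"]
print(mask_entities(tokens, labels))
```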
In step S220, when the text segment containing the first preset character is determined from the second sample sequence, a sequence starting from the first preset character and ending with a first named entity after the first preset character in the second sample sequence may be determined as the text segment. Or determining a sequence which ends with a first preset character and begins with a first named entity before the first preset character in the second sample sequence as a text segment.
For example, FIG. 3 is an exemplary diagram of a text fragment. In the first sample sequence, "Indianapolis" is replaced with "[ UNK ]", the sequence from sequence number j to sequence number k may be determined as a text fragment, and k is the first named entity after j. The second row in fig. 3 shows the label of each participle in the first sample sequence, and the label meaning can be seen in table 1.
Step S230, using the first recurrent neural network, using the preset hidden vector as an initial hidden vector, recursively determining hidden vectors of a plurality of participles in the second sample sequence, and determining a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence. In this step, a preset implicit vector is input into the first recurrent neural network as an initial implicit vector, and the first recurrent neural network outputs a determined implicit vector of a participle in the second sample sequence.
The first Recurrent neural network may include a Recurrent Neural Network (RNN) or a Long Short-Term Memory (LSTM).
When determining the hidden vectors of a plurality of participles in the second sample sequence, determining the hidden vector of the first participle based on the initial hidden vector aiming at the first participle in the second sample sequence, and determining the hidden vector of the participle based on the hidden vector of the previous participle aiming at each participle after the first participle in the second sample sequence. This is to recursively determine the hidden vector of each participle, and the hidden vector of the next participle contains the information of each preceding participle. The hidden vector is a vector for representing the feature of the word segmentation. When determining the implicit vector of the participle based on the initial implicit vector or the implicit vector of the last participle, an f-function in the first recurrent neural network can be adopted, wherein the f-function contains parameters to be updated. For example, the hidden vector for each participle may be determined using the following equation (2):
$$h_i = f(h_{i-1}, x_i) \tag{2}$$

where $h_i$ is the hidden vector of the $i$-th participle of the second sample sequence, $h_{i-1}$ is the hidden vector of the $(i-1)$-th participle, and $h_0$ is the initial hidden vector. In this step, the initial hidden vector may be a preset hidden vector, which may be a randomly generated hidden vector or a fixed one.
The process of recursively determining the hidden vectors of the multiple participles in the second sample sequence by using the first recurrent neural network and using the preset hidden vector as the initial hidden vector can be understood as a process of encoding the participles by using the first recurrent neural network.
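The recursion of formula (2) can be sketched with PyTorch as follows; the embedding layer, the choice of nn.RNNCell for the function $f$, and all dimensions are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 100, 16, 32
embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.RNNCell(emb_dim, hid_dim)        # stands in for the function f

token_ids = torch.tensor([3, 7, 42, 9])    # a toy second sample sequence
h = torch.zeros(1, hid_dim)                # preset initial hidden vector h_0
hiddens = []
for i in token_ids:
    h = cell(embed(i).unsqueeze(0), h)     # h_i = f(h_{i-1}, x_i)
    hiddens.append(h)
print(len(hiddens), hiddens[0].shape)      # 4 torch.Size([1, 32])
```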
In step S230, when determining the token vector of the text segment based on the hidden vectors of the multiple participles in the second sample sequence, an initial hidden vector of a head participle and an initial hidden vector of a tail participle of the text segment may be determined from the hidden vectors of the multiple participles in the second sample sequence, and the token vector of the text segment may be determined based on a difference between the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
The text segment is a segment intercepted from the second sample sequence, so that the implicit vector of the first word of the text segment can be determined from the second sample sequence and is used as the initial implicit vector of the first word; and determining the implicit vector of the tail word of the text segment from the second sample sequence as the initial implicit vector of the tail word. For example, referring to the example of fig. 3, the hidden vector of the jth participle and the hidden vector of the kth participle in the second sample sequence are respectively used as the initial hidden vector of the head participle and the initial hidden vector of the tail participle of the text segment.
When determining the representation vector of the text segment, the difference value between the initial hidden vector of the tail word and the initial hidden vector of the head word can be directly used as the representation vector, and the result of the difference value after predetermined transformation can also be used as the representation vector. For example, the predetermined transformation may include multiplying the difference by a certain coefficient.
In one example, the token vector of a text segment can be determined using the following equation (3):
$$c_{j,k} = h_k - h_j \tag{3}$$

where $c_{j,k}$ represents the characterization vector of the text segment from $j$ to $k$, $h_k$ is the initial hidden vector of the tail participle, and $h_j$ is the initial hidden vector of the head participle; $j$ and $k$ are both integers.
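Continuing the encoder sketch above, the span characterization of formula (3) is a single vector difference; the boundary indices below are illustrative.

```python
import torch

h = torch.randn(10, 32)    # hidden vectors h_1..h_10 from the encoder sketch
j, k = 3, 7                # text-segment boundaries (illustrative, 0-based)
c_jk = h[k] - h[j]         # characterization vector per formula (3)
print(c_jk.shape)          # torch.Size([32])
```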
Step S240: through a variational auto-encoder, construct a Gaussian distribution based on the characterization vector, and determine a global hidden vector for the text segment based on the Gaussian distribution. A Variational Auto-Encoder (VAE) may be understood as a component that maintains a continuous Gaussian distribution: when a set of mean and variance is input into the variational auto-encoder, it determines the Gaussian distribution corresponding to that mean and variance, and obtains a corresponding vector by sampling from the distribution.
In the embodiment, by the variational self-encoder, based on the characterization vector, the mean and variance of the gaussian distribution are determined, and based on the mean and variance of the gaussian distribution, the global latent vector for the text segment is determined. Specifically, the characterization vector is input into the VAE, the VAE may output a corresponding mean and variance according to the parameters thereof, when the mean and variance are determined, that is, a gaussian distribution corresponding to the mean and variance is determined, the VAE may sample from the gaussian distribution, and a vector obtained by sampling is used as an implicit vector of the text segment.
For example, the VAE may determine the mean and variance by the following equation (4):
$$\mu = W_{\mu}\, c_{j,k} + b_{\mu}, \qquad \log \sigma = W_{\sigma}\, c_{j,k} + b_{\sigma} \tag{4}$$

where $c_{j,k}$ is the characterization vector of the text segment from $j$ to $k$, $\mu$ is the mean, $\log\sigma$ is the logarithmic variance, and $W_{\mu}$, $b_{\mu}$, $W_{\sigma}$ and $b_{\sigma}$ are parameters to be updated in the VAE; these parameters may take initial values in the initial iteration and be continuously updated in subsequent iterations. The VAE may determine the global hidden vector $z$ for the text segment based on the mean $\mu$ and variance $\sigma$ of the text segment.
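A hedged sketch of formula (4) and the sampling step: two affine maps produce the mean and log-variance, and the global hidden vector $z$ is drawn with the standard reparameterization trick. The layer shapes and the reparameterization detail are assumptions consistent with common VAE practice, not text quoted from the embodiments.

```python
import torch
import torch.nn as nn

hid_dim, lat_dim = 32, 16
W_mu = nn.Linear(hid_dim, lat_dim)        # mu        = W_mu  c + b_mu
W_logvar = nn.Linear(hid_dim, lat_dim)    # log sigma = W_sig c + b_sig

c_jk = torch.randn(hid_dim)               # span characterization vector
mu, logvar = W_mu(c_jk), W_logvar(c_jk)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # sampled global z
print(z.shape)                            # torch.Size([16])
```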
Step S250, a first recurrent neural network is adopted, the global hidden vector is used as an initial hidden vector, decoding hidden vectors of the participles in the text segment are determined recursively, and the predicted values of the participles in the text segment are determined based on the decoding hidden vectors. In this step, the global hidden vector is input to the first recurrent neural network as an initial hidden vector, and the first recurrent neural network outputs a decoded hidden vector of the participle in the determined text segment. In this step, determining the decoding hidden vector of the participle in the text segment based on the global hidden vector may be understood as reconstructing the participle in the text segment based on the global hidden vector, which corresponds to a process of decoding the participle, and specifically may decode and reconstruct the participle in the middle thereof.
In one embodiment, for each intermediate participle in the text segment except for the first participle and the last participle, a decoding hidden vector of the intermediate participle may be determined based on a decoding hidden vector of a previous participle through the first recurrent neural network, where the decoding hidden vector of the previous participle of the first intermediate participle is a global hidden vector. The number of intermediate participles may be one or more. This process can be expressed using the following equation (5):
$$s_m = f(s_{m-1}) \tag{5}$$

where $s_m$ is the decoding hidden vector of the $m$-th intermediate participle in the text segment, $s_{m-1}$ is the decoding hidden vector of the $(m-1)$-th intermediate participle, and $s_j$ takes the value of the global hidden vector $z$. Here $m$ takes values in $[j+1, j+2, \ldots, k-1]$, the text segment being formed by the participles from $j$ to $k$.
Determining the predicted value of a participle in the text segment based on the decoded hidden vector can be understood as determining the probability that the participle is each word in a preset dictionary, that is, predicting which word the participle is. Specifically, the predicted value of the participle in the text segment can be calculated with a softmax function, that is, with formula (6):
$$\hat{x}_m = \operatorname{softmax}(W s_m) \tag{6}$$

where $\hat{x}_m$ is the predicted value of the $m$-th intermediate participle in the text segment, which may be a probability distribution; $s_m$ is the decoding hidden vector of the $m$-th intermediate participle; and $W$ is a parameter to be updated in the iterative process.
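The following sketch strings formulas (5) and (6) together: the decoder state is seeded from the global hidden vector and stepped with the shared recurrent cell, and a softmax over the vocabulary predicts each intermediate participle. The projection from $z$ to the decoder state size, the zero input, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

vocab_size, hid_dim, lat_dim, n_mid = 100, 32, 16, 3
to_hidden = nn.Linear(lat_dim, hid_dim)   # map z into the decoder state space
cell = nn.RNNCell(hid_dim, hid_dim)       # stands in for the shared network f
W_out = nn.Linear(hid_dim, vocab_size)    # W in formula (6)

z = torch.randn(1, lat_dim)               # global hidden vector of the span
s = to_hidden(z)                          # s_j := (projected) z, per formula (5)
zero_in = torch.zeros(1, hid_dim)         # no token input during reconstruction
for m in range(n_mid):                    # intermediate participles j+1 .. k-1
    s = cell(zero_in, s)                  # s_m = f(s_{m-1})
    probs = torch.softmax(W_out(s), dim=-1)   # predicted distribution, eq. (6)
    print(m, probs.argmax(dim=-1).item())
```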
Step S260, determining a prediction loss value based on the difference between the word in the text segment and the predicted value thereof and the distribution difference determined based on the gaussian distribution, and updating the first recurrent neural network and the variational self-encoder in the direction of reducing the prediction loss value, that is, updating the parameters therein. In addition to updating the parameters of the first recurrent neural network and the variational self-encoder, the parameters to be updated mentioned in the above steps may also be updated.
The participles in the text segment serve as the ground-truth values; these values can be represented by probability distributions, as can the predicted values, and the difference between them can be calculated using cross entropy. The participles in the text segment here may be the intermediate participles; when there are several intermediate participles, the cross entropies of the individual participles may be summed and the prediction loss value calculated based on the sum.
Since the parameters of the VAE may also be learned during the model training process, a corresponding KL (Kullback-Leibler) divergence may be determined based on the gaussian distribution, as a distribution difference, which represents a difference between the initial distribution of the VAE and its predicted gaussian distribution. In determining the prediction loss value, a sum value determination may be made based on a difference between a word in a text segment and its predicted value and a difference between distributions determined based on a gaussian distribution. For example, the predicted loss value can be expressed by the following formula (7):
$$\mathcal{L}_{j,k} = \sum_{m=j+1}^{k-1} \log p(x_m \mid s_m) \;-\; D_{\mathrm{KL}}\big(q(z \mid c_{j,k}) \,\|\, p(z)\big) \tag{7}$$

where the first term is the log-likelihood whose negative is the cross entropy calculated based on the difference between each participle in the text segment and its predicted value, $D_{\mathrm{KL}}$ is the KL divergence determined based on the Gaussian distribution, and $\mathcal{L}_{j,k}$ is the negative of the determined prediction loss value.
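A sketch of computing the prediction loss (the negative of formula (7)): the cross entropy of the intermediate participles plus the KL divergence between $N(\mu, \sigma^2)$ and the standard normal prior. The closed-form diagonal-Gaussian KL used here is standard VAE practice and is an assumption, as are all the toy values.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 100)          # decoder outputs for 3 intermediate tokens
targets = torch.tensor([5, 17, 42])   # the tokens being reconstructed
mu, logvar = torch.randn(16), torch.randn(16)

ce = F.cross_entropy(logits, targets, reduction="sum")           # -sum log p
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())     # D_KL term
loss = ce + kl                        # prediction loss value to be reduced
print(loss.item())
```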
The steps S210 to S260 are an iterative process. This iterative process may be repeated multiple times until the iterative process converges. The convergence condition may include that the number of iterations is greater than a preset number threshold, or that the predicted loss value is less than a certain threshold, etc. In the iterative process shown in steps S210 to S260, the iterative process is described based on one first sample sequence, and in another embodiment, the plurality of first sample sequences may be processed according to the process included in steps S210 to S260, the total prediction loss value corresponding to the plurality of first sample sequences may be determined, and the first recurrent neural network and the variational self-encoder may be updated in a direction of reducing the total prediction loss value. Therefore, the times of updating the model parameters can be reduced, and the training efficiency is improved.
In another embodiment of the present specification, the first recurrent neural network may comprise a bidirectional recurrent neural network, such as a bidirectional RNN or a bidirectional LSTM. In this embodiment, the execution processes represented by formulas (2) to (7) above are executed once in the forward order of the sequence and once in the backward order of the sequence, the resulting prediction loss values are summed, and the first recurrent neural network and the variational auto-encoder are updated in the direction of decreasing the summed prediction loss value. The bidirectional network extracts richer sample features, so the training on each sample sequence is more thorough and the model is trained more effectively and accurately. Specific embodiments are as follows.
In step S230, recursively determining hidden vectors of the plurality of participles in the second sample sequence, and determining a token vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence, the steps include:
a first recursive neural network is adopted, a preset implicit vector is used as an initial implicit vector, first implicit vectors of a plurality of participles in a second sample sequence are determined recursively according to the forward sequence of the sequence, and second implicit vectors of a plurality of participles in the second sample sequence are determined recursively according to the backward sequence of the sequence; a first token vector of the text segment is determined based on the plurality of first hidden vectors, and a second token vector of the text segment is determined based on the plurality of second hidden vectors.
In step S240, a gaussian distribution is constructed based on the characterization vectors, and a global hidden vector for the text segment is determined based on the gaussian distribution, including:
the method comprises the steps of constructing a first Gaussian distribution based on a first characterization vector through a variation self-encoder, determining a first global hidden vector aiming at a text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on a second characterization vector, and determining a second global hidden vector aiming at the text segment based on the second Gaussian distribution.
In step S250, the step of recursively determining a decoding hidden vector of the participle in the text segment, and determining a predicted value of the participle in the text segment based on the decoding hidden vector comprises:
a first recursive neural network is adopted, a global hidden vector is used as an initial hidden vector, a first decoding hidden vector of each intermediate participle in a text segment is recursively determined according to the forward sequence of a sequence, and a second decoding hidden vector of each intermediate participle in the text segment is recursively determined according to the backward sequence of the sequence; and determining a first predicted value of each intermediate participle in the text segment based on the first decoding hidden vector, and determining a second predicted value of each intermediate participle in the text segment based on the second decoding hidden vector. The middle participles are participles except the first participle and the tail participle in the text segment.
In step S260, the step of determining the predicted loss value may include:
determining a first loss value based on the difference of each participle in the text segment and the first predicted value thereof and the first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference between each participle in the text segment and a second predicted value of the participle and a second distribution difference determined based on a second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
For the second sample sequence shown in fig. 3, the forward order of the sequence is the process from "List" to "please", and the backward order of the sequence is the process from "please" to "List". For the text fragment in FIG. 3, the forward order of the sequence is the process from "Indianapolis" to "Monday" and the backward order of the sequence is the process from "Monday" to "Indianapolis". The implementation of the forward training process and the backward training process can be performed according to the execution processes represented by the above formulas (2) - (7), and will not be described herein again.
For the first recurrent neural network, parameters in the forward process and the backward process are shared, and the forward process and the backward process are different in the sequence order.
The training targets for the two-way training process may be represented as
$$\max_{\theta, \phi}\; \mathcal{L}(X) = \overrightarrow{\mathcal{L}}(X) + \overleftarrow{\mathcal{L}}(X)$$

where $\overrightarrow{\mathcal{L}}(X)$ is the first loss value determined by the forward training process, $\overleftarrow{\mathcal{L}}(X)$ is the second loss value determined by the backward training process, $\mathcal{L}(X)$ represents the prediction loss value (as in formula (7), the negative of the value to be reduced), $\theta$ represents the parameters to be updated of the first recurrent neural network, $\phi$ represents the other parameters to be updated, and $X$ is the first sample sequence.
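The two-direction objective can be sketched as below; compute_direction_loss is a hypothetical placeholder for one full encode / VAE / decode / loss pass (the sketches above), with the recurrent parameters shared between the two calls.

```python
import torch

def compute_direction_loss(token_ids: torch.Tensor) -> torch.Tensor:
    # Hypothetical placeholder: one single-direction pass of the pipeline
    # sketched above, returning that direction's prediction loss value.
    return torch.rand(())

token_ids = torch.tensor([3, 7, 42, 9, 11])
loss_fwd = compute_direction_loss(token_ids)           # forward sequence order
loss_bwd = compute_direction_loss(token_ids.flip(0))   # backward sequence order
total_loss = loss_fwd + loss_bwd                       # summed value to reduce
print(total_loss.item())
```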
Fig. 4 is an exemplary diagram of a training execution process according to an embodiment. In this embodiment, the text segment is "Indianapolis with fares on Monday", where "Indianapolis" is masked as the first preset symbol "[UNK]". Local context reconstruction is performed on the text segment: a mean value $\mu$ and a logarithmic variance $\log\sigma$ are determined based on the characterization vector of the text segment, and the global hidden vector of the text segment is determined based on the Gaussian distribution corresponding to $\mu$ and $\log\sigma$. Taking the global hidden vector as the initial hidden vector, the decoding hidden vectors $s_{j+1}$ to $s_{k-1}$ of the intermediate participles "with fares on" are determined separately. Based on the decoding hidden vectors, the processes of participle prediction, loss value determination and so on can be performed.
When the training of the first recurrent neural network is completed based on the above embodiments, named entities in a word segmentation sequence are identified based on the first recurrent neural network. Even if the segmentation sequence contains out-of-set words that do not appear in the training set, or low-frequency words, the first recurrent neural network can determine a strongly representative hidden vector for each participle based on the segmentation sequence; when the hidden vector represents the sequence features well, the label of the participle can be determined more accurately.
Fig. 5 is a flowchart illustrating a method for identifying a named entity using a model according to an embodiment. The method is performed by a computer, which may be implemented by any device, platform, or cluster of devices having computing and processing capabilities. The method includes the following steps S510 to S540.
Step S510, a first segmentation sequence to be identified is obtained; the first segmentation sequence includes a plurality of segmentations, which include named entities and non-named entities. The first segmentation sequence may be any segmentation sequence in the test set, or a segmentation sequence obtained in another manner.
Step S520, inputting the first word segmentation sequence into the trained first recurrent neural network, and obtaining hidden vectors of a plurality of words in the first word segmentation sequence. The first recurrent neural network is trained using the method shown in fig. 2.
The first recurrent neural network may also employ a bidirectional recurrent neural network, such as a bidirectional RNN or a bidirectional LSTM. In this embodiment, the first word segmentation sequence may be input into the trained first recurrent neural network to obtain, for the plurality of word segmentations, forward hidden vectors determined according to the forward order of the sequence and backward hidden vectors determined according to the backward order of the sequence; for each word segmentation, its forward hidden vector and backward hidden vector are concatenated to obtain the hidden vector of that word segmentation.
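A sketch of such a bidirectional encoder, in which the concatenation of forward and backward hidden vectors is produced directly by the framework; the use of `nn.LSTM` and the dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, seq_len = 64, 128, 7
encoder = nn.LSTM(emb_dim, hid_dim, bidirectional=True)  # candidate bidirectional network

embedded = torch.randn(seq_len, 1, emb_dim)  # embedded first word segmentation sequence
outputs, _ = encoder(embedded)               # shape: (seq_len, 1, 2 * hid_dim)
# outputs[t] already concatenates the forward hidden vector (first hid_dim
# dimensions) with the backward hidden vector (last hid_dim dimensions)
hidden_vectors = outputs.squeeze(1)          # one hidden vector per word segmentation
```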
Step S530, determining a distribution probability of each participle of the first participle sequence on a plurality of preset labels based on the hidden vector of each participle of the first participle sequence. In this step, a Conditional Random Field (CRF) may be used to determine the distribution probability of each participle of the first participle sequence on a plurality of preset tags. The parameters in the CRF may be trained in advance according to a training set. Specifically, the hidden vector of each participle of the first participle sequence may be input into the CRF to obtain the distribution probability of each participle of the first participle sequence on the plurality of preset tags. The plurality of preset tags may be, for example, a plurality of tags as shown in table 1.
During CRF training, the parameters in the first recurrent neural network may be kept unchanged while loss values are computed from the labels in the training set, and the CRF parameters are adjusted in a direction that reduces the loss values.
Step S540, determining a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence. Specifically, the preset label corresponding to the maximum probability value in a participle's distribution probability may be taken as that participle's preset label, i.e., its classification result.
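Steps S530 and S540 can be sketched as emission scores per participle (from the hidden vectors) plus CRF transition scores, with Viterbi decoding picking the best label sequence. This is a generic CRF decode under assumed shapes, not the patent's exact formulation; a full CRF would additionally normalize the scores to obtain the distribution probabilities:

```python
import torch

def viterbi_decode(emissions, transitions):
    """Pick the tag sequence maximizing the sum of emission scores (per
    participle) and transition scores (trained CRF parameters)."""
    seq_len, num_tags = emissions.shape
    score = emissions[0]                   # best score of ending at each tag so far
    backpointers = []
    for t in range(1, seq_len):
        # total[i, j]: score of moving from tag i to tag j at step t
        total = score.unsqueeze(1) + transitions
        backpointers.append(total.argmax(dim=0))
        score = total.max(dim=0).values + emissions[t]
    path = [int(score.argmax())]
    for bp in reversed(backpointers):      # follow backpointers to recover the path
        path.append(int(bp[path[-1]]))
    return list(reversed(path))

num_tags = 5                                          # e.g. O, B-LOC, I-LOC, B-DATE, I-DATE
emission_layer = torch.nn.Linear(2 * 128, num_tags)   # maps hidden vectors to tag scores
transitions = torch.randn(num_tags, num_tags)         # stand-in trained transition matrix
emissions = emission_layer(torch.randn(7, 2 * 128))   # one row per participle
tags = viterbi_decode(emissions, transitions)         # preset label index per participle
```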
When the first recurrent neural network is trained using the embodiment shown in fig. 2, even if a named entity not seen during training appears in the segmentation sequence, the first recurrent neural network can still represent each segmentation well with a hidden vector based on the named entity's context. When the hidden vector determined by the model for each participle has stronger representation capability, named entity recognition based on those hidden vectors is more effective and more accurate.
The foregoing describes certain embodiments of the present specification, and other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown or in sequential order to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
FIG. 6 is a schematic block diagram of a model training apparatus for identifying a named entity according to an embodiment. The apparatus 600 is deployed in a computer. This embodiment corresponds to the embodiment of the method shown in fig. 2. The apparatus 600 comprises:
a first obtaining module 610 configured to obtain a first sample sequence including a plurality of participles, the plurality of participles including a named entity and a non-named entity;
a first replacing module 620, configured to replace a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determine a text segment containing the first preset character from the second sample sequence;
a first determining module 630, configured to recursively determine hidden vectors of a plurality of participles in the second sample sequence by using a first recurrent neural network with a preset hidden vector as the initial hidden vector; and to determine a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence;
a first constructing module 640 configured to construct, by a variational auto-encoder, a gaussian distribution based on the characterization vector, determine a global latent vector for the text segment based on the gaussian distribution;
a second determining module 650, configured to recursively determine, by using the first recurrent neural network, a decoding hidden vector of a participle in the text segment with the global hidden vector as an initial hidden vector, and determine a predicted value of the participle in the text segment based on the decoding hidden vector;
a first updating module 660 configured to determine a prediction loss value based on a difference between a word in the text segment and a predicted value thereof and a distribution difference determined based on the gaussian distribution, and update the first recurrent neural network and the variational self-encoder in a direction of reducing the prediction loss value.
In one embodiment, the first replacing module 620, when replacing the first named entity in the first sample sequence with the first preset character, includes:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
In one embodiment, the first replacing module 620, when determining the text segment containing the first preset character from the second sample sequence, includes:
determining a sequence starting from the first preset character and ending with a first named entity behind the first preset character in the second sample sequence as a text segment; or determining a sequence which ends with the first preset character and begins with a first named entity before the first preset character in the second sample sequence as a text fragment.
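A small illustrative helper for this segment rule; the fallback when no named entity exists on either side is an assumption, as the patent does not specify that case:

```python
def extract_segment(tokens, entity_flags, unk_pos):
    """Segment from the first preset character "[UNK]" to the first named
    entity after it; otherwise from the first named entity before it to "[UNK]"."""
    for end in range(unk_pos + 1, len(tokens)):
        if entity_flags[end]:
            return tokens[unk_pos:end + 1]
    for start in range(unk_pos - 1, -1, -1):
        if entity_flags[start]:
            return tokens[start:unk_pos + 1]
    return tokens  # assumption: no other named entity, fall back to the whole sequence

tokens = ["List", "flights", "from", "[UNK]", "with", "fares", "on", "Monday"]
entity_flags = [False, False, False, False, False, False, False, True]  # "Monday" is an entity
print(extract_segment(tokens, entity_flags, 3))
# ['[UNK]', 'with', 'fares', 'on', 'Monday']
```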
In one embodiment, the determining, by the first determining module 630, a token vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence includes:
determining, from the hidden vectors of the participles of the second sample sequence, the initial hidden vector of the head participle and the initial hidden vector of the tail participle of the text segment, and determining the characterization vector of the text segment based on the difference between the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
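A minimal sketch of this difference-based characterization vector; since a recurrent encoder accumulates context as it advances, the tail-minus-head difference roughly isolates the information contributed by the segment. Indices and dimensions here are illustrative:

```python
import torch

hidden = torch.randn(8, 128)  # hidden vectors of all participles in the second sample sequence
head, tail = 3, 7             # positions of the segment's head and tail participles
segment_vector = hidden[tail] - hidden[head]  # characterization vector of the text segment
```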
In an embodiment, the first building module 640 is specifically configured to:
determining, by a variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining a global latent vector for the text segment based on the mean and the variance of the Gaussian distribution.
In one embodiment, the second determining module 650, when recursively determining the decoding hidden vectors of the participles in the text segment, includes:
determining, through the first recurrent neural network, for each middle participle in the text segment other than the head participle and the tail participle, a decoding hidden vector of the middle participle based on the decoding hidden vector of its previous participle, wherein the decoding hidden vector of the participle previous to the first middle participle is the global hidden vector.
In one embodiment, the first recurrent neural network comprises a bidirectional recurrent neural network; the first determining module 630 is specifically configured to:
adopting the first recurrent neural network, with a preset hidden vector as the initial hidden vector, to recursively determine first hidden vectors of the plurality of participles in the second sample sequence according to the forward order of the sequence, and second hidden vectors of the plurality of participles in the second sample sequence according to the backward order of the sequence; and determining a first characterization vector of the text segment based on the plurality of first hidden vectors, and a second characterization vector of the text segment based on the plurality of second hidden vectors;
the first building block 640 is specifically configured to:
constructing, by a variational self-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global latent vector for the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, determining a second global latent vector for the text segment based on the second Gaussian distribution;
the second determining module 650 is specifically configured to:
adopting the first recurrent neural network, with the corresponding global hidden vector as the initial hidden vector, to recursively determine a first decoding hidden vector of each intermediate participle in the text segment according to the forward order of the sequence, and a second decoding hidden vector of each intermediate participle according to the backward order of the sequence; determining a first predicted value of each intermediate participle based on the first decoding hidden vector, and a second predicted value based on the second decoding hidden vector; the intermediate participles being the participles in the text segment other than the head participle and the tail participle;
the first updating module 660, when determining the predicted loss value, includes:
determining a first loss value based on the difference of each participle in the text segment and a first predicted value thereof and a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference of each participle in the text segment and a second predicted value thereof and a second distribution difference determined based on the second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
In one embodiment, the first recurrent neural network comprises a recurrent neural network RNN or a long-short term memory LSTM.
FIG. 7 is a schematic block diagram of an apparatus for named entity identification using a model, according to an embodiment. The apparatus 700 is deployed in a computer. This embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 5. The apparatus 700 comprises:
a second obtaining module 710 configured to obtain a first segmentation sequence to be identified, where the first segmentation sequence includes a plurality of segmentations, and the plurality of segmentations include a named entity and a non-named entity;
a first input module 720, configured to input the first word segmentation sequence into a trained first recurrent neural network, so as to obtain hidden vectors of a plurality of word segmentations in the first word segmentation sequence; the first recurrent neural network is obtained by training by adopting the method shown in FIG. 2;
a third determining module 730, configured to determine, based on the hidden vector of each participle of the first participle sequence, a distribution probability of each participle of the first participle sequence on a plurality of preset labels;
the fourth determining module 740 is configured to determine a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
The above device embodiments correspond to the method embodiments and achieve the same technical effects; for specific details, reference may be made to the descriptions of the corresponding method embodiments, which are not repeated here.
Embodiments of the present specification also provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of fig. 1 to 5.
The present specification also provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any one of fig. 1 to 5.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the storage medium and the computing device embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to some descriptions of the method embodiments for relevant points.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments further describe the objects, technical solutions and advantages of the embodiments of the present invention in detail. It should be understood that the above description is only exemplary of the embodiments of the present invention, and is not intended to limit the scope of the present invention, and any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
Claims (20)
1. A model training method for identifying named entities, performed by a computer, comprising:
obtaining a first sample sequence containing a plurality of participles, wherein the participles comprise named entities and non-named entities;
replacing a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determining a text segment containing the first preset character from the second sample sequence; the first named entity is determined from the named entities of the first sample sequence;
a first recurrent neural network is adopted, with a preset hidden vector as the initial hidden vector, to recursively determine hidden vectors of the plurality of participles in the second sample sequence, the hidden vector of a later participle in the second sample sequence containing information of each preceding participle; determining a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence;
constructing, by a variational auto-encoder, a Gaussian distribution based on the characterization vectors, determining a global latent vector for the text segment based on the Gaussian distribution;
recursively determining hidden vectors of participles in the text segment as decoding hidden vectors by adopting the first recurrent neural network and taking the global hidden vector as an initial hidden vector, and determining a predicted value of the participles in the text segment based on the decoding hidden vectors;
determining a prediction loss value based on the difference of the words in the text segment and the predicted values thereof and the distribution difference determined based on the Gaussian distribution, and updating the first recurrent neural network and the variational self-encoder in the direction of reducing the prediction loss value.
2. The method of claim 1, the step of replacing the first named entity in the first sequence of samples with a first preset character, comprising:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
3. The method of claim 1, the step of determining a text segment containing the first preset character from the second sample sequence comprising:
determining a sequence starting from the first preset character and ending with a first named entity behind the first preset character in the second sample sequence as a text segment; or determining a sequence which ends with the first preset character and begins with a first named entity before the first preset character in the second sample sequence as a text fragment.
4. The method of claim 3, the step of determining a characterization vector for the text segment based on hidden vectors for a plurality of participles in the second sample sequence, comprising:
determining a hidden vector of a head participle and a hidden vector of a tail participle of the text segment from the hidden vectors of the plurality of participles of the second sample sequence, respectively serving as an initial hidden vector of the head participle and an initial hidden vector of the tail participle, and determining a characterization vector of the text segment based on a difference value of the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
5. The method of claim 1, the step of constructing a gaussian distribution based on the characterization vectors, determining a global hidden vector for the text segment based on the gaussian distribution, comprising:
determining, by a variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining a global latent vector for the text segment based on the mean and the variance of the Gaussian distribution.
6. The method of claim 1, the step of recursively determining decoded latent vectors for the participles in the text segment, comprising:
determining, through the first recurrent neural network, for each middle participle in the text segment other than the head participle and the tail participle, a decoding hidden vector of the middle participle based on the decoding hidden vector of its previous participle, wherein the decoding hidden vector of the participle previous to the first middle participle is the global hidden vector.
7. The method of claim 1, the first recurrent neural network comprising a bidirectional recurrent neural network; the recursively determining hidden vectors of a plurality of participles in the second sample sequence, and determining a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence, comprising:
adopting the first recurrent neural network, with the preset hidden vector as the initial hidden vector, to recursively determine first hidden vectors of the plurality of participles in the second sample sequence according to the forward order of the sequence, and second hidden vectors of the plurality of participles in the second sample sequence according to the backward order of the sequence; determining a first characterization vector of the text segment based on the plurality of first hidden vectors, and a second characterization vector of the text segment based on the plurality of second hidden vectors;
the step of constructing a gaussian distribution based on the characterization vectors, and determining a global hidden vector for the text segment based on the gaussian distribution, comprises:
constructing, by a variational self-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global latent vector for the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, determining a second global latent vector for the text segment based on the second Gaussian distribution;
the step of recursively determining a decoding hidden vector of a participle in the text segment and determining a predicted value of the participle in the text segment based on the decoding hidden vector comprises:
adopting the first recurrent neural network, with the global hidden vector as the initial hidden vector, to recursively determine a first decoding hidden vector of each intermediate participle in the text segment according to the forward order of the sequence, and a second decoding hidden vector of each intermediate participle according to the backward order of the sequence; determining a first predicted value of each intermediate participle based on the first decoding hidden vector, and a second predicted value based on the second decoding hidden vector; the intermediate participles being the participles in the text segment other than the head participle and the tail participle;
the step of determining a predicted loss value comprises:
determining a first loss value based on the difference of each participle in the text segment and a first predicted value thereof and a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference of each participle in the text segment and a second predicted value thereof and a second distribution difference determined based on the second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
8. The method of claim 1, the first recurrent neural network comprising a Recurrent Neural Network (RNN) or a long-short term memory (LSTM).
9. A method for named entity recognition using a model, performed by a computer, comprising:
acquiring a first word segmentation sequence to be identified, wherein the first word segmentation sequence comprises a plurality of word segmentations, and the plurality of word segmentations comprise named entities and non-named entities;
inputting the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of word segmentations in the first word segmentation sequence; the first recurrent neural network is obtained by training by the method of claim 1;
determining the distribution probability of each participle of the first participle sequence on a plurality of preset labels based on the hidden vector of each participle of the first participle sequence;
and determining a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
10. A model training apparatus for identifying named entities, deployed in a computer, comprising:
a first obtaining module configured to obtain a first sample sequence including a plurality of participles, the plurality of participles including a named entity and a non-named entity;
a first replacing module, configured to replace a first named entity in the first sample sequence with a first preset character to obtain a second sample sequence, and determine a text segment containing the first preset character from the second sample sequence; the first named entity is determined from the named entities of the first sample sequence;
a first determining module, configured to recursively determine hidden vectors of a plurality of participles in the second sample sequence by using a first recurrent neural network with a preset hidden vector as the initial hidden vector, so that the hidden vector of a later participle in the second sample sequence contains information of each preceding participle; and determine a characterization vector of the text segment based on the hidden vectors of the plurality of participles in the second sample sequence;
a first construction module configured to construct, by a variational auto-encoder, a gaussian distribution based on the characterization vector, determine a global latent vector for the text segment based on the gaussian distribution;
a second determining module, configured to recursively determine hidden vectors of the participles in the text segment as decoding hidden vectors by using the global hidden vector as an initial hidden vector by using the first recurrent neural network, and determine predicted values of the participles in the text segment based on the decoding hidden vectors;
a first updating module configured to determine a prediction loss value based on a difference between a word in the text segment and a predicted value thereof and a distribution difference determined based on the gaussian distribution, and update the first recurrent neural network and the variational self-encoder in a direction of reducing the prediction loss value.
11. The apparatus of claim 10, the first replacement module, when replacing the first named entity in the first sequence of samples with a first predetermined character, comprises:
randomly determining a first number of named entities from at least one named entity in the first sample sequence to serve as a first named entity, and replacing the first named entity with a first preset character.
12. The apparatus of claim 10, the first replacement module, when determining the text segment containing the first preset character from the second sample sequence, comprises:
determining a sequence starting from the first preset character and ending with a first named entity behind the first preset character in the second sample sequence as a text segment; or determining a sequence which ends with the first preset character and begins with a first named entity before the first preset character in the second sample sequence as a text fragment.
13. The apparatus of claim 12, the first determining module, when determining the characterization vector for the text segment based on the hidden vectors for the plurality of participles in the second sample sequence, comprises:
determining a hidden vector of a head participle and a hidden vector of a tail participle of the text segment from the hidden vectors of the plurality of participles of the second sample sequence, respectively serving as an initial hidden vector of the head participle and an initial hidden vector of the tail participle, and determining a characterization vector of the text segment based on a difference value of the initial hidden vector of the tail participle and the initial hidden vector of the head participle.
14. The apparatus of claim 10, the first building block being specifically configured to:
determining, by a variational auto-encoder, a mean and a variance of a Gaussian distribution based on the characterization vector, and determining a global latent vector for the text segment based on the mean and the variance of the Gaussian distribution.
15. The apparatus of claim 10, the second determining module, when recursively determining the decoding hidden vectors for the participles in the text segment, comprises:
determining, through the first recurrent neural network, for each middle participle in the text segment other than the head participle and the tail participle, a decoding hidden vector of the middle participle based on the decoding hidden vector of its previous participle, wherein the decoding hidden vector of the participle previous to the first middle participle is the global hidden vector.
16. The apparatus of claim 10, the first recurrent neural network comprising a bidirectional recurrent neural network; the first determining module is specifically configured to:
adopting the first recurrent neural network, with the preset hidden vector as the initial hidden vector, to recursively determine first hidden vectors of the plurality of participles in the second sample sequence according to the forward order of the sequence, and second hidden vectors of the plurality of participles in the second sample sequence according to the backward order of the sequence; determining a first characterization vector of the text segment based on the plurality of first hidden vectors, and a second characterization vector of the text segment based on the plurality of second hidden vectors;
the first building module is specifically configured to:
constructing, by a variational self-encoder, a first Gaussian distribution based on the first characterization vector, determining a first global latent vector for the text segment based on the first Gaussian distribution, constructing a second Gaussian distribution based on the second characterization vector, determining a second global latent vector for the text segment based on the second Gaussian distribution;
the second determining module is specifically configured to:
adopting the first recurrent neural network, with the global hidden vector as the initial hidden vector, to recursively determine a first decoding hidden vector of each intermediate participle in the text segment according to the forward order of the sequence, and a second decoding hidden vector of each intermediate participle according to the backward order of the sequence; determining a first predicted value of each intermediate participle based on the first decoding hidden vector, and a second predicted value based on the second decoding hidden vector; the intermediate participles being the participles in the text segment other than the head participle and the tail participle;
the first updating module, when determining the predicted loss value, includes:
determining a first loss value based on the difference of each participle in the text segment and a first predicted value thereof and a first distribution difference determined based on the first Gaussian distribution; determining a second loss value based on the difference of each participle in the text segment and a second predicted value thereof and a second distribution difference determined based on the second Gaussian distribution; a predicted loss value is determined based on a sum of the first loss value and the second loss value.
17. The apparatus of claim 10, the first recurrent neural network comprising a Recurrent Neural Network (RNN) or a long-short term memory (LSTM).
18. An apparatus for named entity recognition using a model, deployed in a computer, comprising:
the second acquisition module is configured to acquire a first segmentation sequence to be identified, wherein the first segmentation sequence comprises a plurality of segmentations, and the plurality of segmentations comprise named entities and non-named entities;
the first input module is configured to input the first word segmentation sequence into a trained first recurrent neural network to obtain hidden vectors of a plurality of word segmentations in the first word segmentation sequence; the first recurrent neural network is obtained by training by the method of claim 1;
a third determining module, configured to determine, based on the hidden vector of each participle of the first participle sequence, a distribution probability of each participle of the first participle sequence on a plurality of preset labels;
and the fourth determining module is configured to determine a preset label corresponding to each participle based on the distribution probability of each participle of the first participle sequence.
19. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-9.
20. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010631307.7A CN111523313B (en) | 2020-07-03 | 2020-07-03 | Model training and named entity recognition method and device |
Publications (2)
Publication Number | Publication Date
---|---
CN111523313A | 2020-08-11
CN111523313B | 2020-09-29
Family
ID=71911753
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010631307.7A Active CN111523313B (en) | 2020-07-03 | 2020-07-03 | Model training and named entity recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111523313B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112199950A * | 2020-10-16 | 2021-01-08 | Alipay (Hangzhou) Information Technology Co., Ltd. | Network training method and device for event detection
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110276068A (en) * | 2019-05-08 | 2019-09-24 | 清华大学 | Law merit analysis method and device |
CN110532570A (en) * | 2019-09-10 | 2019-12-03 | 杭州橙鹰数据技术有限公司 | A kind of method and apparatus of method and apparatus and model training that naming Entity recognition |
Non-Patent Citations (2)
Title |
---|
Named Entity Recognition with Bidirectional LSTM-CNNs; Jason P.C. Chiu et al.; Computation and Language; 2016-07-19; pp. 1-14 *
Named Entity Recognition Based on Deep Belief Networks (基于深度信念网络的命名实体识别); Feng Yuntian et al.; Computer Science (计算机科学); 2016-04-30; Vol. 43, No. 4; pp. 224-230 *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40035494; Country of ref document: HK