
CN111680529A - Machine translation algorithm and device based on layer aggregation - Google Patents

Machine translation algorithm and device based on layer aggregation

Info

Publication number
CN111680529A
CN111680529A (application CN202010527099.6A)
Authority
CN
China
Prior art keywords
module
atransformer
layer
translation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010527099.6A
Other languages
Chinese (zh)
Inventor
汪金玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010527099.6A priority Critical patent/CN111680529A/en
Publication of CN111680529A publication Critical patent/CN111680529A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of text translation, and discloses a machine translation algorithm and device based on layer aggregation. The algorithm comprises the following steps: acquiring a Chinese sentence to be translated and preprocessing the Chinese sentence; extracting, by an ATransformer encoder, multi-layer semantic feature information from the preprocessed sentence based on a multi-layer information extraction algorithm; decoding, by an ATransformer decoder, the multi-layer semantic feature information and outputting a translation target language sequence; and judging the translation target language sequence with a discrimination model D: if the translation target language sequence is judged to be a translation result, it is taken as the final machine translation result and output; otherwise, the parameters of the ATransformer model are updated based on a policy gradient algorithm, the preprocessed sentence to be translated is input into the updated ATransformer encoder, and the machine translation algorithm is executed again. The invention also provides a device for the layer aggregation based machine translation algorithm. The invention realizes intelligent translation of text.

Description

Machine translation algorithm and device based on layer aggregation
Technical Field
The invention relates to the technical field of text translation, in particular to a machine translation algorithm and a machine translation device based on layer aggregation.
Background
With the development of deep learning in the field of natural language processing, machine translation has transitioned from early studies of statistical machine translation, which is mainly centered on shallow machine learning, to neural machine translation, which is centered on deep learning techniques.
Traditional statistical machine translation has the disadvantages that human experts are needed to design features and the corresponding translation process, that long-distance dependencies are difficult to handle, and that data dispersion causes a serious data sparsity problem. Neural machine translation models, by incorporating an attention mechanism, effectively relieve the long-distance dependency problem, and on large-scale parallel corpora their effect is far better than that of statistical machine translation models. However, research shows that different layers in a neural machine translation model capture different types of syntactic and semantic information, while existing neural machine translation models only use the information of the last layer of the model, treating it as a summary of the whole network's response to the input, and make no use of the information propagated through the intermediate layers. Meanwhile, existing neural machine translation models usually adopt a single-model training method based on the maximum likelihood principle, i.e. the current translation model is taken as the training target and is trained by maximizing the conditional probability of generating the target-language translation given the source language, so the naturalness and accuracy of the translation result are difficult to guarantee.
In view of this, how to deeply capture the feature information between model layers, take the relationship between layers into account, and thereby effectively improve the quality of machine translation is a problem that those skilled in the art need to solve.
Disclosure of Invention
The invention provides a machine translation algorithm and device based on layer aggregation, which can deeply capture characteristic information between model layers and consider the relation between the layers, and can effectively improve the quality of machine translation.
In order to achieve the above object, the present invention provides a layer aggregation based machine translation algorithm, including:
acquiring a Chinese sentence to be translated, and performing text preprocessing operation on the Chinese sentence;
inputting the preprocessed statement into a preset ATransformer encoder, wherein the ATransformer encoder performs multilayer semantic feature information extraction on the statement based on a multilayer information extraction algorithm;
inputting the multilayer semantic feature information into a preset ATransformer decoder, wherein the preset ATransformer decoder decodes the multilayer semantic feature information and outputs a translation target language sequence;
inputting a translation target language sequence into a pre-trained discrimination model D, and judging the translation target language sequence by the discrimination model D;
and if the translation target language sequence is judged to be a translation result, taking the translation target language sequence as the final machine translation result and outputting it; otherwise, updating the parameters of the ATransformer model based on a policy gradient algorithm, inputting the preprocessed sentence to be translated into the updated ATransformer encoder, and executing the machine translation algorithm again.
Optionally, the text preprocessing operation includes:
matching the constructed stop word list with words in the text data one by one, and deleting the words if matching succeeds;
finding out all possible words in the word string by constructing a prefix dictionary and a self-defined dictionary; and
according to all the possible words found, each word corresponds to one directed edge in the graph and is assigned a corresponding edge-length weight; then, for the segmentation graph, among all paths from the starting point to the end point, the sets of paths whose length values rank 1st, 2nd, ..., i-th, ..., N-th in strictly ascending order are solved and taken as the corresponding rough segmentation result sets, and the rough segmentation result sets are the word segmentation result sets of the Chinese sentence to be translated.
Optionally, the ATransformer encoder performs multi-layer semantic feature information extraction on the sentence based on a multi-layer information extraction algorithm, where the method includes:
the main layer of the first module in the ATransformer encoder receives the preprocessed sentence to be translated; the first sub-layer ESL1 in the main layer calculates the sentence to be translated based on a self-attention mechanism, and the calculation result is input into the second sub-layer ESL2 in the main layer; the second sub-layer ESL2 performs residual connection on the output result based on a feed-forward fully-connected neural network; a merging layer in the module merges the output results of the two sub-layers by using a Joint function, and the merged result is taken as the input value of the next module;
the other modules in the ATransformer encoder sequentially receive the output value of the previous module, the output value of the previous module is used as the input value of the current module for calculation and output, and the output value of the last module is the extracted multi-layer semantic feature information; the ATransformer encoder thus extracts 12 layers of feature information and performs feature fusion, and the whole network structure is combined iteratively, from the shallowest and smallest structure, into a deeper and larger hierarchical structure;
the calculation formula of the first sub-layer based on the self-attention mechanism is:
ESL1_i = LayerNorm(Attention(Q_{i-1}, K_{i-1}, V_{i-1}) + ESL2_{i-1})
wherein:
LayerNorm(·) is a normalization function;
i is the i-th module in the encoder;
Attention(·) is the self-attention mechanism, Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V;
d_k is the dimension of the sentence to be translated;
Q_{i-1}, K_{i-1}, V_{i-1} are the three vector parameters obtained by the (i-1)-th module, respectively representing the query, key and value weights of the sentence to be translated; when i = 1, the three vector parameters are preset in the invention;
ESL2_{i-1} is the output value of the second sub-layer in the main layer of the (i-1)-th module; when i = 1, it is the preprocessed sentence to be translated;
the calculation formula by which the second sub-layer ESL2 performs residual connection on the output result based on the feed-forward fully-connected neural network is:
FC(x) = W_E·x + b_E
ESL2_i = LayerNorm(FC(ESL1_i) + ESL1_i)
wherein:
FC(·) is a feed-forward fully-connected network;
W_E is a preset encoder training weight;
b_E is a preset encoder bias parameter;
the merging layer of the module merges the output results of the sub-layers in the main layer and outputs a result L_i, where i denotes the i-th module; the merged fusion information is used as the input of the next module, so that the fusion of 12 layers of information is realized through 6 modules, and the final information fusion result is taken as the multi-layer semantic feature information; and
when i = 1, the formula for merging the sub-layer output results is:
L_1 = Joint(ESL1_1, ESL2_1)
wherein:
ESL1_1 is the output result of the first sub-layer of the first module;
ESL2_1 is the output result of the second sub-layer of the first module;
Joint(·) is a Joint function, Joint(a, b) = LayerNorm(FC([a; b]) + a + b);
when i > 1, the formula for merging the sub-layer output results is:
L_i = Joint(ESL1_i, ESL2_i, L_{i-1})
wherein:
ESL1_i is the first sub-layer of the main layer of the i-th module;
ESL2_i is the second sub-layer of the main layer of the i-th module;
L_{i-1} is the output result of the merging layer of the (i-1)-th module;
Joint(·) is a Joint function, Joint(a, b, c) = LayerNorm(FC([a; b; c]) + a + b + c).
Optionally, the decoding, by the preset ATransformer decoder, the multi-layer semantic feature information, and outputting a translation target language sequence, where the decoding includes:
the ATransformer decoder is formed by stacking 3 same modules, wherein each module is divided into a multi-head attention mechanism layer, a DSL1 sublayer and a DSL2 sublayer;
a first module of the ATransformer decoder receives multilayer semantic feature information, a multi-head attention mechanism layer, a DSL1 sublayer and a DSL2 sublayer in the module sequentially carry out output calculation on the multilayer semantic feature information, and an output result of the DSL2 sublayer is used as the input of the next module;
other modules in the ATransformer decoder receive the output value of the previous module, the output value of the previous module is used as the input value of the current module for calculation and output, and the output value of the DSL2 sublayer in the last module is the translation target language sequence obtained through final translation;
the calculation formula of the multi-head attention mechanism layer MHA_i of the decoder module is:
MHA_i = LayerNorm(Attention(Q_{i-1}, K_{i-1}, V_{i-1}) + DSL2_{i-1})
wherein:
LayerNorm(·) is a normalization function;
Attention(·) is the self-attention mechanism, Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V;
d_k is the dimension of the sentence to be translated;
Q_{i-1}, K_{i-1}, V_{i-1} are the three vector parameters obtained by the (i-1)-th decoder module, respectively representing the query, key and value weights of the sentence to be translated; when i = 1, the three vector parameters are obtained by training from the last module in the encoder;
DSL2_{i-1} is the output value of the DSL2 sub-layer of the (i-1)-th decoder module; when i = 1, it is the multi-layer semantic feature information;
the calculation formula of the DSL1 sub-layer of the decoder module is:
DSL1_i = LayerNorm(Attention(MHA_i, K_E, V_E) + MHA_i)
wherein:
K_E, V_E are the parameters obtained by the last module of the encoder;
the calculation formula of the DSL2 sub-layer of the decoder module is:
FC(x) = W_D·x + b_D
DSL2_i = LayerNorm(FC(DSL1_i) + DSL1_i)
wherein:
FC(·) is a feed-forward fully-connected network;
W_D is a preset decoder training weight;
b_D is a preset decoder bias parameter;
the output result of the DSL2 sub-layer in the last module is taken as the translation target language sequence T_Pred.
Optionally, the training process of the discriminant model D is:
(1) splicing a source language sentence S and a reference translation T in a training set into a two-dimensional matrix representation in a splicing mode, and inputting a splicing result as a positive sample;
(2) inputting a source language sentence S into an ATransformer model to obtain a translation target language sequence T _ p, splicing the source language sentence S and the T _ p, and inputting a splicing result as a negative sample so as to set an optimization target function of a discrimination model D:
max V(D) = log D((S, T)) + log(1 - D(ATransformer(S, T_p)))
(3) performing convolution operation on currently obtained positive and negative sample input to obtain positive and negative sample characteristics, wherein an activation function sigma of the convolution operation is a softmax function, and a formula of the convolution operation is as follows:
F_{i,j} = σ(W_F * f_{i,j} + b_F)
wherein:
W_F is a preset convolutional layer training weight;
b_F is a preset convolutional layer bias parameter;
f_{i,j} is the positive and negative sample input;
(4) obtaining the probability distribution of the positive and negative sample classes with a sigmoid function at the fully connected layer; if the probability difference between the positive and negative sample classes is less than the error rate set by the invention and the optimization objective function reaches its maximum, the discrimination model is considered to be successfully trained; otherwise, the convolution operation is performed again.
Optionally, the updating parameters of the ATransformer model based on the policy gradient algorithm includes:
1) according to the source language sentence S and the model D, the invention provides the following loss function to train the ATransformer model:
Loss = log(1 - D(S, T_Pred))
wherein:
D is the discrimination model;
T_Pred is the translation target language sequence output by the ATransformer model according to the source language sentence S;
2) performing gradient calculation on the parameters of the ATransformer model that need to be updated, and updating the parameters of the ATransformer model by gradient descent with the calculated gradient, so as to complete the training and optimization of the ATransformer model, where the gradient is calculated as:
∇_θ Loss = log(1 - D(S, T_Pred)) · ∇_θ log ATransformer(T_Pred | S; θ)
wherein:
ATransformer(·|S) is the conditional distribution generated by the ATransformer model;
θ is the parameter of the ATransformer model.
In addition, the present invention also provides an apparatus for a layer aggregation based machine translation algorithm, the apparatus comprising:
the text acquisition module is used for acquiring the Chinese sentences to be translated and preprocessing the Chinese sentences;
the encoding module is used for encoding the preprocessed Chinese sentences by using an ATransformer encoder and extracting multilayer semantic feature information of the Chinese sentences;
the decoding module is used for decoding the multi-layer semantic feature information by using the ATransformer decoder so as to output the translation target language sequence;
and the translation result judging module is used for judging the translation target language sequence.
Optionally, the translation result distinguishing module includes:
judging the translation target language sequence by the discrimination model;
if the translation target language sequence is judged to be a translation result, taking the translation target language sequence as a final machine translation result and outputting the final machine translation result;
otherwise, updating the parameters of the encoding module and the decoding module based on the policy gradient algorithm, inputting the preprocessed sentence to be translated into the updated encoding module, and executing the machine translation algorithm again.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon the model training program instructions of the ATransformer model, which are executable by one or more processors to implement the steps of a layer aggregation based machine translation algorithm as described above.
Compared with the prior art, the invention provides a machine translation algorithm and a device based on layer aggregation, and the technology has the following advantages:
Firstly, the encoder of the ATransformer model provided by the invention is provided with 6 identical modules, where the first module of the encoder receives the sentence to be translated and performs feature extraction on it, and the feature extraction result is input into the next module. Each module contains two sub-layers, ESL1_i and ESL2_i, together with a merging layer, where i denotes the i-th module. The ESL1_i sub-layer receives the output value of the previous module, performs a similarity calculation between each word in that output and all words in the sentence based on the self-attention mechanism to obtain the respective weights, and normalizes the weights with a softmax function; the feed-forward fully-connected neural network in the ESL2_i sub-layer then applies a non-linear mapping to the output values of the ESL1_i sub-layer so as to extract feature information; finally, the merging layer merges the output values of the two sub-layers with the Joint function Joint(·), which is based on a feed-forward neural network, and the merged result is taken as the input value of the next module, the output value of the last module being the multi-layer semantic feature information. In the prior art, 12 sub-layers are simply stacked directly, comprising 6 attention layers and 6 feature extraction layers, and a feature extraction algorithm sequentially extracts the feature information of the sentence to be translated in each sub-layer, which is equivalent to extracting the feature information of 6 sub-layers and superimposing it linearly. The algorithm of the invention instead places the attention-based ESL1_i sub-layer and the feed-forward fully-connected ESL2_i sub-layer within one module, merges the calculation results of the two layers for the sentence to be translated with the Joint function Joint(·), further extracts feature information with a feed-forward neural network, and takes the final result as the input value of the next module. In this way the feature information of 12 layers is extracted and fused, the whole network structure is combined iteratively, from the shallowest and smallest structure, into a deeper and larger hierarchical structure, and more and more detailed language structure features and inter-layer information available to the network can be obtained. Compared with the 6 layers of linguistic feature information obtained by linear superposition in the prior art, the linguistic feature information obtained by the algorithm of the invention comprises the feature information of 12 layers; meanwhile, the invention fuses the obtained feature information by feature fusion, and the fused feature information reflects the language features in the sentence to be translated more accurately, so that the machine translation algorithm has higher translation accuracy than the prior art.
Secondly, compared with the prior art, the invention proposes a strategy of judging the machine translation result and updating the machine translation model according to the judgment result. In detail, the discrimination model D judges the translation target language sequence; if the judgment is that it is a translation result, the translation target language sequence is output directly, otherwise the parameters of the ATransformer model are updated. For this purpose the invention proposes the following loss function to train the ATransformer model: Loss = log(1 - D(S, T_Pred)), where D is the discrimination model, T_Pred is the translation target language sequence output by the ATransformer model according to the source language sentence S, and (S, T_Pred) denotes the translation pair formed by the ATransformer model. The discrimination model D receives the translation pair (S, T_Pred) formed by the ATransformer model and outputs the similarity probability between T_Pred and the reference translation T. If the similarity probability between T_Pred and the reference translation T is maximal, i.e. the difference between T_Pred and the reference translation T is minimal, then the loss function proposed by the invention is minimal; therefore, when the ATransformer model is trained with the proposed loss function and the loss function is minimal, the ATransformer model obtained by training is the optimal model. Compared with the prior art, the algorithm of the invention can update the machine translation model in real time according to the judgment result of the discrimination model during the machine translation process, and therefore achieves higher translation accuracy.
Finally, the invention analyses the time complexity of the proposed algorithm. In each layer of the algorithm, for the main layer, each word in the sentence to be translated needs to be encoded as a d-dimensional vector of fixed length, so the input calculation for the whole sentence to be translated depends on its sentence length n; and since the ESL1_i sub-layer of the ATransformer model proposed by the invention needs to calculate the weight information of every word in the sentence to be translated, the time complexity of the ESL2_i sub-layer is likewise determined by the sentence length n and is O(n), so the time complexity of the main layer in the model of the invention is O(d·n²). For the sequential calculation operations in the model, the prior art generally connects multiple attention layers in series and performs the attention calculations within the sentence to be translated one after another, so the time complexity of the sequential operations of prior-art models is O(n); compared with the prior art, the invention is based on a self-attention parallel processing mechanism in which each main layer is modularised, the modules of the main layers are independent of one another, and the attention-based weight calculations can be performed independently and simultaneously, so the sequential operations are reduced from O(n) in the prior art to O(1). The merging layer proposed by the invention mainly performs an aggregation calculation on the calculation results of the main layer, and its time complexity is O(1). Thus the overall time complexity of the algorithm of the invention is O(d·n²); compared with O(d·n³) in the prior art, the algorithm reduces a certain amount of time overhead and can obtain the machine translation result more quickly.
Drawings
Fig. 1 is a flow chart of a layer aggregation-based machine translation algorithm according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an internal structure of a layer aggregation-based machine translation algorithm apparatus according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a machine translation algorithm and device based on layer aggregation, which can deeply capture characteristic information between model layers and consider the relationship between the layers and effectively improve the quality of machine translation. Referring to fig. 1, a flowchart of a layer aggregation-based machine translation algorithm according to an embodiment of the present invention is shown.
In this embodiment, the layer aggregation based machine translation algorithm includes:
and S1, acquiring the Chinese sentence to be translated and preprocessing the Chinese sentence.
Firstly, the invention obtains the Chinese sentence to be translated and carries out a preprocessing operation on the Chinese sentence, wherein in one embodiment of the invention the preprocessing operation comprises stop-word removal and word segmentation;
the method for removing stop words selected by the invention is to filter with a stop-word list: the constructed stop-word list is matched one by one with the words in the text data, and if the matching succeeds the word is a stop word and needs to be deleted. Stop words are function words with little practical meaning in the text data; they do not affect the classification of the text but occur with high frequency, and include common pronouns, prepositions and the like.
Further, because words have the capability of truly reflecting text content, but Chinese text, unlike English text, does not separate words by spaces, a word segmentation operation is required for the Chinese text. In the embodiment of the invention, the word segmentation operation is carried out on the text by using a word segmentation algorithm based on the N-shortest path;
the basic idea of the N-shortest path word segmentation algorithm is to find out all possible words in a character string according to a word segmentation dictionary and construct a word segmentation directed acyclic graph. In the embodiment of the invention, all possible words in the word string are found by constructing a prefix dictionary and a self-defined dictionary. The prefix dictionary includes prefixes of each participle in the statistical dictionary, for example, prefixes of a word "Beijing university" in the statistical dictionary are "Beijing", "Beijing Dada", respectively; the word "university" is prefixed by "big", etc.; the self-defined dictionary, which may also be called a proper noun dictionary, is a word that is not present in the statistical dictionary but is specific and exclusive in a certain field, such as resume, work experience, etc.
According to all the possible words found above, each word corresponds to one directed edge in the graph and is assigned a corresponding edge-length weight; then, for the segmentation graph, among all paths from the starting point to the end point, the sets of paths whose length values rank 1st, 2nd, ..., i-th, ..., N-th in strictly ascending order (the values at any two different positions being unequal) are solved as the corresponding rough segmentation result sets. If two or more paths have equal length, they are tied at the i-th rank and are all included in the rough segmentation result set without affecting the sequence numbers of the other paths, so the size of the final rough segmentation result set is greater than or equal to N; the rough segmentation result set is the word segmentation result set of the Chinese sentence to be translated.
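As an illustration of this preprocessing step, the sketch below builds a word graph from a toy dictionary and keeps the first N complete segmentation paths in order of increasing length; the stop-word list and dictionary shown are hypothetical placeholders, and the strict rank-grouping of equal-length paths described above is simplified to a plain N-shortest enumeration.

```python
# Sketch of the preprocessing step: stop-word removal followed by
# N-shortest-path word segmentation over a word graph.
# The word lists below are illustrative placeholders, not the patent's dictionaries.
import heapq

STOP_WORDS = {"的", "了", "和"}                                    # hypothetical stop-word list
DICTIONARY = {"北京", "北京大学", "大学", "生", "学生", "大学生"}  # hypothetical dictionary

def build_word_graph(sentence):
    """edges[i] lists (j, word) meaning sentence[i:j] is a candidate word."""
    edges = {i: [] for i in range(len(sentence))}
    for i in range(len(sentence)):
        edges[i].append((i + 1, sentence[i]))           # a single character is always a candidate
        for j in range(i + 2, len(sentence) + 1):
            if sentence[i:j] in DICTIONARY:
                edges[i].append((j, sentence[i:j]))
    return edges

def n_shortest_segmentations(sentence, n=3):
    """Return up to n segmentations in order of increasing path length
    (number of edges), i.e. a rough segmentation result set."""
    edges = build_word_graph(sentence)
    heap = [(0, 0, [])]                                  # (path length, position, words so far)
    results = []
    while heap and len(results) < n:
        length, pos, words = heapq.heappop(heap)
        if pos == len(sentence):
            results.append(words)
            continue
        for nxt, word in edges[pos]:
            heapq.heappush(heap, (length + 1, nxt, words + [word]))
    return results

# Stop-word removal (simplified to character filtering) followed by segmentation.
sentence = "".join(ch for ch in "北京大学生的宿舍" if ch not in STOP_WORDS)
print(n_shortest_segmentations(sentence))
```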
And S2, inputting the preprocessed sentence into a preset ATransformer encoder, wherein the ATransformer encoder performs multilayer semantic feature information extraction on the sentence based on a multilayer information extraction algorithm.
Further, the invention inputs the preprocessed statements into a preset ATransformer encoder, wherein the ATransformer encoder is formed by stacking 6 same modules, each module is divided into a main layer and a merging layer, each main layer is provided with two sub-layers, the first sub-layer contains a self-attention mechanism, and the second sub-layer is a fully-connected feedforward network layer;
in an embodiment of the present invention, a main layer of a first module of the ATransformer encoder receives a preprocessed sentence to be translated, two sublayers in the main layer sequentially perform calculation and output on the preprocessed sentence to be translated, and input an output result to a merging layer for merging calculation, where the merging result is simultaneously input to a next module for calculation and output, and a result of final information fusion is used as multilayer semantic feature information;
wherein the first sub-layer ESL1_i of the main layer of the i-th module calculates the sentence to be translated based on a self-attention mechanism, and the calculation formula based on the self-attention mechanism is:
ESL1_i = LayerNorm(Attention(Q_{i-1}, K_{i-1}, V_{i-1}) + ESL2_{i-1})
wherein:
LayerNorm(·) is a normalization function;
i is the i-th module in the encoder;
Attention(·) is the self-attention mechanism, Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V;
d_k is the dimension of the sentence to be translated;
Q_{i-1}, K_{i-1}, V_{i-1} are the three vector parameters obtained by the (i-1)-th module, respectively representing the query, key and value weights of the sentence to be translated; when i = 1, the three vector parameters are preset in the invention;
ESL2_{i-1} is the output value of the second sub-layer in the main layer of the (i-1)-th module; when i = 1, it is the preprocessed sentence to be translated.
The second sub-layer ESL2_i performs residual connection on the output result based on the feed-forward fully-connected neural network, and its calculation formula is:
FC(x) = W_E·x + b_E
ESL2_i = LayerNorm(FC(ESL1_i) + ESL1_i)
wherein:
FC(·) is a feed-forward fully-connected network;
W_E is a preset encoder training weight;
b_E is a preset encoder bias parameter.
Further, the merging layer of the module merges the output results of the sub-layers in the main layer and outputs a result L_i, where i denotes the i-th module; the merged fusion information is used as the input of the next module, so that through 6 modules the invention realizes the fusion of 12 layers of information and takes the final information fusion result as the multi-layer semantic feature information; by extracting the feature information once every two layers, more and more detailed language structure features and network-available information can be obtained compared with the prior art;
when i = 1, the formula for merging the sub-layer output results is:
L_1 = Joint(ESL1_1, ESL2_1)
wherein:
ESL1_1 is the output result of the first sub-layer of the first module;
ESL2_1 is the output result of the second sub-layer of the first module;
Joint(·) is a Joint function, Joint(a, b) = LayerNorm(FC([a; b]) + a + b).
When i > 1, the formula for merging the sub-layer output results is:
L_i = Joint(ESL1_i, ESL2_i, L_{i-1})
wherein:
ESL1_i is the first sub-layer of the main layer of the i-th module;
ESL2_i is the second sub-layer of the main layer of the i-th module;
L_{i-1} is the output result of the merging layer of the (i-1)-th module;
Joint(·) is a Joint function, Joint(a, b, c) = LayerNorm(FC([a; b; c]) + a + b + c).
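The encoder computation above can be sketched in PyTorch as follows. The single-head attention, the one-layer FC, and the symbol names ESL1/ESL2 are simplifying assumptions for illustration; only the overall structure of a main layer plus a Joint-style merging layer follows the description.

```python
# Sketch of one ATransformer encoder module: a main layer with a self-attention
# sub-layer (ESL1) and a feed-forward sub-layer (ESL2), plus a merging layer that
# fuses both sub-layer outputs with the Joint function.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderModule(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.wq = nn.Linear(d_model, d_model)
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)
        self.fc_sub = nn.Linear(d_model, d_model)        # FC(x) = W_E x + b_E
        self.fc_join2 = nn.Linear(2 * d_model, d_model)  # FC([a; b]) inside Joint
        self.fc_join3 = nn.Linear(3 * d_model, d_model)  # FC([a; b; c]) inside Joint
        self.norm = nn.LayerNorm(d_model)

    def attention(self, q, k, v):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
        return torch.matmul(F.softmax(scores, dim=-1), v)

    def forward(self, x, l_prev=None):
        # ESL1_i = LayerNorm(Attention(Q, K, V) + x)
        esl1 = self.norm(self.attention(self.wq(x), self.wk(x), self.wv(x)) + x)
        # ESL2_i = LayerNorm(FC(ESL1_i) + ESL1_i)
        esl2 = self.norm(self.fc_sub(esl1) + esl1)
        # Merging layer: L_1 = Joint(ESL1, ESL2); L_i = Joint(ESL1, ESL2, L_{i-1})
        if l_prev is None:
            joined = self.fc_join2(torch.cat([esl1, esl2], dim=-1)) + esl1 + esl2
        else:
            joined = self.fc_join3(torch.cat([esl1, esl2, l_prev], dim=-1)) + esl1 + esl2 + l_prev
        return self.norm(joined)

# Six stacked modules fuse 12 sub-layers of feature information.
x = torch.randn(2, 10, 512)                    # (batch, sentence length, d_model)
modules = nn.ModuleList(EncoderModule(512) for _ in range(6))
l_prev = None
for m in modules:
    l_prev = m(x if l_prev is None else l_prev, l_prev)
print(l_prev.shape)                            # torch.Size([2, 10, 512])
```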
And S3, inputting the multilayer semantic feature information into a preset ATransformer decoder, and decoding the multilayer semantic feature information by the preset ATransformer decoder to output a translation target language sequence.
Furthermore, the invention inputs the multi-layer semantic feature information into a preset ATransformer decoder, wherein the ATransformer decoder is composed of three identical modules, and each module is composed of a multi-head attention mechanism layer MHA_i and two sub-layers, DSL1 and DSL2;
in an embodiment of the present invention, a first module of the ATransformer decoder receives multiple layers of semantic feature information, and a multi-head attention mechanism layer, a DSL1 sublayer and a DSL2 sublayer in the first module sequentially perform output calculation on the received information, and an output result is used as an input of a next module;
the calculation formula of the multi-head attention mechanism layer MHA_i of the i-th decoder module is:
MHA_i = LayerNorm(Attention(Q_{i-1}, K_{i-1}, V_{i-1}) + DSL2_{i-1})
wherein:
LayerNorm(·) is a normalization function;
Attention(·) is the self-attention mechanism, Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V;
d_k is the dimension of the sentence to be translated;
Q_{i-1}, K_{i-1}, V_{i-1} are the three vector parameters obtained by the (i-1)-th decoder module, respectively representing the query, key and value weights of the sentence to be translated; when i = 1, the three vector parameters are obtained by training from the last module in the encoder;
DSL2_{i-1} is the output value of the DSL2 sub-layer of the (i-1)-th decoder module; when i = 1, it is the multi-layer semantic feature information.
Further, the calculation formula of the DSL1 sub-layer of the i-th decoder module is:
DSL1_i = LayerNorm(Attention(MHA_i, K_E, V_E) + MHA_i)
wherein:
K_E, V_E are the parameters obtained by the last module of the encoder.
The calculation formula of the DSL2 sub-layer of the i-th decoder module is:
FC(x) = W_D·x + b_D
DSL2_i = LayerNorm(FC(DSL1_i) + DSL1_i)
wherein:
FC(·) is a feed-forward fully-connected network;
W_D is a preset decoder training weight;
b_D is a preset decoder bias parameter.
The invention sequentially carries out the calculation and output of the three modules of the decoder, and takes the output result of the DSL2 sub-layer in the last module as the translation target language sequence T_Pred.
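A corresponding sketch of the ATransformer decoder modules is shown below. The multi-head splitting and any masking of target positions are omitted, and treating the DSL1 sub-layer as cross-attention over the encoder parameters K_E and V_E is an assumption consistent with, but not spelled out by, the formulas above.

```python
# Sketch of one ATransformer decoder module: multi-head attention layer (MHA),
# DSL1 sub-layer attending over the encoder output, and DSL2 feed-forward sub-layer.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention(q, k, v):
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    return torch.matmul(F.softmax(scores, dim=-1), v)

class DecoderModule(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.wq = nn.Linear(d_model, d_model)
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)
        self.fc = nn.Linear(d_model, d_model)    # FC(x) = W_D x + b_D
        self.norm = nn.LayerNorm(d_model)

    def forward(self, y, enc_k, enc_v):
        # MHA_i = LayerNorm(Attention(Q, K, V) + y)
        mha = self.norm(attention(self.wq(y), self.wk(y), self.wv(y)) + y)
        # DSL1_i = LayerNorm(Attention(MHA_i, K_E, V_E) + MHA_i)
        dsl1 = self.norm(attention(mha, enc_k, enc_v) + mha)
        # DSL2_i = LayerNorm(FC(DSL1_i) + DSL1_i)
        return self.norm(self.fc(dsl1) + dsl1)

# Three stacked decoder modules; the last DSL2 output stands for the T_Pred
# representation, which a projection + softmax would turn into target-language tokens.
enc_out = torch.randn(2, 10, 512)                # multi-layer semantic feature information
decoder = nn.ModuleList(DecoderModule(512) for _ in range(3))
y = enc_out                                      # the first decoder module receives the multi-layer features
for m in decoder:
    y = m(y, enc_out, enc_out)
print(y.shape)                                   # torch.Size([2, 10, 512])
```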
S4, inputting the translation target language sequence into a pre-trained discrimination model D, where the discrimination model D judges the translation target language sequence; if the translation target language sequence is judged to be a translation result, it is taken as the final machine translation result and output; otherwise, the parameters of the ATransformer model are updated based on a policy gradient algorithm, the preprocessed sentence to be translated is input into the updated ATransformer encoder, and the machine translation algorithm is executed again.
Furthermore, the translation target language sequence T_Pred is input into the pre-trained discrimination model D, the discrimination model D judges the translation target language sequence T_Pred, and the discrimination model D is a convolutional neural network model;
the training process of the discriminant model D is as follows:
(1) for a source language sentence S and a reference translation T in the training set, the word vectors in S and T are spliced into a two-dimensional matrix representation: for the word vector s_i of the i-th word in S and the word vector t_j of the j-th word in the reference translation T, the following feature map, which is the positive sample input of the discrimination model D, can be obtained:
f_{i,j} = [s_i; t_j]
(2) inputting a source language sentence S into an ATransformer model to obtain a translation target language sequence T _ p, splicing the source language sentence S and the T _ p, and inputting a splicing result as a negative sample so as to set an optimization target function of a discrimination model D:
max V(D) = log D((S, T)) + log(1 - D(ATransformer(S, T_p)))
(3) performing convolution operation on currently obtained positive and negative sample inputs by adopting a convolution kernel of 5 x 5 to obtain positive and negative sample characteristics, wherein an activation function sigma of the convolution operation is a softmax function, and a formula of the convolution operation is as follows:
F_{i,j} = σ(W_F * f_{i,j} + b_F)
wherein:
W_F is a preset convolutional layer training weight;
b_F is a preset convolutional layer bias parameter.
(4) Obtaining the probability distribution of the positive and negative sample classes with a sigmoid function at the fully connected layer; if the probability difference between the positive and negative sample classes is less than the error rate set by the invention and the optimization objective function reaches its maximum, the discrimination model is considered to be successfully trained; otherwise, the convolution operation is performed again.
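The training procedure of the discrimination model D can be sketched as below. The embedding size, the way the source and target word vectors are spliced into a two-dimensional map, and the realisation of max V(D) by minimising its negative are illustrative assumptions; only the positive-pair versus negative-pair structure and the 5×5 convolution follow the steps above.

```python
# Sketch of the CNN discrimination model D: the source sentence and a translation are
# spliced into a 2-D feature map, convolved with a 5x5 kernel, and scored by a
# fully connected layer with a sigmoid output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, d_model=64, max_len=20):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=5, padding=2)     # 5x5 convolution
        self.fc = nn.Linear(8 * max_len * max_len, 1)
        self.max_len = max_len

    def splice(self, src, tgt):
        # f_{i,j}: one simple way to splice S and a translation into a 2-D matrix,
        # here a similarity map between the i-th source word and the j-th target word.
        sim = torch.matmul(src, tgt.transpose(-2, -1))             # (batch, |S|, |T|)
        out = torch.zeros(src.size(0), self.max_len, self.max_len)
        out[:, : sim.size(1), : sim.size(2)] = sim
        return out.unsqueeze(1)                                    # (batch, 1, L, L)

    def forward(self, src, tgt):
        feat = F.softmax(self.conv(self.splice(src, tgt)), dim=1)  # F_{i,j} = sigma(W_F * f_{i,j} + b_F)
        return torch.sigmoid(self.fc(feat.flatten(1)))             # probability of being a real pair

# One training step: maximise log D(S, T) + log(1 - D(S, T_Pred)).
D = Discriminator()
opt = torch.optim.Adam(D.parameters(), lr=1e-3)
S = torch.randn(4, 12, 64)          # source word vectors
T = torch.randn(4, 15, 64)          # reference translation word vectors
T_pred = torch.randn(4, 15, 64)     # ATransformer output (stand-in)
loss = -(torch.log(D(S, T)).mean() + torch.log(1 - D(S, T_pred)).mean())
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```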
Further, if the translation target language sequence T_Pred is judged to be a translation result, the translation target language sequence T_Pred is output as the final machine translation result; otherwise, the invention updates the parameters of the ATransformer model based on the policy gradient algorithm, inputs the preprocessed sentence to be translated into the updated ATransformer encoder, and executes the machine translation algorithm again;
because the translation result generated by the ATransformer model is not a continuous value, the error signal generated in the discrimination model cannot be propagated back to the ATransformer model, so the parameter update is performed with a policy gradient algorithm. The process of updating the parameters of the ATransformer model based on the policy gradient algorithm comprises the following steps:
1) according to the source language sentence S and the model D, the invention provides the following loss function to train the ATransformer model:
Loss = log(1 - D(S, T_Pred))
wherein:
D is the discrimination model;
T_Pred is the translation target language sequence output by the ATransformer model according to the source language sentence S.
2) Performing gradient calculation on the parameters of the ATransformer model that need to be updated, and updating the parameters of the ATransformer model by gradient descent with the calculated gradient, so as to complete the training and optimization of the ATransformer model; gradient descent itself is prior art and is not described here, and the gradient is calculated as:
∇_θ Loss = log(1 - D(S, T_Pred)) · ∇_θ log ATransformer(T_Pred | S; θ)
wherein:
ATransformer(·|S) is the conditional distribution generated by the ATransformer model;
θ is the parameter of the ATransformer model.
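The parameter update can be sketched as a REINFORCE-style step, as below. The toy generator standing in for the ATransformer model, the sampling of T_Pred and the absence of a variance-reduction baseline are simplifying assumptions; only the weighting of the log-probability gradient by Loss = log(1 - D(S, T_Pred)) follows the formulas above.

```python
# Sketch of the policy-gradient update: the discriminator score of the sampled
# translation T_Pred weights the gradient of log ATransformer(T_Pred | S; theta).
import torch
import torch.nn as nn

vocab, d_model, tgt_len = 1000, 32, 6

class ToyATransformer(nn.Module):               # stand-in for the full encoder/decoder
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab)
    def forward(self, src_repr):
        return torch.log_softmax(self.proj(src_repr), dim=-1)   # per-position log-probs

model = ToyATransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

src_repr = torch.randn(1, tgt_len, d_model)                   # encoded source sentence S (stand-in)
log_probs = model(src_repr)                                    # (1, tgt_len, vocab)
t_pred = torch.multinomial(log_probs.exp().squeeze(0), 1).squeeze(-1)    # sample T_Pred
log_p_tpred = log_probs[0, torch.arange(tgt_len), t_pred].sum()          # log ATransformer(T_Pred | S)

with torch.no_grad():
    d_score = torch.rand(())                                   # stand-in for D(S, T_Pred)

reward = torch.log(1.0 - d_score)                              # Loss = log(1 - D(S, T_Pred))
loss = reward * log_p_tpred                                    # grad = Loss * grad log ATransformer(T_Pred|S)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```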
The following describes an embodiment of the invention through a simulation experiment that tests the algorithm of the invention. The training and testing of the machine translation algorithm of the invention use the deep learning framework PyTorch; all models are trained on 6 NVIDIA K80 GPUs, each GPU is allocated 4000 tokens, and the newstest2017 data set is used as the test set. The comparison models selected by the invention comprise a Transformer model trained by the traditional method, a statistical machine translation model RNN-embed based on a deep neural network, and a statistical machine translation model NNPR based on a neural network.
In order to verify the effectiveness of the algorithm proposed by the invention, the accuracy of the translation result T_Pred of each model is analysed with the machine translation evaluation metric BLEU. For a source language sentence S_i, the translation result is T_Pred_i and the reference translations in the corresponding target language are T_i = {T_i1, ..., T_im}; n-grams is the set of phrases of length n, w_k denotes the k-th possible n-gram, h_k(T_Pred_i) denotes the number of times w_k occurs in T_Pred_i, and h_k(T_ij) denotes the number of times w_k occurs in the reference translation T_ij. The calculation formula of the BLEU overlap accuracy between the translation results and the reference translations is then:
P_n = Σ_i Σ_k min(h_k(T_Pred_i), max_j h_k(T_ij)) / Σ_i Σ_k h_k(T_Pred_i)
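A small sketch of this n-gram overlap precision is given below; clipping each candidate n-gram count by its maximum count over the reference translations follows the formula above, while the brevity penalty and the geometric mean over different n, which full BLEU also uses, are omitted.

```python
# Sketch of the BLEU n-gram overlap precision P_n between translation results
# and their reference translations (clipped count / candidate count).
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1))

def overlap_precision(candidates, references, n=2):
    """candidates: list of token lists T_Pred_i; references: list of lists of token lists T_i."""
    clipped, total = 0, 0
    for t_pred, refs in zip(candidates, references):
        cand = ngram_counts(t_pred, n)
        max_ref = Counter()
        for ref in refs:                        # max_j h_k(T_ij)
            for gram, cnt in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped += sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
        total += sum(cand.values())             # sum_k h_k(T_Pred_i)
    return clipped / total if total else 0.0

cands = [["the", "cat", "sat", "on", "the", "mat"]]
refs = [[["the", "cat", "is", "on", "the", "mat"]]]
print(overlap_precision(cands, refs, n=2))      # 0.6
```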
according to the simulation experiment result, the Transformer model trained by the traditional method completes the translation of the data set in 8 hours, and the BLEU of the model is equally 18; the RNN-embed machine translation model completes the translation of the data set in 7 hours, and the BLEU of the model is 25; the NNPR machine translation model completes the translation of the data set in 9 hours, and the BLEU of the NNPR machine translation model is 27; the machine translation model based on layer aggregation completes the translation of the data set within 5 hours, and the BLEU of the translation model is equal to 32. Therefore, compared with the existing algorithm, the machine translation algorithm can finish the translation of the text more quickly, and has higher translation precision.
The invention also provides a device of the machine translation algorithm based on layer aggregation. Referring to fig. 2, a schematic diagram of an internal structure of an apparatus for a layer aggregation based machine translation algorithm according to an embodiment of the present invention is provided.
In this embodiment, the apparatus 1 for layer aggregation based machine translation algorithm at least includes a text obtaining module 11, an encoding module 12, a decoding module 13, a translation result judging module 14, and a communication bus 15.
The text acquiring module 11 may be a PC (Personal Computer), a terminal device such as a smart phone, tablet computer or portable computer, or a server.
The encoding module 12, which may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or other data Processing chip, is used to perform encoding operation on the preprocessed chinese sentence by using the ATransformer encoder and perform multi-layer semantic feature information extraction on the chinese sentence.
The decoding module 13 may, in some embodiments, be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, and is configured to decode the multi-layer semantic feature information by using the ATransformer decoder, so as to output the translation target language sequence.
The translation result discrimination module 14 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., an SD or DX memory card), a magnetic memory, a magnetic disk, an optical disk, and the like. The translation result discrimination module 14 may in some embodiments be an internal storage unit of the apparatus 1 for the layer aggregation based machine translation algorithm, for example a hard disk of the apparatus 1. In other embodiments, the translation result discrimination module 14 may also be an external storage device of the apparatus 1, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card equipped on the apparatus 1. Further, the translation result discrimination module 14 may also include both an internal storage unit and an external storage device of the apparatus 1. The translation result discrimination module 14 may be configured not only to store application software installed in the apparatus 1 and various types of data, such as model training program instructions, but also to temporarily store data that has been output or is to be output.
The communication bus 15 is used to realize connection communication between these components.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface may also comprise a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the apparatus 1 based on the layer aggregation machine translation algorithm and for displaying a visualized user interface.
Fig. 2 only shows the apparatus 1 with the components 11-15 and the layer aggregation based machine translation algorithm, and it will be understood by those skilled in the art that the structure shown in fig. 2 does not constitute a limitation of the apparatus 1 of the layer aggregation based machine translation algorithm, and may comprise fewer or more components than those shown, or combine certain components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the translation result determining module 14 stores a model training program instruction of the ATransformer model; the process of the device executing the layer aggregation based machine translation algorithm is the same as the process of executing the layer aggregation based machine translation algorithm, and the description is not repeated here.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a model training program instruction of an ATransformer model is stored on the computer-readable storage medium, where the model training program instruction is executable by one or more processors to implement the following operations:
acquiring a Chinese sentence to be translated, and preprocessing the Chinese sentence;
utilizing an ATransformer encoder to perform encoding operation on the preprocessed Chinese sentences, and performing multi-layer semantic feature information extraction on the Chinese sentences;
decoding the multi-layer semantic feature information by using the ATransformer decoder, thereby outputting the translation target language sequence;
judging the translation target language sequence, and if the translation target language sequence is judged to be a translation result, outputting it as the final machine translation result; otherwise, updating the parameters of the encoding module and the decoding module based on the policy gradient algorithm, inputting the preprocessed sentence to be translated into the updated encoding module, and executing the machine translation algorithm again.
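Taken together, the stored program instructions amount to the loop sketched below. The function names (preprocess, encoder, discriminator and so on) are placeholders standing for the modules described above, and the bounded retry loop is an illustrative simplification of executing the machine translation algorithm again after a parameter update.

```python
# Sketch of the overall layer-aggregation machine translation pipeline:
# preprocess -> encode -> decode -> discriminate -> (accept or update and retry).
def translate(sentence, preprocess, encoder, decoder, discriminator, update_params,
              threshold=0.5, max_rounds=5):
    tokens = preprocess(sentence)                       # stop-word removal + segmentation
    for _ in range(max_rounds):
        features = encoder(tokens)                      # multi-layer semantic feature information
        t_pred = decoder(features)                      # translation target language sequence
        if discriminator(sentence, t_pred) >= threshold:
            return t_pred                               # accepted as the final translation
        update_params(sentence, t_pred)                 # policy-gradient update of the ATransformer
    return t_pred                                       # fall back to the last candidate

# Example wiring with trivial stand-ins:
result = translate(
    "待翻译的中文句子",
    preprocess=lambda s: list(s),
    encoder=lambda toks: toks,
    decoder=lambda feats: "translated sentence",
    discriminator=lambda s, t: 0.9,
    update_params=lambda s, t: None,
)
print(result)
```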
The embodiment of the computer-readable storage medium of the present invention is substantially the same as that of the above-mentioned machine translation algorithm based on layer aggregation, and will not be described herein again.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the present specification and drawings, or used directly or indirectly in other related fields, are included in the scope of the present invention.

Claims (9)

1. A layer aggregation based machine translation algorithm, the method comprising:
acquiring a Chinese sentence to be translated, and performing text preprocessing operation on the Chinese sentence;
inputting the preprocessed statement into a preset ATransformer encoder, wherein the ATransformer encoder performs multilayer semantic feature information extraction on the statement based on a multilayer information extraction algorithm;
inputting the multilayer semantic feature information into a preset ATransformer decoder, wherein the preset ATransformer decoder decodes the multilayer semantic feature information and outputs a translation target language sequence;
inputting a translation target language sequence into a pre-trained discrimination model D, and judging the translation target language sequence by the discrimination model D;
and if the translation target language sequence is judged to be a translation result, taking the translation target language sequence as the final machine translation result and outputting it; otherwise, updating the parameters of the ATransformer model based on a policy gradient algorithm, inputting the preprocessed sentence to be translated into the updated ATransformer encoder, and executing the machine translation algorithm again.
2. The layer aggregation-based machine translation algorithm of claim 1, wherein the text pre-processing operation comprises:
matching the constructed stop word list with words in the text data one by one, and deleting the words if the matching is successful;
finding out all possible words in the word string by constructing a prefix dictionary and a self-defined dictionary; and
and according to all the possible words found, each word corresponds to one directed edge in the graph and is assigned a corresponding edge-length weight; then, for the segmentation graph, among all paths from the starting point to the end point, the sets of paths whose length values rank 1st, 2nd, ..., i-th, ..., N-th in strictly ascending order are solved and taken as the corresponding rough segmentation result sets, and the rough segmentation result sets are the word segmentation result sets of the Chinese sentence to be translated.
3. The layer aggregation-based machine translation algorithm of claim 2, wherein the ATransformer encoder performs multi-layer semantic feature information extraction on a sentence based on a multi-layer information extraction algorithm, comprising:
the ATransformer encoder is formed by stacking 6 same modules, each module is divided into a main layer and a merging layer, each main layer is provided with two sub-layers, the first sub-layer contains a self-attention mechanism, the second sub-layer is a fully-connected feedforward network layer, the merging layer merges sub-layer results by using a Joint function, and the merged result is used as an output value of the next module;
the main layer of the first module in the ATransformer encoder receives the preprocessed sentence to be translated; the first sub-layer ESL1 in the main layer calculates the sentence to be translated based on a self-attention mechanism and inputs the calculation result into the second sub-layer ESL2 in the main layer; the second sub-layer ESL2 performs residual connection on the output result based on a feed-forward fully-connected neural network; a merging layer in the module merges the output results of the two sub-layers by using a Joint function, and the merged result is taken as the input value of the next module;
other modules in the ATransformer coder sequentially receive the output value of the previous module, the output value of the previous module is used as the input value of the current module for calculation and output, the output value of the last module is the extracted multilayer semantic feature information, the ATransformer coder extracts 12 layers of feature information and performs feature fusion, and the whole network structure is combined into a deeper hierarchical structure and a larger hierarchical structure from the shallowest structure and the smallest structure in an iterative manner;
the sub-layer
Figure FDA0002533892610000024
The calculation formula based on the self-attention mechanism in (1) is as follows:
Figure FDA0002533892610000025
wherein:
LayerNorm (·) is a normalization function;
i is the ith module in the encoder;
attention (. cndot.) is the self-Attention mechanism,
Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V;
d_k is the dimension of the input sentence to be translated;
Q_{i-1}, K_{i-1}, V_{i-1} are the three vector parameters obtained from the (i-1)-th module, representing the query, key and value weights of the sentence to be translated; when i = 1, the three vector parameters are preset in the invention;
SL2_{i-1} is the output value of the second sub-layer in the main layer of the (i-1)-th module; when i = 1, it is the preprocessed sentence to be translated;
the calculation formula of the residual connection based on the feed-forward fully-connected neural network in the sub-layer SL2_i is:
SL2_i = LayerNorm(SL1_i + FC(SL1_i))
FC(SL1_i) = W_E·SL1_i + b_E
wherein:
FC (-) is a feed-forward fully-connected network;
W_E is the preset encoder training weight;
b_E is the preset encoder bias parameter.
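A minimal NumPy sketch of one encoder module as claim 3 describes it: a self-attention sub-layer and a feed-forward sub-layer, each wrapped in a LayerNorm residual connection, followed by a merging layer. The Joint function is not spelled out in the claim, so averaging the two sub-layer outputs is an assumption, as are the dimensions and weight initializations.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LayerNorm(.): normalize each position over the feature dimension."""
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

class EncoderModule:
    """One encoder module: a main layer with two sub-layers, plus a merging layer."""
    def __init__(self, d_model, rng):
        self.w_q, self.w_k, self.w_v = (rng.normal(0, 0.02, (d_model, d_model)) for _ in range(3))
        self.w_e = rng.normal(0, 0.02, (d_model, d_model))  # encoder training weight W_E
        self.b_e = np.zeros(d_model)                         # encoder bias parameter b_E

    def forward(self, x):
        # First sub-layer: self-attention with a LayerNorm residual connection.
        sl1 = layer_norm(x + attention(x @ self.w_q, x @ self.w_k, x @ self.w_v))
        # Second sub-layer: feed-forward fully-connected network with a LayerNorm residual.
        sl2 = layer_norm(sl1 + (sl1 @ self.w_e + self.b_e))
        # Merging layer: Joint(.) is unspecified in the claim; averaging is an assumption.
        return 0.5 * (sl1 + sl2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(7, 64))                                # 7 tokens, d_model = 64
    for module in [EncoderModule(64, rng) for _ in range(6)]:   # 6 stacked modules
        x = module.forward(x)
    print("multi-layer semantic feature information:", x.shape)
```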
4. The layer aggregation based machine translation algorithm of claim 3, wherein the preset ATransformer decoder decodes the multi-layer semantic feature information and outputs a translation target language sequence, comprising:
the ATransformer decoder is formed by stacking 3 identical modules, each of which is divided into a multi-head attention mechanism layer, a DSL1 sub-layer and a DSL2 sub-layer;
the first module of the ATransformer decoder receives the multi-layer semantic feature information; the multi-head attention mechanism layer, the DSL1 sub-layer and the DSL2 sub-layer in the module compute on the multi-layer semantic feature information in turn, and the output result of the DSL2 sub-layer is used as the input of the next module;
the other modules in the ATransformer decoder each receive the output value of the previous module, use it as the input value of the current module for computation and output, and the output value of the DSL2 sub-layer in the last module is the finally translated target-language sequence;
the calculation formula of the multi-head attention mechanism layer MHA_i of the i-th decoder module is:
MHA_i = LayerNorm(DSL2_{i-1} + Attention(Q_{i-1}, K_{i-1}, V_{i-1}))
wherein:
LayerNorm (·) is a normalization function;
attention (. cndot.) is the self-Attention mechanism,
Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V;
d_k is the dimension of the sentence to be translated;
Q_{i-1}, K_{i-1}, V_{i-1} are the three vector parameters obtained from the (i-1)-th decoder module, representing the query, key and value weights of the sentence to be translated; when i = 1, they are obtained from the training of the last module in the encoder;
DSL2_{i-1} is the output value of the DSL2 sub-layer of the (i-1)-th decoder module; when i = 1, it is the multi-layer semantic feature information;
the calculation formula of the DSL1 sub-layer of the decoder module is:
DSL1_i = LayerNorm(MHA_i + Attention(MHA_i, K_E, V_E))
wherein:
i is the ith module;
K_E, V_E are parameters obtained from the last module of the encoder;
the calculation formula of the DSL2 sub-layer of the decoder module is:
DSL2_i = LayerNorm(DSL1_i + FC(DSL1_i))
FC(DSL1_i) = W_D·DSL1_i + b_D
wherein:
FC (-) is a feed-forward fully-connected network;
W_D is the preset decoder training weight;
b_D is the preset decoder bias parameter;
the output result of the DSL2 sub-layer in the last module is taken as the translation target language sequence T_Pred.
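Under the same assumptions, a decoder module of claim 4 can be sketched as an attention layer over the decoder state, a DSL1 sub-layer that attends to the encoder side through K_E and V_E, and a DSL2 feed-forward sub-layer. Using the encoder output itself as K_E and V_E, a single attention head, and the chosen dimensions are illustrative choices, not details taken from the patent.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V"""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

class DecoderModule:
    """One decoder module: attention layer, DSL1 sub-layer (encoder-side attention), DSL2 sub-layer."""
    def __init__(self, d_model, rng):
        self.w_q, self.w_k, self.w_v = (rng.normal(0, 0.02, (d_model, d_model)) for _ in range(3))
        self.w_d = rng.normal(0, 0.02, (d_model, d_model))  # decoder training weight W_D
        self.b_d = np.zeros(d_model)                         # decoder bias parameter b_D

    def forward(self, y, k_enc, v_enc):
        # Attention layer over the previous decoder output, with a LayerNorm residual.
        mha = layer_norm(y + attention(y @ self.w_q, y @ self.w_k, y @ self.w_v))
        # DSL1 sub-layer: attention against the encoder-side parameters K_E and V_E.
        dsl1 = layer_norm(mha + attention(mha, k_enc, v_enc))
        # DSL2 sub-layer: feed-forward fully-connected network with a LayerNorm residual.
        return layer_norm(dsl1 + (dsl1 @ self.w_d + self.b_d))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    enc = rng.normal(size=(7, 64))             # multi-layer semantic feature information
    y = enc                                    # input of the first decoder module
    for module in [DecoderModule(64, rng) for _ in range(3)]:   # 3 stacked modules
        y = module.forward(y, enc, enc)        # K_E, V_E taken as the encoder features here
    print("T_Pred representation:", y.shape)
```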
5. The layer aggregation-based machine translation algorithm of claim 4, wherein the discriminant model D is trained by:
(1) concatenating the source-language sentence S and the reference translation T from the training set into a two-dimensional matrix representation, and using the concatenation result as a positive-sample input;
(2) inputting the source-language sentence S into the ATransformer model to obtain a translation target language sequence T_p, concatenating the source-language sentence S with T_p, and using the concatenation result as a negative-sample input, thereby setting the optimization objective function of the discrimination model D:
max V(D) = log D((S, T)) + log(1 − D(ATransformer(S, T_p)))
(3) performing a convolution operation on the currently obtained positive- and negative-sample inputs to obtain the positive- and negative-sample features, where the activation function σ of the convolution operation is a softmax function and the convolution operation is computed as:
F_{i,j} = σ(W_F * f_{i,j} + b_F)
wherein:
W_F is the preset convolutional-layer training weight;
b_F is the preset convolutional-layer bias parameter;
f_{i,j} is the positive- or negative-sample input;
(4) obtaining the probability distribution over the positive and negative sample categories with a sigmoid function at the fully-connected layer; if the probability difference between the positive and negative sample categories is smaller than the error rate set by the method and the optimization objective function reaches its maximum, the discriminant model is considered to be successfully trained; otherwise, the convolution operation is performed again.
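The sketch below mirrors the claim-5 training step with a small PyTorch discriminator: a source/translation pair is concatenated into a two-dimensional matrix, a convolution with a softmax activation extracts features, and a sigmoid output scores the pair, after which the objective V(D) is maximized. The channel sizes, kernel shape, embedding dimensions and random inputs are illustrative only.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """CNN discriminator D scoring a concatenated (source, translation) matrix."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # F = sigma(W_F * f + b_F)
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.fc = nn.Linear(16, 1)

    def forward(self, pair_matrix):
        h = torch.softmax(self.conv(pair_matrix), dim=-1)        # softmax activation, per the claim
        h = self.pool(h).flatten(1)
        return torch.sigmoid(self.fc(h))                         # probability that the pair is real

def pair(source, target):
    """Concatenate source and target embeddings into one single-channel 2-D matrix."""
    return torch.cat([source, target], dim=0).unsqueeze(0).unsqueeze(0)

if __name__ == "__main__":
    d = Discriminator()
    optim = torch.optim.Adam(d.parameters(), lr=1e-3)
    src, ref, fake = (torch.randn(7, 64) for _ in range(3))      # hypothetical embeddings
    pos, neg = d(pair(src, ref)), d(pair(src, fake))
    v_d = torch.log(pos) + torch.log(1 - neg)   # max V(D) = log D((S,T)) + log(1 - D((S,T_p)))
    (-v_d).mean().backward()                    # ascend V(D) by descending -V(D)
    optim.step()
    print(float(pos), float(neg))
```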
6. The layer aggregation-based machine translation algorithm of claim 5, wherein updating the parameters of the ATransformer model based on the policy gradient algorithm comprises:
1) training the ATransformer model with the following loss function:
Loss=log(1-D(S,T_Pred))
wherein:
D is the discrimination model;
S is the source-language sentence;
T_Pred is the translation target language sequence output by the ATransformer model for the source-language sentence S;
2) computing the gradient for the parameters of the ATransformer model that need to be updated, and updating the parameters of the ATransformer model by gradient descent using the computed gradient, thereby completing the training and optimization of the ATransformer model; the gradient is calculated as follows:
∇_θ Loss = E_{T_Pred ~ ATransformer(·|S)}[ log(1 − D(S, T_Pred)) · ∇_θ log ATransformer(T_Pred | S) ]
wherein:
ATransformer(·|S) is the conditional distribution generated by the ATransformer model;
θ denotes the parameters of the ATransformer model, including the model weights, bias parameters and learning rate.
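A REINFORCE-style sketch of the claim-6 update: the generator samples T_Pred, the discriminator supplies the signal log(1 − D(S, T_Pred)), and that signal weighted by the log-probability of the sample gives the surrogate whose gradient is the policy gradient. The toy generator, the stand-in discriminator score and the optimizer settings are placeholders for the ATransformer model and the trained discrimination model D.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Placeholder for the ATransformer model: one linear layer over a small vocabulary."""
    def __init__(self, d_model=64, vocab=100):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab)

    def forward(self, src):
        return torch.log_softmax(self.proj(src), dim=-1)   # log ATransformer(. | S)

def policy_gradient_step(generator, d_score, src, optimizer):
    """One update: sample T_Pred, weight its log-probability by Loss = log(1 - D(S, T_Pred)),
    and take a gradient-descent step on the resulting surrogate."""
    dist = torch.distributions.Categorical(logits=generator(src))
    t_pred = dist.sample()                                  # sampled target sequence
    with torch.no_grad():
        loss_signal = torch.log(1 - d_score(src, t_pred))   # Loss = log(1 - D(S, T_Pred))
    surrogate = (loss_signal * dist.log_prob(t_pred)).mean()
    optimizer.zero_grad()
    surrogate.backward()                                    # gradient equals the policy gradient
    optimizer.step()
    return float(surrogate)

if __name__ == "__main__":
    gen = ToyGenerator()
    opt = torch.optim.SGD(gen.parameters(), lr=0.01)
    src = torch.randn(7, 64)                                # hypothetical source representation
    d_score = lambda s, t: torch.sigmoid(torch.randn(()))   # stand-in discriminator score in (0, 1)
    print(policy_gradient_step(gen, d_score, src, opt))
```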
7. An apparatus for a layer aggregation-based machine translation algorithm, the apparatus comprising a text acquisition module, an encoding module, a decoding module and a translation result discrimination module, wherein:
the text acquisition module is used for acquiring the Chinese sentences to be translated and preprocessing the Chinese sentences;
the encoding module is used for encoding the preprocessed Chinese sentences by using an ATransformer encoder and extracting multilayer semantic feature information of the Chinese sentences;
the decoding module is used for decoding the multi-layer semantic feature information by using an ATransformer decoder so as to output a translation target language sequence;
and the translation result judging module is used for judging the translation target language sequence.
8. The apparatus of the layer aggregation-based machine translation algorithm of claim 7, wherein the translation result discrimination module is configured for:
judging the translation target language sequence by the discrimination model;
if the translation target language sequence is judged to be a translation result, taking the translation target language sequence as a final machine translation result and outputting the final machine translation result;
otherwise, updating the parameters of the encoding module and the decoding module based on the policy gradient algorithm, inputting the preprocessed sentence to be translated into the updated encoding module, and carrying out the machine translation algorithm again.
9. A computer readable storage medium having stored thereon model training program instructions of an ATransformer model, the model training program instructions being executable by one or more processors to implement the steps of a layer aggregation based machine translation algorithm of any of claims 1 to 6.
CN202010527099.6A 2020-06-11 2020-06-11 Machine translation algorithm and device based on layer aggregation Withdrawn CN111680529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010527099.6A CN111680529A (en) 2020-06-11 2020-06-11 Machine translation algorithm and device based on layer aggregation

Publications (1)

Publication Number Publication Date
CN111680529A true CN111680529A (en) 2020-09-18

Family

ID=72454558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010527099.6A Withdrawn CN111680529A (en) 2020-06-11 2020-06-11 Machine translation algorithm and device based on layer aggregation

Country Status (1)

Country Link
CN (1) CN111680529A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364665A (en) * 2020-10-11 2021-02-12 广州九四智能科技有限公司 Semantic extraction method and device, computer equipment and storage medium
CN112269610A (en) * 2020-10-26 2021-01-26 南京燚麒智能科技有限公司 Method and device for executing batch model algorithm
CN112395408A (en) * 2020-11-19 2021-02-23 平安科技(深圳)有限公司 Stop word list generation method and device, electronic equipment and storage medium
CN112395408B (en) * 2020-11-19 2023-11-07 平安科技(深圳)有限公司 Stop word list generation method and device, electronic equipment and storage medium
CN113591498A (en) * 2021-08-03 2021-11-02 北京有竹居网络技术有限公司 Translation processing method, device, equipment and medium
CN113591498B (en) * 2021-08-03 2023-10-03 北京有竹居网络技术有限公司 Translation processing method, device, equipment and medium
WO2024207863A1 (en) * 2023-04-03 2024-10-10 腾讯科技(深圳)有限公司 Training method for translation model, text translation method, and apparatus, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20200918