CN114881051A - Translation quality determination method, related device and computer program product - Google Patents
Translation quality determination method, related device and computer program product
- Publication number
- CN114881051A
- Authority
- CN
- China
- Prior art keywords
- corpus
- information
- translation
- sentence
- key information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The disclosure provides a translation quality determination method, a translation quality determination device, an electronic device, a computer-readable storage medium and a computer program product, and relates to the technical field of artificial intelligence, such as natural language processing, machine translation and deep learning. One embodiment of the method comprises: obtaining a first corpus and a second corpus that differ in language but are semantically consistent; replacing key information in the second corpus with query words to build a third corpus whose sentence pattern is a question sentence; using a translation model to generate a fourth corpus that corresponds to the semantics of the first corpus and is in the same language as the second corpus; determining, in the fourth corpus, first result information corresponding to the query point in the third corpus; and generating first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information. The embodiment thus evaluates the translation quality of a translation model based on comprehension at the semantic level.
Description
Technical Field
The present disclosure relates to the field of computer technologies, in particular to artificial intelligence technologies such as natural language processing, machine translation and deep learning, and more specifically to a translation quality determination method, apparatus, electronic device, computer-readable storage medium, and computer program product.
Background
The simultaneous interpretation system and the instant translation system have high requirements on translation quality, so the evaluation of the simultaneous interpretation and translation system is always a difficult problem, and the evaluation system needs to simultaneously consider the problems of context consistency, continuity, translation fidelity with the original text and the like to produce a reasonable evaluation on each simultaneous interpretation result.
In the prior art, evaluation mostly relies on a classical text translation evaluation method, namely the Bilingual Evaluation Understudy (BLEU) method: the similarity between a translation result and a manually annotated standard translation result is calculated, a value between 0 and 1 is returned, and that value is used to evaluate the translation result.
Disclosure of Invention
The embodiment of the disclosure provides a translation quality determination method and device, electronic equipment, a computer readable storage medium and a computer program product.
In a first aspect, an embodiment of the present disclosure provides a translation quality determining method, including: acquiring a first corpus and a second corpus which are different in language and consistent in semantic; replacing key information in the second corpus with question words to construct a third corpus with a sentence pattern as a question sentence; processing the first corpus by using a translation model to generate a fourth corpus corresponding to the first corpus, wherein the translation model is used for converting the corpus among different languages, and the language of the fourth corpus is the same as that of the second corpus; determining first result information corresponding to the query point in the third corpus in the fourth corpus; and generating first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information.
In a second aspect, an embodiment of the present disclosure provides a translation quality determining apparatus, including: a corpus acquiring unit configured to acquire a first corpus and a second corpus which are different in language and consistent in semantics; a query corpus construction unit configured to replace key information in the second corpus with query words and construct a third corpus with a sentence pattern of a question sentence; a corpus translation unit configured to process the first corpus by using a translation model to generate a fourth corpus corresponding to the first corpus, wherein the translation model is used for converting corpora between different languages, and the language of the fourth corpus is the same as that of the second corpus; a first result information generating unit configured to determine, in the fourth corpus, first result information corresponding to a query point in the third corpus; and a first evaluation information generation unit configured to generate first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the translation quality determination method as described in any one of the implementations of the first aspect when executed.
In a fourth aspect, the disclosed embodiments provide a non-transitory computer-readable storage medium storing computer instructions for enabling a computer to implement the translation quality determination method as described in any one of the implementations of the first aspect when executed.
In a fifth aspect, the embodiments of the present disclosure provide a computer program product comprising a computer program, which when executed by a processor is capable of implementing the translation quality determination method as described in any one of the implementations of the first aspect.
The translation quality determining method, the translation quality determining device, the electronic device, the computer-readable storage medium, and the computer program product provided by the embodiments of the present disclosure acquire a first corpus and a second corpus which are different in language and have the same semantic meaning, replace key information in the second corpus with query words, construct a third corpus of which a sentence pattern is a question sentence, generate a fourth corpus corresponding to the semantic meaning of the first corpus and having the same language type as the second corpus by using a translation model, determine first result information corresponding to the question in the third corpus in the fourth corpus, and generate first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information.
In the present disclosure, a third corpus with a sentence pattern as a question sentence is constructed by replacing key information with question words in a second corpus whose semantics correspond to those of the first corpus. The first corpus is processed by a translation model to obtain a fourth corpus in the same language as the second corpus, and the translation quality of the translation model is evaluated based on the similarity between the key information and the first result information that is determined in the fourth corpus and corresponds to the question word in the third corpus. The translation quality of the translation model is thereby evaluated at the semantic level.
The statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
Fig. 1 is an exemplary system architecture to which the present disclosure may be applied;
Fig. 2 is a flowchart of a translation quality determination method provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of another translation quality determination method provided by an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a translation quality determination method in an application scenario according to an embodiment of the present disclosure;
Fig. 5 is a block diagram illustrating a translation quality determining apparatus according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device adapted to execute the translation quality determining method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict.
In addition, in the technical solutions of the present disclosure, the acquisition, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the translation quality determination methods, apparatus, electronic devices, and computer-readable storage media of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications for realizing information communication between the terminal devices 101, 102, 103 and the server 105, such as a translation quality test application, a model remote debugging application, an instant messaging application, etc., may be installed on the terminal devices 101, 102, 103 and the server 105.
The terminal apparatuses 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and they may be implemented as multiple software or software modules, or may be implemented as a single software or software module, and are not limited in this respect. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited herein.
The server 105 may provide various services through various built-in applications. Taking as an example a translation quality test application that provides translation capability and quality detection, the server 105 may achieve the following effects when running the application: first, it acquires a first corpus and a second corpus which are different in language and consistent in semantics from the terminal devices 101, 102, 103 through the network 104; then, the server 105 replaces key information in the second corpus with question words to construct a third corpus with a sentence pattern as a question sentence; next, the server 105 processes the first corpus by using a translation model to generate a fourth corpus corresponding to the first corpus, wherein the translation model is used for converting corpora between different languages, and the language of the fourth corpus is the same as that of the second corpus; further, the server 105 determines, in the fourth corpus, first result information corresponding to the query point in the third corpus; finally, the server 105 generates first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information.
It should be noted that the first corpus and the second corpus, which are different in language and consistent in semantics, may also be pre-stored locally in the server 105 in various ways, in addition to being acquired from the terminal devices 101, 102, 103 through the network 104. Thus, when the server 105 detects that such data is already stored locally (e.g., for a translation quality evaluation task that was retained before processing started), it may retrieve the data directly from local storage, in which case the exemplary system architecture 100 need not include the terminal devices 101, 102, 103 and the network 104.
Since processing corpora with the translation model requires considerable computing resources and computing power, the translation quality determination method provided in the following embodiments of the present disclosure is generally executed by the server 105, which has stronger computing power and more computing resources, and accordingly the translation quality determination device is generally disposed in the server 105. However, when the terminal devices 101, 102, 103 also have computing power and computing resources that meet the requirements, they may complete the above operations otherwise performed by the server 105 through the translation quality test application installed on them, and output the same result as the server 105. In particular, when several terminal devices with different computing capabilities are present at the same time and the translation quality test application determines that the terminal device on which it runs has strong computing power and ample idle computing resources, that terminal device may execute the above computation to appropriately reduce the computing load of the server 105; accordingly, the translation quality determination device may be provided in the terminal devices 101, 102, 103. In such a case, the exemplary system architecture 100 need not include the server 105 and the network 104.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of a translation quality determining method according to an embodiment of the present disclosure, where the process 200 includes the following steps:
Step 201, acquiring a first corpus and a second corpus which are different in language and consistent in semantics.
In this embodiment, an execution subject (e.g., the server 105 shown in fig. 1) of the translation quality determining method obtains a first corpus and a second corpus which are different in language and consistent in semantics. For example, when the language of the first corpus is Chinese and the language of the second corpus is English, the first corpus may be a Chinese sentence meaning "an apple on the table", and the corresponding second corpus may be "there is an apple on the table". It should be understood that the content of the second corpus is usually generated manually, or by a translation model, translator or the like whose reliability meets requirements, and may be used as a standard corpus for training the model.
It should be noted that the first corpus and the second corpus may be obtained by the execution subject directly from a local storage device, or from a non-local storage device (for example, the terminal devices 101, 102, 103 shown in fig. 1). The local storage device may be a data storage module disposed in the execution subject, such as a server hard disk, in which case the first corpus and the second corpus can be read locally and quickly; the non-local storage device may be any other electronic device configured to store data, such as a user terminal, in which case the execution subject may obtain the required first corpus and second corpus by sending an acquisition command to that electronic device.
Step 202, replacing key information in the second corpus with question words to construct a third corpus with a sentence pattern as a question sentence.
In this embodiment, after the second corpus is obtained, at least one piece of key information is determined from the second corpus, the key information is replaced with a corresponding query word, and a "?" (question mark) is included, so as to construct a third corpus with a sentence pattern as a question sentence. That is, the constructed third corpus has the same content as the second corpus except for the question words that replace the key information in the second corpus.
When the second corpus is a sentence set composed of multiple sentences, key information may be determined from each sentence and a question sentence constructed for each, yielding a third corpus that includes multiple question sentences (embodied as a question sentence set); alternatively, only some of the sentences may be selected for constructing question sentences, which is not limited by the present disclosure.
Furthermore, each content in the second corpus may be split according to sentence components, that is, each content in the second corpus may be split according to subject, predicate, object, noun, verb, and the like, so as to quickly implement the construction of the third corpus by pre-configuring the query words corresponding to different sentence components, for example, when determining that the sentence component of the key information in the second corpus is a noun, the key information is replaced according to the pre-configured query word "What" corresponding to the noun, and the third corpus with the sentence pattern as the query sentence is constructed.
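The query word replacement described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `build_question` and the mapping `QUERY_WORDS` are hypothetical, and only the noun-to-"What" entry comes from the text; the other entries are illustrative assumptions.

```python
# Hypothetical pre-configured mapping from sentence component to query word.
# Only "noun" -> "What" is given in the text; the rest are assumptions.
QUERY_WORDS = {"noun": "What", "person": "Who", "place": "Where"}


def build_question(sentence: str, key_info: str, component: str) -> str:
    """Replace key_info in sentence with the query word for its component."""
    if key_info not in sentence:
        raise ValueError("key information not found in sentence")
    query_word = QUERY_WORDS[component]
    # Replace only the first occurrence, drop the final punctuation, add "?".
    replaced = sentence.replace(key_info, query_word, 1)
    return replaced.rstrip(".!? ") + "?"


question = build_question("There is an apple on the table.", "an apple", "noun")
print(question)  # There is What on the table?
```

The resulting question keeps every word of the second corpus except the replaced key information, which matches the construction rule stated for the third corpus.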
Step 203, processing the first corpus by using a translation model to generate a fourth corpus corresponding to the first corpus.
In this embodiment, the translation model whose translation quality is to be evaluated is obtained, and the first corpus is processed with it to produce a translation of the first corpus into the language of the second corpus; this result is determined as the fourth corpus.
The translation model is used for translating between at least two languages, for example, translating the first corpus into the language of the second corpus.
Step 204, determining, in the fourth corpus, first result information corresponding to the query point in the third corpus.
In this embodiment, the fourth corpus is used as an answer or basis to search for each query point in the third corpus, that is, the content referred to and queried by the query word, and the determined content referred to and queried by the query word is used as the first result information corresponding to the query point in the third corpus.
In practice, a pre-configured question-answer model can be used: the content of the fourth corpus serves as the reference for the question-answer model, the third corpus is input as the question, and the answer produced by the question-answer model is taken as the first result information corresponding to the query point in the third corpus.
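To make the data flow of step 204 concrete, the sketch below uses a toy stand-in for the question-answer model. It is not a real question-answer model and is not taken from the disclosure: the hypothetical function `answer_from_context` simply treats the question as a template and returns the span of the fourth corpus that fills the slot of the query word.

```python
import re


def answer_from_context(question: str, query_word: str, context: str) -> str:
    """Toy stand-in for a question-answer model.

    Treats the question as a template and returns the span of `context`
    that fills the slot occupied by `query_word`, or "" if nothing fits.
    """
    body = question.rstrip("?").strip()
    before, _, after = body.partition(query_word)
    # Text before and after the query word must match literally.
    pattern = re.escape(before) + r"(.+?)" + re.escape(after) + r"[.!?]?\s*$"
    match = re.search(pattern, context, flags=re.IGNORECASE)
    return match.group(1).strip() if match else ""


first_result = answer_from_context(
    "There is What on the table?", "What", "There is one apple on the table."
)
print(first_result)  # one apple
```

A practical system would replace this function with a trained reading-comprehension model, since a literal template fails as soon as the translation rephrases the sentence around the key information.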
Step 205, generating first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information.
In this embodiment, after the key information determined in step 202 and the first result information obtained in step 204 are available, the first evaluation information for evaluating the translation quality of the translation model is generated based on the similarity between the two, so that the difference between the translation result of the translation model and the standard result can be learned from the similarity included in the first evaluation information. The similarity may be obtained by directly comparing the literal similarity between the key information and the first result information, or by converting the key information and the first result information into corresponding feature vectors and then using the feature distance or cosine distance between the feature vectors.
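The two similarity routes can be sketched as below. This is an illustration under stated assumptions: `difflib.SequenceMatcher` is swapped in as one possible literal similarity measure, and bag-of-words count vectors stand in for the feature vectors, which in practice would more likely come from a learned encoder. Both function names are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def literal_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors (an assumed stand-in
    for learned feature vectors)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


print(literal_similarity("an apple", "an apple"))  # 1.0
print(cosine_similarity("an apple", "one apple"))
```

In the second call the two phrases share one of two tokens each, so the cosine value is about 0.5.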
Further, an information length allowing the similarity to be obtained through the literal similarity may be preconfigured, so that after the key information and the first result information are obtained, when the literal lengths of the key information and the first result information both meet the requirement of the information length, the similarity between the key information and the first result information is obtained in a literal similarity manner, and when the literal length of at least one of the key information and the first result information does not meet the requirement of the information length, the similarity between the key information and the first result information is obtained in a feature vector manner, so as to balance the obtaining efficiency and the obtaining quality of the similarity between the key information and the first result information in different application scenarios.
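A minimal sketch of this length-based choice follows. The names `select_similarity` and `MAX_LITERAL_LENGTH` and the value 20 are hypothetical, and "meeting the information length requirement" is assumed here to mean "not longer than the configured length". The two similarity functions are passed in as parameters so the sketch stays independent of how they are implemented.

```python
from typing import Callable

# Hypothetical pre-configured information length, in characters.
MAX_LITERAL_LENGTH = 20

SimilarityFn = Callable[[str, str], float]


def select_similarity(
    key_info: str,
    result_info: str,
    literal_fn: SimilarityFn,
    vector_fn: SimilarityFn,
    max_len: int = MAX_LITERAL_LENGTH,
) -> float:
    """Use the literal measure only when both texts are short enough."""
    if len(key_info) <= max_len and len(result_info) <= max_len:
        return literal_fn(key_info, result_info)
    # At least one text is too long: fall back to the feature-vector measure.
    return vector_fn(key_info, result_info)
```

Short answers such as single noun phrases take the cheaper literal path, while longer spans go through the feature-vector path, which reflects the efficiency and quality trade-off described above.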
In practice, a plurality of evaluation levels corresponding to different similarity threshold intervals can be set, so that after the similarity between the key information and the first result information is obtained, the corresponding evaluation level is determined based on the similarity threshold interval in which the similarity falls, and the first evaluation information is generated based on the evaluation level, so that the translation quality of the translation model can be reflected more intuitively.
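The mapping from similarity to evaluation level can be sketched with a sorted list of thresholds. The threshold values and level names below are hypothetical; the disclosure does not specify them.

```python
import bisect

# Hypothetical interval boundaries and level names:
# [0, 0.5) -> "poor", [0.5, 0.8) -> "acceptable", [0.8, 1] -> "good".
THRESHOLDS = [0.5, 0.8]
LEVELS = ["poor", "acceptable", "good"]


def evaluation_level(similarity: float) -> str:
    """Return the level whose threshold interval contains the similarity."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return LEVELS[bisect.bisect_right(THRESHOLDS, similarity)]


print(evaluation_level(0.49))  # poor
print(evaluation_level(0.5))   # acceptable
print(evaluation_level(0.93))  # good
```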
The translation quality determining method provided by the embodiment of the present disclosure constructs a third corpus with a sentence pattern as a question sentence by replacing key information with query words in a second corpus whose semantics correspond to those of a first corpus. After the first corpus is processed by the translation model to obtain a fourth corpus in the same language as the second corpus, the translation quality of the translation model is evaluated based on the similarity between the key information and the first result information that is determined in the fourth corpus and corresponds to the query word in the third corpus, thereby evaluating the translation quality of the translation model at the semantic level.
In some optional implementations of this embodiment, the translation quality determination method further includes: in response to the similarity between the key information and the first result information being lower than a preset similarity threshold, acquiring component information of the sentence component that the key information serves as in the second corpus; and generating fourth evaluation information based on the component information.
Specifically, the similarity threshold is usually determined according to the application scenario. When the similarity between the key information and the first result information is lower than the preset similarity threshold, the translation quality of the translation model does not meet the expected requirement. In that case, component information is obtained that indicates which sentence component the key information serves as in the second corpus, and fourth evaluation information is generated based on the component information. The fourth evaluation information feeds back the translation capability of the translation model for each specific sentence component, so that targeted adjustment and training can then be performed.
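One way to turn component information into feedback is to aggregate, per sentence component, how often the similarity falls below the threshold. The sketch below assumes this aggregation; the disclosure only states that fourth evaluation information is generated from the component information, so the function `component_report`, the failure-rate format and the threshold value 0.6 are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def component_report(
    records: Iterable[Tuple[str, float]], threshold: float = 0.6
) -> Dict[str, float]:
    """Return, for each sentence component, the share of test items whose
    similarity is below the threshold.

    Each record is (sentence component of the key information, similarity).
    """
    totals: Counter = Counter()
    failures: Counter = Counter()
    for component, similarity in records:
        totals[component] += 1
        if similarity < threshold:
            failures[component] += 1
    return {component: failures[component] / totals[component] for component in totals}


report = component_report([("noun", 0.9), ("noun", 0.4), ("verb", 0.3)])
print(report)  # {'noun': 0.5, 'verb': 1.0}
```

A high failure rate for one component, such as verbs in this example, would point to where targeted adjustment and training of the translation model might start.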
Referring to fig. 3, fig. 3 is a flowchart of another translation quality determining method according to an embodiment of the present disclosure, where the process 300 includes the following steps:
Step 301, acquiring a first corpus and a second corpus which are different in language and consistent in semantics.
Step 302, determining first key information in the first corpus.
In this embodiment, at least one piece of first key information may be determined from the first corpus, so that second key information corresponding to the first key information in the second corpus can subsequently be determined based on the first key information by means of sentence alignment, sentence component splitting and the like. This avoids problems such as inappropriate or low-quality key information caused by sentence components that cannot strictly correspond across different languages, and improves the quality of the determined key information.
In practice, the punctuation marks and sentence-break usage rules of the language of the first corpus, and the punctuation marks and sentence-break usage rules of the language of the second corpus, can be configured in advance, so that sentence alignment between the first corpus and the second corpus can be better achieved. After the sentences are split and aligned, the resulting sentences form a sentence set, so that the translation capability of the translation model can be tested on each split sentence. This prevents information omissions caused by differences in punctuation and sentence-break usage rules from affecting the accuracy of the translation quality determination.
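A minimal sketch of punctuation-driven sentence splitting and alignment is given below. The per-language terminator table `SENTENCE_TERMINATORS`, the function names, and the one-to-one alignment rule are hypothetical simplifications, and the Chinese sample text is an illustrative assumption rather than an example from the disclosure.

```python
import re
from typing import List, Tuple

# Hypothetical pre-configured sentence-ending punctuation per language.
SENTENCE_TERMINATORS = {"zh": "。！？", "en": ".!?"}


def split_sentences(text: str, lang: str) -> List[str]:
    """Split text into sentences using the language's terminators."""
    terminators = re.escape(SENTENCE_TERMINATORS[lang])
    # A sentence is a run of non-terminators plus an optional terminator.
    parts = re.findall(f"[^{terminators}]+[{terminators}]?", text)
    return [part.strip() for part in parts if part.strip()]


def align_sentences(
    src: str, src_lang: str, tgt: str, tgt_lang: str
) -> List[Tuple[str, str]]:
    """Pair sentences by position; refuse to guess when counts differ."""
    src_sents = split_sentences(src, src_lang)
    tgt_sents = split_sentences(tgt, tgt_lang)
    if len(src_sents) != len(tgt_sents):
        raise ValueError("sentence counts differ; alignment needs review")
    return list(zip(src_sents, tgt_sents))


pairs = align_sentences(
    "桌子上有一个苹果。它是红色的。", "zh",
    "There is an apple on the table. It is red.", "en",
)
for source_sentence, target_sentence in pairs:
    print(source_sentence, "|", target_sentence)
```

Positional pairing is the simplest possible rule; it breaks when one language merges or splits sentences, which is why the sketch raises an error instead of silently misaligning.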
When determining second key information corresponding to the first key information in the second corpus based on the first key information by means of sentence alignment, sentence component splitting and the like,
Step 303, determining second key information corresponding to the first key information in the second corpus, and constructing a third corpus with a sentence pattern as a question sentence based on the second key information.
In this embodiment, after the first key information in the first corpus is determined in step 302, the second key information corresponding to the first key information is determined from the second corpus, and the third corpus with a sentence pattern as a question sentence is constructed based on the second key information.
Steps 301 and 304 correspond to steps 201 and 205 shown in fig. 2; for the same contents, refer to the corresponding parts of the previous embodiment, which are not repeated here. On the basis of the embodiment shown in fig. 2, this embodiment first determines the first key information from the first corpus and then determines the corresponding second key information in the second corpus. This avoids problems such as inappropriate or low-quality key information, which arise when sentence components cannot strictly correspond across languages, and improves the quality of the determined key information.
In some optional implementations of this embodiment, the translation quality determination method further includes: generating second result information that has the same semantics as the first result information and is in the language of the first corpus; and generating second evaluation information for evaluating the translation quality of the translation model based on the similarity between the first key information and the second result information.
Specifically, the first result information may be processed manually, or by a translation model or translator whose confidence level meets requirements, to generate second result information that has the same semantics as the first result information and is in the language of the first corpus. The similarity between the first key information and the second result information is then used to generate second evaluation information for evaluating the translation quality of the translation model, so that the second evaluation information feeds back the translation quality of the translation model with respect to the language of the first corpus.
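As a rough illustration of this back-translation check, the following Python sketch assumes a caller-supplied `back_translate` callable (standing in for the manual or trusted translator) and uses `difflib.SequenceMatcher` as a stand-in similarity measure. Neither choice comes from the disclosure, which does not name a similarity metric.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in character-level similarity in [0, 1]; the metric is an assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def second_evaluation(first_key_info, first_result_info, back_translate):
    """Score the first result information after converting it back to the
    language of the first corpus.

    back_translate is a hypothetical callable representing the manual or
    trusted translator described above.
    """
    second_result_info = back_translate(first_result_info)
    return similarity(first_key_info, second_result_info)
```

For instance, with a toy lookup table as the trusted translator, `second_evaluation("闸机", "the gate", lambda s: {"the gate": "闸机"}.get(s, s))` scores an exact match.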
In some optional implementation manners of this embodiment, the method for determining translation quality further includes: the first evaluation information and the second evaluation information are weighted to generate third evaluation information.
Specifically, after the first evaluation information and the second evaluation information are obtained, that is, after the translation quality of the translation model has been determined with respect to the language of the first corpus and the language of the second corpus respectively, the two may be weighted according to a preconfigured weight ratio to generate third evaluation information that feeds back the overall translation capability of the translation model, giving a more complete and comprehensive picture of its translation quality.
Further, corresponding evaluation information (e.g., fifth evaluation information) may be generated based on a difference between the first evaluation information and the second evaluation information, so as to feed back a difference between translation qualities of the translation model for different languages through the fifth evaluation information.
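The weighting and the difference described in the two paragraphs above can be sketched in Python as follows. Treating the evaluation information as scalar scores, the default weight of 0.5, and the use of an absolute difference are all assumptions for illustration.

```python
def third_evaluation(first_eval, second_eval, first_weight=0.5):
    """Weighted combination of the two scores; first_weight is a
    hypothetical preconfigured ratio in [0, 1]."""
    if not 0.0 <= first_weight <= 1.0:
        raise ValueError("first_weight must lie in [0, 1]")
    return first_weight * first_eval + (1.0 - first_weight) * second_eval

def fifth_evaluation(first_eval, second_eval):
    """Gap between the per-language scores (absolute difference assumed)."""
    return abs(first_eval - second_eval)
```

A large value from `fifth_evaluation` would suggest that the model handles one of the two languages noticeably better than the other.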
On the basis of any one of the foregoing embodiments, in some embodiments of the present disclosure, the translation quality determination method further includes: extracting a question sentence format template corresponding to the question word from a preset question sentence library, wherein the question sentence format template indicates the arrangement order of the sentence components in a question sentence; and adjusting the arrangement order of the sentence components in the third corpus based on the question sentence format template to generate an optimized third corpus.
Specifically, after the key information in the second corpus has been replaced with a question word and a third corpus whose sentence pattern is a question sentence has been constructed, a question sentence format template corresponding to the question word in use may be extracted from a preconfigured question sentence library. A question sentence may correspond to several format templates, depending on the question word it contains (the specific question word and the sentence component it serves as), and each template is determined from language usage habits in real scenarios. The template is used to adjust the arrangement order of the sentence components in the third corpus, which yields an optimized third corpus whose component order is closer to real usage. For example, after "What" replaces the key information "table", which serves as the predicative in "This is a table", the resulting third corpus "This is a what?" is optimized to "What is this?". Adjusting the generated third corpus with the question sentence format template produces question sentences closer to real scenarios, which reduces the interference with translation quality determination caused by an unnatural component order and improves the accuracy of the determined translation quality.
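The template lookup and reordering can be illustrated with a small Python sketch. The `QUESTION_TEMPLATES` table, the role labels, and the dictionary representation of sentence components are hypothetical; the sketch also simplifies the example by treating the question word as the whole predicative (dropping the article) and by lowercasing every word before re-capitalizing the sentence.

```python
# Hypothetical question sentence library: (question word, replaced sentence
# component) -> preferred order of sentence components.
QUESTION_TEMPLATES = {
    ("what", "predicative"): ["predicative", "verb", "subject"],
}

def optimize_question(components, question_word, replaced_role):
    """Reorder the third corpus according to the matching template.

    components maps a role label to its text, in original sentence order.
    If no template matches, the original order is kept.
    """
    order = QUESTION_TEMPLATES.get((question_word.lower(), replaced_role))
    if order is None:
        words = list(components.values())
    else:
        words = [components[role] for role in order]
    sentence = " ".join(w.lower() for w in words)
    return sentence[0].upper() + sentence[1:] + "?"
```

Called with `{"subject": "This", "verb": "is", "predicative": "what"}`, the question word "What", and the role "predicative", the sketch produces the reordered question from the example above.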
To deepen understanding, the present disclosure further provides a specific implementation in combination with a specific application scenario, taking as an example a first corpus whose content means that a user can pass through a gate machine by means of the user's own face at an identification end. For the processing performed after the first corpus and the second corpus are obtained, refer to the flow 400 shown in fig. 4, which specifically includes the following steps:
First, the first key information 'identification end' and 'gate' is determined from the first corpus, and the second key information "the registration end" and "the gate" corresponding to the first key information is determined from the second corpus.
Then, the second key information "the registration end" and "the gate" in the second corpus is replaced, one piece at a time, with the question word "What", which yields the third corpora Q1 "At What, you can pass through the gate through your own face?" and Q2 "At the registration end, you can pass through What through your own face?". The first corpus is then processed with the translation model to obtain a fourth corpus that corresponds to the first corpus and is in the same language as the second corpus, namely "In the registration end, you can do it by way of your face".
Next, the first result information corresponding to the query point of each third corpus is determined in the fourth corpus, namely M1 "the registration end" for Q1 and M2 "it" for Q2.
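Finally, first evaluation information for evaluating the translation quality of the translation model is generated based on the similarity between each piece of first result information and its corresponding key information, that is, the pairs M1-Ans1 and M2-Ans2, where Ans1 and Ans2 denote the key information replaced in Q1 and Q2 respectively. In this example the similarities are M1-Ans1: 0.99 and M2-Ans2: 0.15.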
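The question construction and scoring in this flow can be sketched in Python as follows. The sketch is illustrative only: `difflib.SequenceMatcher` stands in for the unspecified similarity measure, so it will not reproduce the 0.99 and 0.15 values quoted above, and the step that locates the first result information in the fourth corpus is taken as given input rather than implemented.

```python
from difflib import SequenceMatcher

def build_questions(second_corpus, key_infos, question_word="What"):
    """Replace each piece of key information in turn, returning
    (question, answer) pairs; the answer is the replaced key information."""
    pairs = []
    for key in key_infos:
        if key in second_corpus:
            question = second_corpus.replace(key, question_word, 1)
            pairs.append((question.rstrip(".") + "?", key))
    return pairs

def similarity(a, b):
    """Stand-in character-level similarity; the real metric is not specified."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def first_evaluation(answers, results):
    """Per-question similarity between key information and result information."""
    return [similarity(a, r) for a, r in zip(answers, results)]
```

Comparing the answers `["the registration end", "the gate"]` against results such as `["the registration end", "it"]` gives a high score for the first pair and a low score for the second, which mirrors the pattern in the example.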
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a translation quality determining apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the translation quality determination apparatus 500 of the present embodiment may include: a corpus acquiring unit 501, a query corpus constructing unit 502, a corpus translating unit 503, a first result information generating unit 504, and a first evaluation information generating unit 505. The corpus acquiring unit 501 is configured to acquire a first corpus and a second corpus that differ in language and are consistent in semantics; the query corpus constructing unit 502 is configured to replace the key information in the second corpus with question words to construct a third corpus whose sentence pattern is a question sentence; the corpus translating unit 503 is configured to process the first corpus by using a translation model to generate a fourth corpus corresponding to the first corpus, wherein the translation model is used for converting corpora between different languages, and the language of the fourth corpus is the same as that of the second corpus; the first result information generating unit 504 is configured to determine, in the fourth corpus, first result information corresponding to the query point in the third corpus; and the first evaluation information generating unit 505 is configured to generate first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information.
In the present embodiment, in the translation quality determination apparatus 500: the concrete processing and the technical effects of the corpus acquiring unit 501, the query corpus constructing unit 502, the corpus translating unit 503, the first result information generating unit 504 and the first evaluation information generating unit 505 can refer to the related description of step 201 and step 205 in the embodiment corresponding to fig. 2, and are not repeated herein.
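One way to picture how the five units cooperate is the Python sketch below, in which each unit's behaviour is injected as a callable. The class name, field names, and call signatures are hypothetical; the disclosure describes the units functionally and does not prescribe a software structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class TranslationQualityApparatus:
    # Unit 502: second corpus -> list of (question, key information).
    build_questions: Callable[[str], List[Tuple[str, str]]]
    # Unit 503: first corpus -> fourth corpus (the translation model under test).
    translate: Callable[[str], str]
    # Unit 504: (question, fourth corpus) -> first result information.
    answer: Callable[[str, str], str]
    # Used by unit 505 to compare key information with result information.
    similarity: Callable[[str, str], float]

    def evaluate(self, first_corpus: str, second_corpus: str) -> List[float]:
        """Unit 501 supplies the two corpora; one score is returned per question."""
        questions = self.build_questions(second_corpus)
        fourth_corpus = self.translate(first_corpus)
        return [
            self.similarity(key, self.answer(question, fourth_corpus))
            for question, key in questions
        ]
```

Injecting the callables keeps the sketch independent of any particular translation model or question-answering component.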
In some optional implementations of this embodiment, the query corpus constructing unit 502 includes: a first key information determining subunit configured to determine at least one piece of first key information from the first corpus; and a query corpus constructing subunit configured to replace the second key information corresponding to the first key information in the second corpus with question words to construct a third corpus whose sentence pattern is a question sentence.
In some optional implementations of this embodiment, the translation quality determining apparatus 500 further includes: a second result information generating unit configured to generate second result information having the same semantic meaning as the first result information and having a language of the first corpus; a second evaluation information generation unit configured to generate second evaluation information for evaluating the translation quality of the translation model based on a similarity of the first key information and the second result information.
In some optional implementations of this embodiment, the translation quality determining apparatus 500 further includes: and a third evaluation information generation unit configured to weight the first evaluation information and the second evaluation information to generate third evaluation information.
In some optional implementations of this embodiment, the translation quality determining apparatus 500 further includes: a component information obtaining unit configured to obtain, in response to the similarity between the key information and the first result information being lower than a preset similarity threshold, component information of the sentence component that the key information serves as in the second corpus; and a fourth evaluation information generation unit configured to generate fourth evaluation information based on the component information.
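The threshold check and the component-based feedback can be sketched as follows. Representing the fourth evaluation information as a count of low-scoring sentence components, the threshold of 0.5, and the component labels are assumptions made for illustration.

```python
from collections import Counter

def fourth_evaluation(records, threshold=0.5):
    """Count the sentence components whose similarity falls below the threshold.

    records is an iterable of (similarity, sentence_component) pairs, where
    the component is the role the key information plays in the second corpus.
    """
    weak = Counter(
        component for score, component in records if score < threshold
    )
    return dict(weak)
```

With the scores from the gate example and hypothetical labels, `fourth_evaluation([(0.99, "adverbial"), (0.15, "object")])` reports that the low-scoring key information served as an object.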
In some optional implementations of this embodiment, the translation quality determining apparatus 500 further includes: the sentence component ordering and acquiring unit is configured to extract a question format template corresponding to the question word from a preset question sentence library, wherein the question format template is used for indicating the arrangement sequence of each sentence component in the question sentence; and the third corpus optimizing unit is configured to adjust the arrangement sequence of each sentence component in the third corpus based on the question sentence format template to generate an optimized third corpus.
This embodiment is the apparatus embodiment corresponding to the foregoing method embodiment. The translation quality determining apparatus provided in this embodiment replaces key information in the second corpus, whose semantics are consistent with those of the first corpus, with question words to construct a third corpus whose sentence pattern is a question sentence. After the translation model processes the first corpus to obtain a fourth corpus in the same language as the second corpus, the apparatus evaluates the translation quality of the translation model based on the similarity between the key information and the first result information determined in the fourth corpus for the query point in the third corpus, thereby evaluating the translation quality of the translation model at the semantic level.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the translation quality determination method. For example, in some embodiments, the translation quality determination method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the translation quality determination method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the translation quality determination method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of conventional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server that incorporates a blockchain.
According to the technical solution of the embodiments of the present disclosure, key information in the second corpus, whose semantics are consistent with those of the first corpus, is replaced with question words to construct a third corpus whose sentence pattern is a question sentence. After the translation model processes the first corpus to obtain a fourth corpus in the same language as the second corpus, the translation quality of the translation model is evaluated based on the similarity between the key information and the first result information determined in the fourth corpus for the query point in the third corpus, thereby evaluating the translation quality of the translation model at the semantic level.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel or sequentially or in a different order, as long as the desired results of the technical solutions provided by this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (15)
1. A translation quality determination method, comprising:
acquiring a first corpus and a second corpus which are different in language and consistent in semantic;
replacing key information in the second corpus with question words to construct a third corpus with a sentence pattern as a question sentence;
processing the first corpus by using a translation model to generate a fourth corpus corresponding to the first corpus, wherein the translation model is used for converting the corpus among different languages, and the language of the fourth corpus is the same as that of the second corpus;
determining first result information corresponding to the query point in the third corpus in the fourth corpus;
and generating first evaluation information for evaluating the translation quality of the translation model based on the similarity between the key information and the first result information.
2. The method of claim 1, wherein the replacing key information in the second corpus with question words to construct a third corpus with a sentence pattern of question sentences comprises:
determining at least one piece of first key information from the first corpus;
and replacing second key information corresponding to the first key information in the second corpus with question words to construct a third corpus with a sentence pattern of a question sentence.
3. The method of claim 2, further comprising:
generating second result information which has the same semantics as the first result information and is in the language of the first corpus;
and generating second evaluation information for evaluating the translation quality of the translation model based on the similarity between the first key information and the second result information.
4. The method of claim 3, further comprising:
and weighting the first evaluation information and the second evaluation information to generate third evaluation information.
5. The method of claim 1, further comprising:
in response to the similarity between the key information and the first result information being lower than a preset similarity threshold, acquiring component information of the sentence component that the key information serves as in the second corpus;
and generating fourth evaluation information based on the component information.
6. The method of any of claims 1-5, further comprising:
extracting a question format template corresponding to the question words from a preset question sentence library, wherein the question format template is used for indicating the arrangement sequence of each sentence component in the question sentences;
and adjusting the arrangement sequence of each statement component in the third corpus based on the question sentence format template to generate an optimized third corpus.
7. A translation quality determination apparatus comprising:
the corpus acquiring unit is configured to acquire a first corpus and a second corpus which are different in language and consistent in semantic meaning;
the query corpus building unit is configured to replace key information in the second corpus with query words and build a third corpus with a sentence pattern as a query sentence;
a corpus translation unit configured to process the first corpus by using a translation model to generate a fourth corpus corresponding to the first corpus, wherein the translation model is used for converting the corpus between different languages, and the language of the fourth corpus is the same as that of the second corpus;
a first result information generating unit configured to determine, in the fourth corpus, first result information corresponding to a query point in the third corpus;
a first evaluation information generation unit configured to generate first evaluation information for evaluating translation quality of the translation model based on a similarity of the key information and the first result information.
8. The apparatus of claim 7, wherein the query corpus building unit comprises:
a first key information determining subunit configured to determine at least one first key information from the first corpus;
and the query corpus constructing subunit is configured to replace second key information corresponding to the first key information in the second corpus with query words to construct a third corpus with a sentence pattern as a query sentence.
9. The apparatus of claim 8, further comprising:
a second result information generating unit configured to generate second result information having the same semantic meaning as the first result information and having a language of the first corpus;
a second evaluation information generation unit configured to generate second evaluation information for evaluating the translation quality of the translation model based on a similarity of the first key information and the second result information.
10. The apparatus of claim 9, further comprising:
a third evaluation information generation unit configured to generate third evaluation information by weighting the first evaluation information and the second evaluation information.
11. The apparatus of claim 7, further comprising:
a component information obtaining unit configured to obtain component information of a sentence component made by the key information in the second corpus in response to a similarity between the key information and the first result information being lower than a preset similarity threshold;
and a fourth evaluation information generation unit configured to generate fourth evaluation information based on the component information.
12. The apparatus of any of claims 7-11, further comprising:
the sentence component ordering and acquiring unit is configured to extract a question format template corresponding to the question words from a preset question sentence library, wherein the question format template is used for indicating the arrangement sequence of each sentence component in the question sentences;
and the third corpus optimization unit is configured to adjust the arrangement sequence of each sentence component in the third corpus based on the question sentence format template to generate an optimized third corpus.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the translation quality determination method of any of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the translation quality determination method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements a translation quality determination method according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210507372.8A CN114881051A (en) | 2022-05-10 | 2022-05-10 | Translation quality determination method, related device and computer program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210507372.8A CN114881051A (en) | 2022-05-10 | 2022-05-10 | Translation quality determination method, related device and computer program product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114881051A true CN114881051A (en) | 2022-08-09 |
Family
ID=82675451
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210507372.8A Pending CN114881051A (en) | 2022-05-10 | 2022-05-10 | Translation quality determination method, related device and computer program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114881051A (en) |
- 2022-05-10: CN application CN202210507372.8A, patent CN114881051A (en), status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||