CN113409826B - TTS system performance test method, device, equipment and medium - Google Patents

Info

Publication number
CN113409826B
Authority
CN
China
Prior art keywords
tts system
text
test result
voice
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110890585.9A
Other languages
Chinese (zh)
Other versions
CN113409826A (en
Inventor
高羽
袁云浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Midea Group Shanghai Co Ltd
Original Assignee
Midea Group Co Ltd
Midea Group Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd, Midea Group Shanghai Co Ltd
Priority to CN202110890585.9A
Publication of CN113409826A
Application granted
Publication of CN113409826B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/69: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a method, a device, equipment and a medium for testing the performance of a TTS system. They are applied to the technical field of speech synthesis and are used for solving the problem that prior-art methods for testing the performance of a TTS system have low accuracy. The method comprises the following steps: acquiring a text prediction result and a voice prediction result of a TTS system on an input text; determining a text processing performance test result of the TTS system based on the text prediction result; determining a voice conversion performance test result of the TTS system based on the voice prediction result; and determining the comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result. In this way, the performance of the TTS system is tested using objective indexes of text processing and voice conversion, which both enables a comprehensive test of the performance of the TTS system and improves the accuracy of that test.

Description

TTS system performance test method, device, equipment and medium
Technical Field
The present application relates to the field of speech synthesis technologies, and in particular, to a method, an apparatus, a device, and a medium for testing performance of a TTS system.
Background
Text-to-Speech (TTS) systems convert text into synthesized speech that is as close as possible to real human speech, following the pronunciation specifications of a particular language. They are widely used in scenarios such as voice assistants, smart homes, and map navigation.
At present, the Mean Opinion Score (MOS) is generally used to score the speech synthesized by a TTS system, and the performance of the TTS system is determined from that score.
Disclosure of Invention
The embodiment of the application provides a TTS system performance test method, device, equipment and medium, which are used for solving the problem of low accuracy of the TTS system performance test method in the prior art.
The technical scheme provided by the embodiment of the application is as follows:
In one aspect, an embodiment of the present application provides a method for testing performance of a TTS system, including:
acquiring a text prediction result and a voice prediction result of a TTS system on an input text;
determining a text processing performance test result of the TTS system based on the text prediction result;
determining a voice conversion performance test result of the TTS system based on the voice prediction result;
and determining the comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result.
In another aspect, an embodiment of the present application provides a TTS system performance test apparatus, including:
a prediction result acquisition unit for acquiring a text prediction result and a voice prediction result of the TTS system on the input text;
a text processing performance test unit for determining a text processing performance test result of the TTS system based on the text prediction result;
a voice conversion performance test unit for determining a voice conversion performance test result of the TTS system based on the voice prediction result;
and a comprehensive performance determining unit for determining a comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result.
In another aspect, an embodiment of the present application provides TTS system performance test equipment, including: a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to implement the TTS system performance test method provided by the embodiment of the application.
In yet another aspect, the embodiment of the application also provides a computer readable storage medium. The computer readable storage medium stores computer instructions which, when executed by a processor, implement the TTS system performance test method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
In the embodiment of the application, objective indexes of text processing and voice conversion replace subjective indexes when testing the performance of the TTS system. On the one hand, this saves the time and labor of testing, unifies the performance evaluation standard of the TTS system, eliminates the influence of human factors on the evaluation, and improves the accuracy of the performance test. On the other hand, testing the text processing and the voice conversion of the TTS system separately enables a comprehensive test of its performance and makes it possible to locate whether a factor affecting performance lies in text processing or in voice conversion. This helps the development team optimize the TTS system in a targeted way and is of positive significance for improving its performance.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1a is a schematic flow chart of a TTS system performance test method for serially testing text processing performance and speech conversion performance according to an embodiment of the present application;
FIG. 1b is a schematic flow chart showing another overview of a TTS system performance testing method for serially testing text-processing performance and speech-converting performance according to an embodiment of the present application;
FIG. 2 is a schematic overview of a TTS system performance testing method for testing text processing performance and speech conversion performance in parallel according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a specific flow chart of a TTS system performance test method for performing parallel test on text processing performance and speech conversion performance in an embodiment of the present application;
FIG. 4 is a schematic functional structure diagram of a TTS system performance testing apparatus according to an embodiment of the present application;
FIG. 5 is a schematic hardware structure diagram of a TTS system performance test device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to facilitate a better understanding of the present application, technical terms related to the present application will be briefly described below.
1. The TTS system is a system for converting input text into synthesized speech that is close to the real voice of a speaker. In the embodiment of the application, the TTS system may be a client such as application software, or a light application such as an applet.
2. The text prediction result is the result of the text processing that the TTS system performs on the input text. In the embodiment of the present application, the text prediction result includes, but is not limited to: the phonemes, numbers, symbols, prosody and accent positions that the TTS system predicts for the input text.
3. The text labeling result is the result of labeling the input text based on the real phonemes, real numbers, real symbols, real prosody and real accent positions of the input text. In the embodiment of the application, the text labeling result includes, but is not limited to: the labeled phonemes, labeled numbers, labeled symbols, labeled prosody and labeled accent positions.
4. The speech prediction result is the result of the speech synthesis that the TTS system performs on the input text. In the embodiment of the present application, the speech prediction result includes, but is not limited to: the speech that the TTS system predicts for the input text.
5. The voice labeling result is the result of labeling the input text based on the real voice corresponding to the input text. In the embodiment of the application, the voice labeling result includes, but is not limited to: the labeled speech of the input text.
After technical terms related to the application are introduced, application scenes and design ideas of the embodiment of the application are briefly introduced.
In order to solve the problems of strong subjectivity and low accuracy in MOS-based performance test methods for TTS systems, in the embodiment of the application, TTS system performance test equipment such as a mobile phone, a tablet computer or a computer can acquire a text prediction result and a voice prediction result of the TTS system on an input text while the TTS system processes that text. The equipment determines the text processing performance test result of the TTS system based on the text prediction result, determines the voice conversion performance test result based on the voice prediction result, and determines the comprehensive performance test result based on the text processing performance test result and the voice conversion performance test result. In this way, objective indexes of text processing and voice conversion replace subjective indexes when testing the performance of the TTS system. On the one hand, this saves the time and labor of testing, unifies the performance evaluation standard of the TTS system, eliminates the influence of human factors on the evaluation, and improves the accuracy of the performance test. On the other hand, testing the text processing and the voice conversion separately enables a comprehensive test of the TTS system and makes it possible to locate whether a factor affecting performance lies in text processing or in voice conversion, thereby helping the development team optimize the TTS system in a targeted way, which is of positive significance for improving its performance.
After the application scenario and the design idea of the embodiment of the present application are introduced, the technical solution provided by the embodiment of the present application is described in detail below.
In the embodiment of the present application, the TTS system performance test device may test the text processing performance and the speech conversion performance of the TTS system serially, and determine the comprehensive performance test result of the TTS system from the text processing performance test result and the speech conversion performance test result obtained during the serial test. Specifically, in one embodiment, referring to FIG. 1a, the TTS system performance test device may first perform step 101: acquiring a text prediction result of the TTS system on an input text, and determining a text processing performance test result of the TTS system based on the text prediction result; then perform step 102: acquiring a voice prediction result of the TTS system on the input text, and determining a voice conversion performance test result of the TTS system based on the voice prediction result; and finally perform step 103: determining the comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result. In another embodiment, referring to FIG. 1b, the TTS system performance test device may first perform step 111: acquiring a voice prediction result of the TTS system on an input text, and determining a voice conversion performance test result of the TTS system based on the voice prediction result; then perform step 112: acquiring a text prediction result of the TTS system on the input text, and determining a text processing performance test result of the TTS system based on the text prediction result; and finally perform step 113: determining the comprehensive performance test result of the TTS system based on the voice conversion performance test result and the text processing performance test result.
In order to improve the efficiency of the performance test of the TTS system, the TTS system performance test device may also test the text processing performance and the speech conversion performance of the TTS system in parallel, and determine the comprehensive performance test result of the TTS system from the text processing performance test result and the speech conversion performance test result obtained during the parallel test. Specifically, referring to FIG. 2, the TTS system performance test device may perform step 201 (acquiring a voice prediction result of the TTS system on the input text, and determining a voice conversion performance test result of the TTS system based on the voice prediction result) in parallel with step 202 (acquiring a text prediction result of the TTS system on the input text, and determining a text processing performance test result of the TTS system based on the text prediction result), and then perform step 203: determining the comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result.
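The parallel flow of FIG. 2 can be pictured with a minimal Python sketch. It is an illustration only, not the implementation described in the application: the function name `run_parallel_test`, the three callables passed to it, and the use of a thread pool are all assumptions. The rule used to combine the two results in step 203 is deliberately left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_parallel_test(
    input_text: str,
    score_speech: Callable[[str], float],     # hypothetical stand-in for step 201
    score_text: Callable[[str], float],       # hypothetical stand-in for step 202
    combine: Callable[[float, float], float], # hypothetical stand-in for step 203
) -> float:
    """Run the two sub-tests concurrently, then combine their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        speech_future = pool.submit(score_speech, input_text)  # step 201
        text_future = pool.submit(score_text, input_text)      # step 202
        speech_score = speech_future.result()
        text_score = text_future.result()
    # Step 203 runs only after both sub-tests have finished.
    return combine(text_score, speech_score)


# Usage with placeholder scorers; the numbers and the averaging rule are made up.
overall = run_parallel_test(
    "hello",
    score_speech=lambda text: 0.8,
    score_text=lambda text: 0.9,
    combine=lambda text_score, speech_score: (text_score + speech_score) / 2,
)
```

The serial flows of FIG. 1a and FIG. 1b correspond to calling the two scorers one after the other in either order before combining.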
The following describes in detail the steps in the TTS system performance test method shown in fig. 1a, fig. 1b and fig. 2 provided in the embodiment of the present application.
In the embodiment of the application, when the TTS system performance test equipment obtains the text prediction result and the voice prediction result of the TTS system on the input text, different obtaining modes can be adopted according to different installation positions of the TTS system, and specifically, the following two situations can exist, but are not limited to:
First case: the TTS system is installed on TTS system performance testing equipment.
In this case, the TTS system performance test device may call the TTS system to process the input text, and obtain a text prediction result and a speech prediction result of the TTS system on the input text in the process of calling the TTS system to process the input text, and specifically, the TTS system performance test device may obtain a predicted phoneme, a predicted number, a predicted symbol, a predicted prosody and a predicted accent position of the TTS system on the input text as the text prediction result, and obtain a predicted speech of the TTS system on the input text as the speech prediction result.
Second case: the TTS system is installed on another terminal device, such as a mobile phone, a tablet computer or a computer.
In this case, the TTS system performance test device may send a TTS system performance test instruction to the terminal device. When the terminal device receives the instruction, it calls the TTS system to process the input text indicated by the instruction, and the TTS system performance test device obtains the text prediction result and the voice prediction result of the TTS system on the input text while the terminal device is calling the TTS system to process the input text. In one embodiment, while calling the TTS system to process the input text, the terminal device may send the text prediction result and the speech prediction result of the TTS system on the input text to the TTS system performance test device. Specifically, the terminal device may send the predicted phoneme, predicted number, predicted symbol, predicted prosody and predicted accent position of the TTS system on the input text to the TTS system performance test device as the text prediction result, and send the predicted speech of the TTS system on the input text as the speech prediction result, so that the TTS system performance test device obtains both results. In another embodiment, the TTS system performance test device may instead actively acquire the text prediction result and the speech prediction result from the terminal device while the terminal device is calling the TTS system to process the input text. Specifically, the TTS system performance test device may actively acquire the predicted phoneme, predicted number, predicted symbol, predicted prosody and predicted accent position of the TTS system on the input text from the terminal device as the text prediction result, and actively acquire the predicted speech of the TTS system on the input text as the speech prediction result.
In the embodiment of the application, when the TTS system performance test device determines the text processing performance test result of the TTS system based on the text prediction result, the following manner may be adopted, but is not limited to:
First, the TTS system performance test device determines a text accuracy test result of the TTS system based on the text prediction result and the text labeling result of the input text.
Specifically, the TTS system performance test device may determine a phoneme prediction accuracy, a digital conversion accuracy, a symbol conversion accuracy, a prosody prediction accuracy, and an accent position prediction accuracy based on the predicted phoneme, the predicted number, the predicted symbol, the predicted prosody, and the predicted accent position included in the text prediction result, and the labeling phoneme, the labeling number, the labeling symbol, the labeling prosody, and the labeling accent position of the input text included in the text labeling result, and then, based on weights corresponding to the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy, and the accent position prediction accuracy, perform weighted summation on the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy, and the accent position prediction accuracy, to obtain a text accuracy test result, where a numerical range of the text accuracy test result is [0,1].
In practical application, the weights corresponding to the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy and the accent position prediction accuracy can be flexibly set according to different practical requirements. For example, in the TTS system for the speech learning function, the requirements for prosody and accent are not high, the weights of the prosody prediction accuracy and accent position prediction accuracy may be set to a small value (e.g., each set to 0.05), and the weights of the phoneme prediction accuracy, the digital conversion accuracy, and the symbol conversion accuracy may be set to a large value (e.g., each set to 0.3); for another example, in the TTS system with respect to the story telling function, the requirements for digital conversion and symbol conversion are not high, the weights of the digital conversion accuracy and the symbol conversion accuracy may be set to a small value (e.g., each set to 0.05), and the weights of the prosody prediction accuracy, the accent position prediction accuracy, and the phoneme prediction accuracy may be set to a large value (e.g., each set to 0.3).
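The weighted summation above can be sketched in Python as follows. The two weight profiles use the example values given in the text. The dictionary keys, the function name `text_accuracy`, the requirement that weights sum to 1, and the sample accuracies are assumptions made for illustration.

```python
# Weight profiles taken from the two examples in the text; key names are illustrative.
SPEECH_LEARNING_WEIGHTS = {
    "phoneme": 0.3, "number": 0.3, "symbol": 0.3, "prosody": 0.05, "accent": 0.05,
}
STORY_TELLING_WEIGHTS = {
    "phoneme": 0.3, "number": 0.05, "symbol": 0.05, "prosody": 0.3, "accent": 0.3,
}


def text_accuracy(accuracies: dict, weights: dict) -> float:
    """Weighted sum of the five per-item accuracies, each expected in [0, 1]."""
    if set(accuracies) != set(weights):
        raise ValueError("accuracies and weights must cover the same items")
    # Assumption: weights are normalised so the result stays in [0, 1].
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return sum(weights[name] * accuracies[name] for name in weights)


# Made-up accuracies for one test run, for illustration only.
sample = {"phoneme": 0.98, "number": 0.90, "symbol": 0.95, "prosody": 0.70, "accent": 0.60}
learning_score = text_accuracy(sample, SPEECH_LEARNING_WEIGHTS)
story_score = text_accuracy(sample, STORY_TELLING_WEIGHTS)
```

With these sample values the same system scores higher under the speech learning profile than under the story telling profile, because its weaker prosody and accent predictions carry less weight there.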
Then, the TTS system performance test device determines a text processing performance test result of the TTS system based on the text accuracy test result.
Alternatively, in one embodiment, the TTS system performance test apparatus may directly determine the text accuracy test result to be the text processing performance test result of the TTS system.
In another embodiment, in order to further improve the comprehensiveness and accuracy of the performance test of the TTS system, the performance test device of the TTS system may further determine, based on the text processing duration of the TTS system on the input text, a text responsiveness test result of the TTS system, and then determine a text processing performance test result based on the text accuracy test result and the text responsiveness test result.
In practical application, when the TTS system performance test device determines the text responsiveness test result of the TTS system based on the text processing duration of the TTS system on the input text, and determines the text processing performance test result based on the text accuracy test result and the text responsiveness test result, the following manner may be adopted, but is not limited to:
First, the TTS system performance test device determines an actual text processing real-time rate of the TTS system based on the ratio of the text processing duration to the total duration of the labeled voice corresponding to the input text.
In specific implementation, the TTS system performance test device may directly determine a ratio of a text processing duration to a total duration of the labeled speech corresponding to the input text as an actual text processing real-time rate of the TTS system.
Then, the TTS system performance test apparatus determines a text responsiveness test result based on the actual text processing real-time rate and the target text processing real-time rate of the input text.
In the specific implementation, if the TTS system performance test equipment determines that the actual text processing real-time rate is smaller than or equal to the target text processing real-time rate, determining that the text responsiveness test result is 1; and if the TTS system performance test equipment determines that the actual text processing real-time rate is larger than the target text processing real-time rate, determining a text responsiveness test result based on the inverse of the ratio of the actual text processing real-time rate to the target text processing real-time rate. Specifically, the TTS system performance test device may directly determine the inverse of the ratio of the actual text processing real-time rate to the target text processing real-time rate as a text responsiveness test result, where the numerical range of the text responsiveness test result is [0,1].
Finally, the TTS system performance test equipment performs weighted summation on the text accuracy test result and the text responsiveness test result based on the weights corresponding to the text accuracy test result and the text responsiveness test result, and obtains a text processing performance test result, wherein the numerical range of the text processing performance test result is [0,1]. In practical application, weights corresponding to the text accuracy test result and the text responsiveness test result can be flexibly set according to practical requirements. For example, for a TTS system with a voice learning function, the accuracy requirement is high, so the weight of the text accuracy test result may be set to a larger value (e.g., set to 0.7), and the weight of the text responsiveness test result may be set to a smaller value (e.g., set to 0.3); for another example, for a TTS system with a map navigation function, the real-time requirement is high, so the weight of the text responsiveness test result may be set to a larger value (e.g., set to 0.6), and the weight of the text accuracy test result may be set to a smaller value (e.g., set to 0.4).
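The three steps above (real-time rate, responsiveness, weighted combination) can be sketched as follows. The function names and the sample durations, target rate and accuracy are assumptions for illustration. The weight pairs 0.7/0.3 and 0.4/0.6 are the example values from the text.

```python
def real_time_rate(processing_seconds: float, labeled_speech_seconds: float) -> float:
    """Ratio of the processing duration to the total duration of the labeled speech."""
    if labeled_speech_seconds <= 0:
        raise ValueError("labeled speech duration must be positive")
    return processing_seconds / labeled_speech_seconds


def responsiveness(actual_rate: float, target_rate: float) -> float:
    """1 when the target rate is met, otherwise the inverse of actual/target."""
    if target_rate <= 0:
        raise ValueError("target real-time rate must be positive")
    if actual_rate <= target_rate:
        return 1.0
    return target_rate / actual_rate


def text_processing_score(accuracy: float, responsiveness_score: float,
                          accuracy_weight: float, responsiveness_weight: float) -> float:
    """Weighted sum of the text accuracy and text responsiveness test results."""
    return accuracy_weight * accuracy + responsiveness_weight * responsiveness_score


# Made-up measurements: 0.5 s of text processing for 5 s of labeled speech.
rate = real_time_rate(processing_seconds=0.5, labeled_speech_seconds=5.0)
resp = responsiveness(rate, target_rate=0.05)  # target missed by a factor of two
learning = text_processing_score(0.9, resp, accuracy_weight=0.7, responsiveness_weight=0.3)
navigation = text_processing_score(0.9, resp, accuracy_weight=0.4, responsiveness_weight=0.6)
```

Here the slow response costs the map navigation profile more than the voice learning profile, since navigation places the larger weight on responsiveness.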
In the embodiment of the present application, when the TTS system performance test device determines the voice conversion performance test result of the TTS system based on the voice prediction result, the following manner may be adopted, but is not limited to:
First, the TTS system performance test equipment determines a voice accuracy test result of the TTS system based on the voice prediction result and the voice labeling result of the input text.
In a specific implementation, the TTS system performance test device may determine, based on the predicted speech included in the speech prediction result and the labeled speech of the input text included in the speech labeling result, the pronunciation generation similarity, the mel spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity, and the energy generation similarity of the TTS system. It then performs weighted summation on these five similarities according to the weight corresponding to each, so as to obtain a speech accuracy test result, where the numerical range of the speech accuracy test result is [0,1].
In practical application, weights corresponding to the pronunciation generation similarity, the mel spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity and the energy generation similarity can be flexibly set according to practical requirements. Specifically, the following two modes can be adopted, but are not limited to. Mode 1: set the weights according to the test target in practical application. For example, if most of the predicted voices output by a TTS system have a duration problem, the weight of the duration generation similarity may be set to a larger value (for example, 0.4), and the weights corresponding to the pronunciation generation similarity, the mel spectrum similarity, the fundamental frequency generation similarity and the energy generation similarity may be set to smaller values (for example, 0.15 each). Mode 2: set the weights according to how consistent the overall score of the TTS system is with a subjective listening test, for example by selecting, from a plurality of weight combinations, the combination that is most consistent with the MOS subjective listening test. Of course, in the embodiment of the present application, the weights corresponding to the pronunciation generation similarity, the mel spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity, and the energy generation similarity may also each be set to 0.2.
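Mode 2 can be illustrated with a short sketch. The text does not say how consistency with the MOS test is measured, so the use of Pearson correlation here is an assumption, as are the function names, the metric keys and the toy data. The two candidate weight combinations are the duration-heavy example (0.4 and 0.15) and the uniform example (0.2 each) from the text.

```python
from math import sqrt

METRICS = ("pronunciation", "mel", "duration", "f0", "energy")  # illustrative keys

UNIFORM = {m: 0.2 for m in METRICS}
DURATION_HEAVY = {"pronunciation": 0.15, "mel": 0.15, "duration": 0.4,
                  "f0": 0.15, "energy": 0.15}


def speech_accuracy(similarities: dict, weights: dict) -> float:
    """Weighted sum of the five similarities for one utterance."""
    return sum(weights[m] * similarities[m] for m in METRICS)


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation, used here as an assumed measure of consistency."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


def pick_weights(candidates: list, per_utterance: list, mos_scores: list) -> dict:
    """Return the candidate whose objective scores track the MOS scores best."""
    return max(
        candidates,
        key=lambda w: pearson([speech_accuracy(s, w) for s in per_utterance], mos_scores),
    )


# Toy data, made up for illustration: listeners mostly penalise duration errors.
utterances = [
    {"pronunciation": 0.9, "mel": 0.9, "duration": 0.5, "f0": 0.9, "energy": 0.9},
    {"pronunciation": 0.8, "mel": 0.8, "duration": 0.9, "f0": 0.8, "energy": 0.8},
    {"pronunciation": 0.9, "mel": 0.9, "duration": 0.7, "f0": 0.9, "energy": 0.9},
]
mos = [2.0, 4.0, 3.0]
best = pick_weights([UNIFORM, DURATION_HEAVY], utterances, mos)
```

On this toy data the duration-heavy combination is selected, because its scores rise and fall with the listener ratings while the uniform scores do not.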
Then, the TTS system performance test device determines a speech conversion performance test result of the TTS system based on the speech accuracy test result.
Alternatively, in one embodiment, the TTS system performance test apparatus may directly determine the voice accuracy test result as the voice conversion performance test result of the TTS system.
In another embodiment, to further improve the comprehensiveness and accuracy of the TTS system performance test, the TTS system performance test device may also determine a speech responsiveness test result of the TTS system based on the speech synthesis duration of the TTS system on the input text, and then determine the speech conversion performance test result based on both the speech accuracy test result and the speech responsiveness test result.
In practical application, the TTS system performance test device may determine the voice responsiveness test result of the TTS system based on the voice synthesis duration of the TTS system on the input text, and determine the voice conversion performance test result based on the voice accuracy test result and the voice responsiveness test result, in the following manner, without limitation:
First, the TTS system performance test device determines an actual speech synthesis real-time rate of the TTS system based on the ratio of the speech synthesis duration to the total duration of the labeled speech corresponding to the input text.
In specific implementation, the TTS system performance test device may directly determine a ratio of a speech synthesis duration to a total duration of the labeled speech corresponding to the input text as an actual speech synthesis real-time rate of the TTS system.
Then, the TTS system performance test apparatus determines a voice responsiveness test result based on the actual voice synthesis real-time rate and the target voice synthesis real-time rate of the input text.
In a specific implementation, if the TTS system performance test equipment determines that the actual speech synthesis real-time rate is less than or equal to the target speech synthesis real-time rate, it determines that the speech responsiveness test result is 1; if it determines that the actual speech synthesis real-time rate is greater than the target speech synthesis real-time rate, it determines the speech responsiveness test result based on the inverse of the ratio of the actual speech synthesis real-time rate to the target speech synthesis real-time rate. Specifically, the TTS system performance test apparatus may take the inverse of the ratio of the actual speech synthesis real-time rate to the target speech synthesis real-time rate (that is, the target rate divided by the actual rate) as the speech responsiveness test result, where the numerical range of the speech responsiveness test result is [0,1].
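The real-time rate and the responsiveness rule described above can be sketched as follows. The function names, the input validation, and the example durations and target rate are assumptions made for illustration; only the ratio and the piecewise rule come from the text.

```python
def real_time_rate(processing_seconds: float, labeled_speech_seconds: float) -> float:
    """Ratio of processing duration to the total duration of the labeled speech."""
    if labeled_speech_seconds <= 0:
        raise ValueError("labeled speech duration must be positive")
    return processing_seconds / labeled_speech_seconds


def responsiveness(actual_rate: float, target_rate: float) -> float:
    """Return 1 if the target is met, otherwise the inverse of actual/target."""
    if target_rate <= 0:
        raise ValueError("target real-time rate must be positive")
    if actual_rate <= target_rate:
        return 1.0
    # Inverse of (actual_rate / target_rate), which lies strictly between 0 and 1.
    return target_rate / actual_rate


# Hypothetical measurements: 5 s of labeled speech, target real-time rate 0.5.
fast = responsiveness(real_time_rate(1.5, 5.0), target_rate=0.5)  # rate 0.3
slow = responsiveness(real_time_rate(4.0, 5.0), target_rate=0.5)  # rate 0.8
print(fast, round(slow, 4))
```

The same two functions apply unchanged to the text processing real-time rate, since the text defines it with the same ratio and the same rule.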
Finally, the TTS system performance test equipment performs a weighted summation of the voice accuracy test result and the voice responsiveness test result, based on the weight corresponding to each, to obtain the voice conversion performance test result. When the two weights sum to 1, as in the examples below, the numerical range of the voice conversion performance test result is [0,1]. In practical application, the weights corresponding to the voice accuracy test result and the voice responsiveness test result can be set flexibly according to practical requirements. For example, for a TTS system with a voice learning function, where the accuracy requirement is high, the weight of the voice accuracy test result may be set to a larger value (e.g., 0.7), and the weight of the voice responsiveness test result may be set to a smaller value (e.g., 0.3); as another example, for a TTS system with a map navigation function, where the real-time requirement is high, the weight of the voice responsiveness test result may be set to a larger value (e.g., 0.6), and the weight of the voice accuracy test result may be set to a smaller value (e.g., 0.4).
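As a worked illustration of the two weight profiles mentioned above, the following sketch scores one hypothetical TTS system under both. The function name and the accuracy and responsiveness values (0.9 and 0.5) are assumptions; the weights 0.7/0.3 and 0.4/0.6 are the examples from the text.

```python
def speech_conversion_result(
    accuracy_result: float,
    responsiveness_result: float,
    accuracy_weight: float,
    responsiveness_weight: float,
) -> float:
    """Weighted sum of the voice accuracy and voice responsiveness test results."""
    return accuracy_weight * accuracy_result + responsiveness_weight * responsiveness_result


# Hypothetical results for one system: accurate but slow.
accuracy_result = 0.9
responsiveness_result = 0.5

# Voice learning profile: accuracy matters more (0.7 / 0.3).
learning_score = speech_conversion_result(accuracy_result, responsiveness_result, 0.7, 0.3)

# Map navigation profile: responsiveness matters more (0.4 / 0.6).
navigation_score = speech_conversion_result(accuracy_result, responsiveness_result, 0.4, 0.6)

print(round(learning_score, 4), round(navigation_score, 4))
```

The same system scores higher under the voice learning profile than under the map navigation profile, which is the effect the application-specific weights are meant to produce.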
In the embodiment of the application, the TTS system performance test device may determine the comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result in the following manner, without limitation: the TTS system performance test equipment performs a weighted summation of the text processing performance test result and the voice conversion performance test result, based on the weight corresponding to each, to obtain the comprehensive performance test result of the TTS system. In practical application, the weights corresponding to the text processing performance test result and the voice conversion performance test result can be set flexibly according to practical requirements; preferably, each of the two weights may be set to 0.5.
The TTS system performance test method provided by the embodiment of the present application is described below, taking as an example the case in which the TTS system performance test equipment tests the text processing performance and the speech conversion performance of the TTS system in parallel. Referring to fig. 3, the specific flow of the method is as follows:
Step 301: the TTS system performance test device invokes the TTS system to process the input text.
Step 302: the TTS system performance test equipment obtains predicted phonemes, predicted numbers, predicted symbols, predicted prosody and predicted accent positions of the input text as text prediction results in the process of processing the input text by the TTS system.
Step 303: the TTS system performance test equipment calculates a phoneme prediction accuracy rate, a digital conversion accuracy rate, a symbol conversion accuracy rate, a prosody prediction accuracy rate and an accent position prediction accuracy rate based on a predicted phoneme, a predicted number, a predicted symbol, a predicted prosody and a predicted accent position contained in the text prediction result and a marked phoneme, a marked number, a marked symbol, a marked prosody and a marked accent position of the input text contained in the text marking result.
Step 304: the TTS system performance test equipment performs weighted summation on the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy and the accent position prediction accuracy based on weights corresponding to the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy and the accent position prediction accuracy respectively, so as to obtain a text accuracy test result.
Step 305: the TTS system performance test equipment determines the ratio of the text processing time length of the TTS system to the input text to the total time length of the marked voice corresponding to the input text as the actual text processing real-time rate of the TTS system.
Step 306: the TTS system performance test device determines whether the actual text processing real-time rate is less than or equal to the target text processing real-time rate, if so, executes step 307, otherwise, executes step 308.
Step 307: the TTS system performance test device determines that the text responsiveness test result is 1.
Step 308: the TTS system performance test equipment determines the inverse of the ratio of the actual text processing real-time rate to the target text processing real-time rate as a text responsiveness test result.
Step 309: the TTS system performance test equipment performs weighted summation on the text accuracy test result and the text responsiveness test result based on the weights corresponding to the text accuracy test result and the text responsiveness test result, so as to obtain a text processing performance test result.
Step 310: the TTS system performance test equipment obtains predicted voice of the TTS system to the input text as a voice prediction result in the process of processing the input text by the TTS system.
Step 311: the TTS system performance test equipment calculates pronunciation generation similarity, mel frequency spectrum similarity, duration generation similarity, fundamental frequency generation similarity and energy generation similarity of the TTS system based on the predicted voice contained in the voice prediction result and the labeling voice of the input text contained in the voice labeling result.
Step 312: the TTS system performance test equipment performs weighted summation on the pronunciation generation similarity, the Mel frequency spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity and the energy generation similarity based on weights corresponding to the pronunciation generation similarity, the Mel frequency spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity and the energy generation similarity, so as to obtain a voice accuracy test result.
Step 313: the TTS system performance test equipment determines the ratio of the voice synthesis time length of the TTS system to the input text to the total time length of the marked voice corresponding to the input text as the actual voice synthesis real-time rate of the TTS system.
Step 314: the TTS system performance test device determines whether the actual speech synthesis real-time rate is less than or equal to the target speech synthesis real-time rate, if so, executes step 315, otherwise, executes step 316.
Step 315: the TTS system performance test device determines that the voice responsiveness test result is 1.
Step 316: the TTS system performance test equipment determines the reciprocal of the ratio of the actual speech synthesis real-time rate to the target speech synthesis real-time rate as a speech responsiveness test result.
Step 317: the TTS system performance test equipment performs weighted summation on the voice accuracy test result and the voice responsiveness test result based on the weights corresponding to the voice accuracy test result and the voice responsiveness test result, and obtains a voice conversion performance test result.
Step 318: the TTS system performance test equipment performs weighted summation on the text processing performance test result and the voice conversion performance test result based on the weights corresponding to the text processing performance test result and the voice conversion performance test result, and obtains the comprehensive performance test result of the TTS system.
In practical application, the TTS system performance test device may execute steps 302 to 309 and steps 310 to 317 in parallel, and then execute step 318 once both results are available, so as to test the text processing performance and the speech conversion performance of the TTS system in parallel and thereby greatly improve the efficiency of the TTS system performance test.
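One way to run the text branch and the speech branch concurrently is sketched below with Python's standard `concurrent.futures` module. This is an assumption about how the parallelism could be realised, not something the source specifies: `text_branch` and `speech_branch` are hypothetical placeholders that return fixed scores in place of the real text and speech evaluations, and the final weighted summation is placed after both branches finish because it needs both results.

```python
from concurrent.futures import ThreadPoolExecutor


def text_branch(input_text: str) -> float:
    """Placeholder for the text branch; returns a text processing performance result."""
    return 0.8  # hypothetical fixed value for illustration


def speech_branch(input_text: str) -> float:
    """Placeholder for the speech branch; returns a voice conversion performance result."""
    return 0.6  # hypothetical fixed value for illustration


def comprehensive_result(
    input_text: str, text_weight: float = 0.5, speech_weight: float = 0.5
) -> float:
    """Run both branches concurrently, then combine them by weighted summation."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(text_branch, input_text)
        speech_future = pool.submit(speech_branch, input_text)
        # result() blocks until the corresponding branch has finished.
        text_score = text_future.result()
        speech_score = speech_future.result()
    return text_weight * text_score + speech_weight * speech_score


overall = comprehensive_result("hello")
print(round(overall, 4))
```

Threads are used here only for brevity; if the branch computations are CPU-bound, a process pool may be the more suitable choice.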
Based on the above embodiments, the present application provides a TTS system performance testing apparatus, and referring to fig. 4, a TTS system performance testing apparatus 400 provided in the embodiment of the present application at least includes:
a predicted result obtaining unit 401, configured to obtain a text predicted result and a speech predicted result of the TTS system on the input text;
a text processing performance test unit 402 for determining a text processing performance test result of the TTS system based on the text prediction result;
a voice conversion performance test unit 403 for determining a voice conversion performance test result of the TTS system based on the voice prediction result;
a comprehensive performance determining unit 404, configured to determine a comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result.
In one possible implementation manner, when obtaining a text prediction result and a speech prediction result of an input text by the TTS system, the prediction result obtaining unit 401 is specifically configured to:
obtaining the predicted phonemes, predicted numbers, predicted symbols, predicted prosody and predicted accent positions of the input text by the TTS system as the text prediction result;
and obtaining the predicted voice of the TTS system on the input text as a voice prediction result.
In one possible implementation, when determining the text processing performance test result of the TTS system based on the text prediction result, the text processing performance test unit 402 is specifically configured to:
determining a text accuracy test result of the TTS system based on the text prediction result and the text labeling result of the input text;
and determining a text processing performance test result of the TTS system based on the text accuracy test result.
In one possible implementation, when determining the text accuracy test result of the TTS system based on the text prediction result and the text labeling result of the input text, the text processing performance test unit 402 is specifically configured to:
Determining a phoneme prediction accuracy, a digital conversion accuracy, a symbol conversion accuracy, a prosody prediction accuracy and an accent position prediction accuracy based on a predicted phoneme, a predicted digit, a predicted symbol, a predicted prosody and a predicted accent position contained in the text prediction result and a labeling phoneme, a labeling digit, a labeling symbol, a labeling prosody and a labeling accent position of an input text contained in the text labeling result;
and weighting and summing the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy and the accent position prediction accuracy based on weights corresponding to the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy and the accent position prediction accuracy respectively to obtain a text accuracy test result.
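The text accuracy computation can be sketched as follows. The passage above does not say how each individual accuracy is calculated, so this sketch assumes a simple position-wise match rate between predicted and labeled items; that metric, the function name `match_accuracy`, the sample data, and the equal weights of 0.2 are all illustrative assumptions.

```python
from typing import Sequence


def match_accuracy(predicted: Sequence[str], labeled: Sequence[str]) -> float:
    """Assumed metric: share of positions where the prediction equals the label."""
    if not labeled:
        raise ValueError("labeled sequence must not be empty")
    matches = sum(p == l for p, l in zip(predicted, labeled))
    # Dividing by the longer length penalises missing or extra items.
    return matches / max(len(predicted), len(labeled))


# Hypothetical predicted and labeled items for one input text.
predicted = {
    "phoneme": ["n", "i", "h", "ao"],
    "number": ["twenty", "five"],
    "symbol": ["percent"],
    "prosody": ["#1", "#2", "#3", "#4"],
    "accent_position": ["1", "0"],
}
labeled = {
    "phoneme": ["n", "i", "h", "a"],
    "number": ["twenty", "five"],
    "symbol": ["percent"],
    "prosody": ["#1", "#2", "#1", "#4"],
    "accent_position": ["1", "0"],
}

accuracies = {name: match_accuracy(predicted[name], labeled[name]) for name in labeled}

# Assumed equal weights; real weights would be chosen per test target.
weights = {name: 0.2 for name in accuracies}
text_accuracy = sum(weights[name] * accuracies[name] for name in accuracies)
print(accuracies, round(text_accuracy, 4))
```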
In a possible implementation, the text processing performance test unit 402 is further configured to:
and determining a text responsiveness test result of the TTS system based on the text processing time of the TTS system on the input text.
In one possible implementation, when determining a text responsiveness test result of the TTS system based on a text processing duration of the TTS system for the input text, the text processing performance test unit 402 is specifically configured to:
Determining the actual text processing real-time rate of the TTS system based on the ratio of the text processing time length to the total time length of the marked voice corresponding to the input text;
a text responsiveness test result is determined based on the actual text processing real-time rate and the target text processing real-time rate of the input text.
In one possible implementation, the text processing performance test unit 402 is specifically configured to, when determining a text responsiveness test result of the TTS system based on the actual text processing real-time rate and the target text processing real-time rate of the input text:
if the real-time rate of the actual text processing is less than or equal to the real-time rate of the target text processing, determining that the text responsiveness test result is 1;
if the actual text processing real-time rate is determined to be larger than the target text processing real-time rate, determining a text responsiveness test result based on the inverse of the ratio of the actual text processing real-time rate to the target text processing real-time rate.
In one possible implementation, when determining the text processing performance test result of the TTS system based on the text accuracy test result, the text processing performance test unit 402 is specifically configured to:
and carrying out weighted summation on the text accuracy test result and the text responsiveness test result based on the weights corresponding to the text accuracy test result and the text responsiveness test result, so as to obtain a text processing performance test result.
In one possible implementation, when determining the speech conversion performance test result of the TTS system based on the speech prediction result, the speech conversion performance test unit 403 is specifically configured to:
determining a voice accuracy test result of the TTS system based on the voice prediction result and the voice labeling result of the input text;
and determining a voice conversion performance test result of the TTS system based on the voice accuracy test result.
In one possible implementation, when determining the voice accuracy test result of the TTS system based on the voice prediction result and the voice labeling result of the input text, the voice conversion performance test unit 403 is specifically configured to:
determining pronunciation generation similarity, mel frequency spectrum similarity, duration generation similarity, fundamental frequency generation similarity and energy generation similarity of a TTS (text to speech) system based on predicted voice contained in a voice prediction result and labeled voice of an input text contained in a voice labeling result;
and weighting and summing the pronunciation generation similarity, the Mel frequency spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity and the energy generation similarity based on weights corresponding to the pronunciation generation similarity, the Mel frequency spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity and the energy generation similarity to obtain a voice accuracy test result.
In a possible implementation, the voice conversion performance test unit 403 is further configured to:
and determining a voice responsiveness test result of the TTS system based on the voice synthesis duration of the TTS system on the input text.
In one possible implementation, when determining the voice responsiveness test result of the TTS system based on the duration of the voice synthesis of the input text by the TTS system, the voice conversion performance test unit 403 is specifically configured to:
determining the actual speech synthesis real-time rate of the TTS system based on the ratio of the speech synthesis time length to the total time length of the marked speech corresponding to the input text;
and determining the voice responsiveness test result based on the actual voice synthesis real-time rate and the target voice synthesis real-time rate of the input text.
In one possible implementation, when determining the speech responsiveness test result based on the actual speech synthesis real-time rate and the target speech synthesis real-time rate of the input text, the speech conversion performance test unit 403 is specifically configured to:
if the actual speech synthesis real-time rate is less than or equal to the target speech synthesis real-time rate, determining that the speech responsiveness test result is 1;
if the actual speech synthesis real-time rate is determined to be larger than the target speech synthesis real-time rate, determining a speech responsiveness test result based on the inverse of the ratio of the actual speech synthesis real-time rate to the target speech synthesis real-time rate.
In one possible implementation, when determining the voice conversion performance test result of the TTS system based on the voice accuracy test result, the voice conversion performance test unit 403 is specifically configured to:
and carrying out weighted summation on the voice accuracy test result and the voice responsiveness test result based on the weights corresponding to the voice accuracy test result and the voice responsiveness test result, so as to obtain a voice conversion performance test result.
In one possible implementation, when determining the comprehensive performance test result of the TTS system based on the text processing performance test result and the speech conversion performance test result, the comprehensive performance determining unit 404 is specifically configured to:
and carrying out weighted summation on the text processing performance test result and the voice conversion performance test result based on the weights corresponding to the text processing performance test result and the voice conversion performance test result, so as to obtain the comprehensive performance test result of the TTS system.
It should be noted that the principle by which the TTS system performance testing apparatus 400 provided by the embodiment of the present application solves the technical problem is similar to that of the TTS system performance testing method provided by the embodiment of the present application. The implementation of the apparatus 400 can therefore refer to the implementation of the method, and repeated description is omitted.
After the method and the device for testing the performance of the TTS system provided by the embodiment of the application are introduced, the TTS system performance testing equipment provided by the embodiment of the application is briefly introduced.
Referring to fig. 5, a TTS system performance test apparatus 500 according to an embodiment of the present application includes a processor 501, a memory 502, and a computer program stored in the memory 502 and executable on the processor 501; when executing the computer program, the processor 501 implements the TTS system performance test method provided by the embodiments of the present application.
The TTS system performance test apparatus 500 provided by an embodiment of the present application may further include a bus 503 that connects the different components (including the processor 501 and the memory 502). Where bus 503 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The Memory 502 may include readable media in the form of volatile Memory, such as random access Memory (Random Access Memory, RAM) 5021 and/or cache Memory 5022, and may further include Read Only Memory (ROM) 5023.
The memory 502 may also include a program tool 5025 having a set (at least one) of program modules 5024, the program modules 5024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
TTS system performance testing device 500 may also communicate with one or more external devices 504 (e.g., keyboard, remote control, etc.), one or more devices that enable a user to interact with TTS system performance testing device 500 (e.g., cell phone, computer, etc.), and/or any device that enables TTS system performance testing device 500 to communicate with one or more other TTS system performance testing devices 500 (e.g., router, modem, etc.). Such communication may be through an Input/Output (I/O) interface 505. Also, TTS system performance test apparatus 500 may also communicate with one or more networks (e.g., local area network (Local Area Network, LAN), wide area network (Wide Area Network, WAN) and/or public network, such as the internet) via network adapter 506. As shown in fig. 5, the network adapter 506 communicates with other modules of the TTS system performance test apparatus 500 via the bus 503. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in connection with TTS system performance testing apparatus 500, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, disk array (Redundant Arrays of Independent Disks, RAID) subsystems, tape drives, data backup storage subsystems, and the like.
It should be noted that the TTS system performance test apparatus 500 shown in fig. 5 is only an example, and should not be construed as limiting the function and scope of use of the embodiment of the present application.
The following describes a computer-readable storage medium provided by an embodiment of the present application. The computer-readable storage medium provided by the embodiment of the application stores computer instructions which, when executed by a processor, implement the TTS system performance testing method provided by the embodiment of the application. Specifically, the executable program may be built in or installed in the TTS system performance testing apparatus 500, so that the TTS system performance testing apparatus 500 may implement the TTS system performance testing method provided by the embodiment of the present application by executing the built-in or installed executable program.
In addition, the TTS system performance testing method provided by the embodiment of the present application may also be implemented as a program product, where the program product includes program code which, when the program product is run on the TTS system performance testing apparatus 500, causes the TTS system performance testing apparatus 500 to execute the TTS system performance testing method provided by the embodiment of the present application.
The program product provided by the embodiments of the present application may employ any combination of one or more readable media, where the readable media may be a readable signal medium or a readable storage medium, and the readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof, and more specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), an optical fiber, a portable compact disk read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product provided by embodiments of the present application may be implemented as a CD-ROM and include program code that may also be run on a computing device. However, the program product provided by the embodiments of the present application is not limited thereto, and in the embodiments of the present application, the readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present application without departing from the spirit or scope of the embodiments of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims and the equivalents thereof, the present application is also intended to include such modifications and variations.

Claims (18)

1. A method for testing performance of a TTS system, comprising:
acquiring a text prediction result and a voice prediction result of a TTS system on an input text;
determining a text processing performance test result of the TTS system based on the text prediction result;
determining a voice conversion performance test result of the TTS system based on the voice prediction result;
and determining the comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result.
2. The TTS system performance testing method of claim 1, wherein obtaining text prediction results and speech prediction results of the TTS system for the input text includes:
acquiring predicted phonemes, predicted numbers, predicted symbols, predicted prosody and predicted accent positions of the input text by the TTS system as the text prediction result;
and obtaining the predicted voice of the TTS system to the input text as the voice prediction result.
3. The TTS system performance testing method of claim 2, wherein determining text processing performance testing results of the TTS system based on the text prediction results includes:
determining a text accuracy test result of the TTS system based on the text prediction result and the text labeling result of the input text;
and determining a text processing performance test result of the TTS system based on the text accuracy test result.
4. The TTS system performance testing method of claim 3, wherein determining text accuracy testing results of the TTS system based on the text prediction results and text labeling results of the input text comprises:
Determining a phoneme prediction accuracy rate, a digital conversion accuracy rate, a symbol conversion accuracy rate, a prosody prediction accuracy rate and an accent position prediction accuracy rate based on a predicted phoneme, a predicted digit, a predicted symbol, a predicted prosody and a predicted accent position contained in the text prediction result and a marked phoneme, a marked digit, a marked symbol, a marked prosody and a marked accent position of the input text contained in the text marking result;
and weighting and summing the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy and the accent position prediction accuracy based on weights corresponding to the phoneme prediction accuracy, the digital conversion accuracy, the symbol conversion accuracy, the prosody prediction accuracy and the accent position prediction accuracy respectively to obtain the text accuracy test result.
5. The TTS system performance testing method of claim 3, further comprising:
and determining a text responsiveness test result of the TTS system based on the text processing time of the TTS system on the input text.
6. The TTS system performance testing method of claim 5, wherein determining the text responsiveness test result of the TTS system based on the text processing duration of the TTS system for the input text comprises:
determining an actual text processing real-time rate of the TTS system based on the ratio of the text processing duration to the total duration of the labeled voice corresponding to the input text; and
determining the text responsiveness test result based on the actual text processing real-time rate and a target text processing real-time rate of the input text.
7. The TTS system performance testing method of claim 6, wherein determining the text responsiveness test result based on the actual text processing real-time rate and the target text processing real-time rate of the input text comprises:
if the actual text processing real-time rate is less than or equal to the target text processing real-time rate, determining that the text responsiveness test result is 1; and
if the actual text processing real-time rate is greater than the target text processing real-time rate, determining the text responsiveness test result based on the inverse of the ratio of the actual text processing real-time rate to the target text processing real-time rate.
8. The TTS system performance testing method of any one of claims 5-7, wherein determining the text processing performance test result of the TTS system based on the text accuracy test result comprises:
performing a weighted summation of the text accuracy test result and the text responsiveness test result, using their respective weights, to obtain the text processing performance test result.
9. The TTS system performance testing method of any one of claims 2-7, wherein determining the voice conversion performance test result of the TTS system based on the voice prediction result comprises:
determining a voice accuracy test result of the TTS system based on the voice prediction result and a voice labeling result of the input text; and
determining the voice conversion performance test result of the TTS system based on the voice accuracy test result.
10. The TTS system performance testing method of claim 9, wherein determining the voice accuracy test result of the TTS system based on the voice prediction result and the voice labeling result of the input text comprises:
determining a pronunciation generation similarity, a Mel spectrum similarity, a duration generation similarity, a fundamental frequency generation similarity and an energy generation similarity of the TTS system based on the predicted voice contained in the voice prediction result and the labeled voice of the input text contained in the voice labeling result; and
performing a weighted summation of the pronunciation generation similarity, the Mel spectrum similarity, the duration generation similarity, the fundamental frequency generation similarity and the energy generation similarity, using their respective weights, to obtain the voice accuracy test result.
11. The TTS system performance testing method of claim 9, further comprising:
determining a voice responsiveness test result of the TTS system based on a speech synthesis duration of the TTS system for the input text.
12. The TTS system performance testing method of claim 11, wherein determining the voice responsiveness test result of the TTS system based on the speech synthesis duration of the TTS system for the input text comprises:
determining an actual speech synthesis real-time rate of the TTS system based on the ratio of the speech synthesis duration to the total duration of the labeled voice corresponding to the input text; and
determining the voice responsiveness test result based on the actual speech synthesis real-time rate and a target speech synthesis real-time rate of the input text.
13. The TTS system performance testing method of claim 12, wherein determining the voice responsiveness test result based on the actual speech synthesis real-time rate and the target speech synthesis real-time rate of the input text comprises:
if the actual speech synthesis real-time rate is less than or equal to the target speech synthesis real-time rate, determining that the voice responsiveness test result is 1; and
if the actual speech synthesis real-time rate is greater than the target speech synthesis real-time rate, determining the voice responsiveness test result based on the inverse of the ratio of the actual speech synthesis real-time rate to the target speech synthesis real-time rate.
14. The TTS system performance testing method of claim 11, wherein determining the voice conversion performance test result of the TTS system based on the voice accuracy test result comprises:
performing a weighted summation of the voice accuracy test result and the voice responsiveness test result, using their respective weights, to obtain the voice conversion performance test result.
15. The TTS system performance testing method of any one of claims 1-7, wherein determining the comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result comprises:
performing a weighted summation of the text processing performance test result and the voice conversion performance test result, using their respective weights, to obtain the comprehensive performance test result of the TTS system.
16. A TTS system performance testing apparatus, comprising:
a prediction result acquisition unit for acquiring a text prediction result and a voice prediction result of the TTS system for an input text;
a text processing performance test unit for determining a text processing performance test result of the TTS system based on the text prediction result;
a voice conversion performance test unit for determining a voice conversion performance test result of the TTS system based on the voice prediction result; and
a comprehensive performance determining unit for determining a comprehensive performance test result of the TTS system based on the text processing performance test result and the voice conversion performance test result.
17. A TTS system performance testing apparatus, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the TTS system performance testing method of any one of claims 1-15 when the computer program is executed.
18. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the TTS system performance testing method of any one of claims 1-15.
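
The scoring arithmetic recited in claims 4, 6-8, 10 and 12-15 can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the equal weights, the sub-score values and the timing figures are all hypothetical, and the claims only say the over-target score is determined "based on" the inverse of the ratio, so using the inverse itself as the score is an assumption.

```python
from typing import Mapping


def weighted_sum(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted summation of named sub-scores (claims 4, 8, 10, 14, 15)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must use the same keys")
    return sum(weights[name] * scores[name] for name in scores)


def real_time_rate(processing_seconds: float, labeled_voice_seconds: float) -> float:
    """Processing duration divided by total labeled-voice duration (claims 6, 12)."""
    if labeled_voice_seconds <= 0:
        raise ValueError("labeled voice duration must be positive")
    return processing_seconds / labeled_voice_seconds


def responsiveness(actual_rate: float, target_rate: float) -> float:
    """1 when the target is met; otherwise the inverse of actual/target (claims 7, 13).

    Assumption: the inverse ratio is used directly as the score.
    """
    if actual_rate <= target_rate:
        return 1.0
    return target_rate / actual_rate


# Hypothetical sub-scores and equal weights, chosen only for illustration.
text_scores = {"phoneme": 0.9, "digit": 1.0, "symbol": 0.8, "prosody": 0.7, "accent": 0.6}
voice_scores = {"pronunciation": 0.95, "mel": 0.9, "duration": 0.9, "f0": 0.85, "energy": 0.9}
equal_text_weights = {name: 0.2 for name in text_scores}
equal_voice_weights = {name: 0.2 for name in voice_scores}

text_accuracy = weighted_sum(text_scores, equal_text_weights)
voice_accuracy = weighted_sum(voice_scores, equal_voice_weights)

# Hypothetical timings: 10 s of labeled voice, 0.5 s text processing, 2 s synthesis.
text_resp = responsiveness(real_time_rate(0.5, 10.0), target_rate=0.04)
voice_resp = responsiveness(real_time_rate(2.0, 10.0), target_rate=0.3)

half = {"accuracy": 0.5, "responsiveness": 0.5}
text_performance = weighted_sum({"accuracy": text_accuracy, "responsiveness": text_resp}, half)
voice_performance = weighted_sum({"accuracy": voice_accuracy, "responsiveness": voice_resp}, half)

comprehensive = weighted_sum(
    {"text": text_performance, "voice": voice_performance},
    {"text": 0.5, "voice": 0.5},
)
```

In this sketch a system that meets its target real-time rate is not rewarded further for being faster, while a slower system is penalized in proportion to how far it overshoots, which matches the two branches of claims 7 and 13.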
CN202110890585.9A 2021-08-04 2021-08-04 TTS system performance test method, device, equipment and medium Active CN113409826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110890585.9A CN113409826B (en) 2021-08-04 2021-08-04 TTS system performance test method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110890585.9A CN113409826B (en) 2021-08-04 2021-08-04 TTS system performance test method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113409826A CN113409826A (en) 2021-09-17
CN113409826B true CN113409826B (en) 2023-09-19

Family

ID=77688407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110890585.9A Active CN113409826B (en) 2021-08-04 2021-08-04 TTS system performance test method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113409826B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115050396A (en) * 2022-06-15 2022-09-13 北京百度网讯科技有限公司 Test method and device, electronic device and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0504546D0 (en) * 2005-03-04 2005-04-13 Toshiba Res Europ Ltd Method and apparatus for assessing text-to-speech synthesis systems
CN105593936A (en) * 2013-10-24 2016-05-18 宝马股份公司 System and method for text-to-speech performance evaluation
CN109147761A (en) * 2018-08-09 2019-01-04 北京易诚高科科技发展有限公司 Test method based on batch speech recognition and TTS text synthesis
CN111681642A (en) * 2020-06-03 2020-09-18 北京字节跳动网络技术有限公司 Speech recognition evaluation method, device, storage medium and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7684988B2 (en) * 2004-10-15 2010-03-23 Microsoft Corporation Testing and tuning of automatic speech recognition systems using synthetic inputs generated from its acoustic models

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
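Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale; Mahesh Viswanathan et al.; Computer Speech and Language; pp. 55-83 *
Automated test environment for speech recognition software of intelligent displays; Zhong Li et al.; Control and Information Technology (No. 4); pp. 35-38, 43 *
Construction and analysis of the Mandarin Chinese speech synthesis corpus TH-CoSS; Cai Lianhong et al.; Journal of Chinese Information Processing; Vol. 21 (No. 2); pp. 94-99 *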

Also Published As

Publication number Publication date
CN113409826A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
CN109754778B (en) Text speech synthesis method and device and computer equipment
EP3373293B1 (en) Speech recognition method and apparatus
US10810993B2 (en) Sample-efficient adaptive text-to-speech
CN106935239A (en) The construction method and device of a kind of pronunciation dictionary
CN113345415B (en) Speech synthesis method, device, equipment and storage medium
CN113488024B (en) Telephone interrupt recognition method and system based on semantic recognition
CN110992929A (en) Voice keyword detection method, device and system based on neural network
CN114330371A (en) Session intention identification method and device based on prompt learning and electronic equipment
CN112634866A (en) Speech synthesis model training and speech synthesis method, apparatus, device and medium
CN113409826B (en) TTS system performance test method, device, equipment and medium
CN109597881B (en) Matching degree determination method, device, equipment and medium
CN117493830A (en) Evaluation of training data quality, and generation method, device and equipment of evaluation model
JP2007206501A (en) Device for determining optimum speech recognition system, speech recognition device, parameter calculation device, information terminal device and computer program
CN114495977A (en) Speech translation and model training method, device, electronic equipment and storage medium
CN114218356B (en) Semantic recognition method, device, equipment and storage medium based on artificial intelligence
CN116580698A (en) Speech synthesis method, device, computer equipment and medium based on artificial intelligence
CN110910905B (en) Mute point detection method and device, storage medium and electronic equipment
CN114023309A (en) Speech recognition system, related method, device and equipment
CN117496945A (en) Training method of speech synthesis model, speech processing method and device
CN115831094A (en) Multilingual voice recognition method, system, storage medium and electronic equipment
CN111506701B (en) Intelligent query method and related device
JP2022133447A (en) Speech processing method and device, electronic apparatus, and storage medium
CN114613351A (en) Rhythm prediction method, device, readable medium and electronic equipment
CN109285559B (en) Role transition point detection method and device, storage medium and electronic equipment
CN112711654A (en) Chinese character interpretation phonetics generation method, system, equipment and medium for voice robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant