Article

Prosodic Events Recognition in Evaluation of Speech-Synthesis System Performance

Authors:

France Mihelič,

Boštjan Vesnicer,

Elmar Nöth

TSD '08: Proceedings of the 11th international conference on Text, Speech and Dialogue

Pages 419 - 426

https://doi.org/10.1007/978-3-540-87391-4_54

Published: 08 September 2008

Abstract

We present an objective method for evaluating prosody modeling in an HMM-based Slovene speech-synthesis system. The method is based on the automatic recognition of syntactic-prosodic boundary positions and accented words in the synthetic speech. We show that the recognition results closely match the prosodic annotations that a human expert labeled on the natural-speech counterpart used to train the speech-synthesis system. We therefore propose the recognition rate of prosodic events as an objective measure of the quality of prosodic modeling in the speech-synthesis system. The results of the proposed method also agree with previous subjective listening assessments, in which this type of speech synthesis received high naturalness scores.
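
As a rough illustration of the kind of measure the abstract describes, the following sketch compares per-word prosodic labels produced by a recognizer on synthetic speech against reference labels from a human annotator. It is only a minimal sketch under stated assumptions: the label names ("B" for a boundary after a word, "0" for none), the function names, and the class-wise averaging are hypothetical choices made for this example and are not taken from the paper.

```python
from collections import Counter


def recognition_rate(reference, recognized):
    """Fraction of words whose recognized label equals the reference label."""
    if len(reference) != len(recognized):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must not be empty")
    correct = sum(ref == hyp for ref, hyp in zip(reference, recognized))
    return correct / len(reference)


def class_wise_rate(reference, recognized):
    """Mean of the per-class recognition rates (hypothetical variant).

    Averaging over classes keeps a frequent class such as "no boundary"
    from dominating the score.
    """
    if len(reference) != len(recognized):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must not be empty")
    totals = Counter(reference)
    hits = Counter(ref for ref, hyp in zip(reference, recognized) if ref == hyp)
    per_class = [hits[label] / totals[label] for label in totals]
    return sum(per_class) / len(per_class)


# Hypothetical per-word labels: "B" = boundary after the word, "0" = none.
reference = ["B", "0", "0", "B", "0", "0", "0", "B"]   # human annotation
recognized = ["B", "0", "B", "B", "0", "0", "0", "0"]  # recognizer output

overall = recognition_rate(reference, recognized)
balanced = class_wise_rate(reference, recognized)
print(f"overall rate: {overall:.3f}, class-wise rate: {balanced:.3f}")
```

The same functions could be applied to accent labels by swapping in a different label set; which rate the authors actually report is not stated in the abstract.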

Published In

TSD '08: Proceedings of the 11th international conference on Text, Speech and Dialogue

September 2008

641 pages

ISBN: 9783540873907

Publisher

Springer-Verlag

Berlin, Heidelberg
