DOI: 10.1007/978-3-540-87391-4_54
Article

Prosodic Events Recognition in Evaluation of Speech-Synthesis System Performance

Published: 08 September 2008

Abstract

We present a method for objectively evaluating prosody modeling in an HMM-based Slovene speech-synthesis system. The method is based on automatic recognition of syntactic-prosodic boundary positions and accented words in the synthetic speech. We show that the recognition results closely match the prosodic annotations that a human expert assigned to the natural-speech counterpart used to train the speech-synthesis system. We propose the recognition rate of the prosodic events as an objective measure of the quality of prosodic modeling in the speech-synthesis system. The results of the proposed evaluation method also agree with previous subjective listening assessments, in which this type of speech synthesis received high naturalness scores.
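
As a rough illustration of the proposed measure, the sketch below computes a recognition rate by comparing automatically recognized prosodic labels for synthetic speech with expert labels for the natural-speech counterpart. The abstract does not define the metric precisely, so the word-by-word alignment, the label names ("B" for a boundary after the word, "0" for none), the example sequences, and both function names are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def recognition_rate(reference, recognized):
    """Fraction of words whose recognized label equals the expert label."""
    if len(reference) != len(recognized):
        raise ValueError("label sequences must align word by word")
    if not reference:
        raise ValueError("label sequences must not be empty")
    hits = sum(ref == hyp for ref, hyp in zip(reference, recognized))
    return hits / len(reference)


def per_class_recall(reference, recognized):
    """Recognition rate computed separately for each expert label."""
    totals = Counter(reference)
    hits = Counter(ref for ref, hyp in zip(reference, recognized) if ref == hyp)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical per-word labels (assumed for this example only):
# expert annotation of natural speech vs. labels recognized in synthetic speech.
expert = ["0", "0", "B", "0", "B", "0", "0", "B"]
auto = ["0", "0", "B", "0", "0", "0", "B", "B"]

rate = recognition_rate(expert, auto)
recall = per_class_recall(expert, auto)
print(f"overall recognition rate: {rate:.2f}")
print({label: round(value, 2) for label, value in recall.items()})
```

Reporting the per-class figures alongside the overall rate is a design choice worth considering here, because boundaries and accents are usually much rarer than unmarked words, and a high overall rate can hide poor recognition of the rare class.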



Published In

TSD '08: Proceedings of the 11th international conference on Text, Speech and Dialogue
September 2008
641 pages
ISBN: 9783540873907

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

  1. Speech synthesis
  2. prosody
  3. system evaluation
