DOI: 10.1007/978-3-540-70872-8_17
Chapter

Expressive Speech Synthesis Using Emotion-Specific Speech Inventories

Published: 17 December 2008

Abstract

In this paper we explore the use of emotion-specific speech inventories for expressive speech synthesis. We recorded a semantically neutral sentence and 26 logatoms containing all the diphones and CVC triphones necessary to synthesize that same sentence. The speech material was produced by a professional actress, who expressed all logatoms and the sentence with each of the six basic emotions and in a neutral tone. Seven emotion-dependent inventories were constructed from the logatoms. These seven inventories, paired with the prosody extracted from the seven natural sentences, were used to synthesize 49 sentences. A total of 194 listeners evaluated the emotions expressed in the logatoms and in the natural and synthetic sentences. The intended emotion was recognized above chance level for 99% of the logatoms and for all natural sentences. Recognition rates significantly above chance level were obtained for each emotion, and the recognition rate for some synthetic sentences exceeded that of the natural ones.
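
The evaluation described above is a seven-alternative forced-choice test, so chance level is 1/7, and the 49 synthetic stimuli arise from pairing each of the seven emotion-specific inventories with each of the seven natural-prosody targets. The following Python sketch, which is not taken from the paper, merely illustrates these two points; the emotion labels, the example count of correct responses, and the helper name binomial_p_above_chance are assumptions made for illustration.

# Illustrative sketch (not the authors' code): enumerate the 7x7
# inventory/prosody pairings described in the abstract and check whether a
# forced-choice recognition count exceeds the 1/7 chance level with an exact
# one-sided binomial test. Emotion labels are assumed Ekman-style categories.
from itertools import product
from math import comb

EMOTIONS = ["neutral", "anger", "disgust", "fear", "joy", "sadness", "surprise"]
CHANCE = 1.0 / len(EMOTIONS)      # seven-alternative forced choice
N_LISTENERS = 194                 # number of listeners reported in the abstract

# 7 emotion-specific inventories x 7 natural-prosody targets = 49 synthetic sentences
synthesis_conditions = list(product(EMOTIONS, EMOTIONS))
assert len(synthesis_conditions) == 49

def binomial_p_above_chance(correct: int, n: int = N_LISTENERS, p: float = CHANCE) -> float:
    """Exact one-sided P(X >= correct) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(correct, n + 1))

# Hypothetical example: if 60 of 194 listeners pick the intended emotion,
# how likely is a count at least that large under pure guessing?
print(f"p-value = {binomial_p_above_chance(60):.2e}")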


Cited By

  • Harmonic model for female voice emotional synthesis. In: Proceedings of the 2009 Joint COST 2101 and 2102 International Conference on Biometric ID Management and Multimodal Communication, pp. 41-48 (2009). DOI: 10.5555/1812740.1812748. Online publication date: 16 September 2009.



Published In

Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction: COST Action 2102 International Conference, Patras, Greece, October 29-31, 2007. Revised Papers
December 2008
279 pages
ISBN: 9783540708711
Editors: Anna Esposito, Nikolaos G. Bourbakis, Nikolaos Avouris, Ioannis Hatzilygeroudis

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 17 December 2008

Author Tags

  1. Expressive speech synthesis
  2. basic emotions
  3. diphone and triphone inventory
  4. forced choice
  5. listening test

Qualifiers

  • Chapter


