Nothing Special   »   [go: up one dir, main page]

Skip to main content

Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech

  • Conference paper
Artificial Intelligence: Theories, Models and Applications (SETN 2010)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6040))

Included in the following conference series:

  • 2144 Accesses

Abstract

In the present work we address the problem of phone duration modeling for the needs of emotional speech synthesis. Specifically, relying on ten well known machine learning techniques, we investigate the practical usefulness of two feature selection techniques, namely the Relief and the Correlation-based Feature Selection (CFS) algorithms, for improving the accuracy of phone duration modeling. The feature selection is performed over a large set of phonetic, morphologic and syntactic features. In the experiments, we employed phone duration models, based on decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms, trained on a Modern Greek speech database of emotional speech, which consists of five categories of emotional speech: anger, fear, joy, neutral, sadness. The experimental results demonstrated that feature selection significantly improves the accuracy of phone duration modeling regardless of the type of machine learning algorithm used for phone duration modeling.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Dutoit, T.: An Introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers, Dodrecht (1997)

    Google Scholar 

  2. Klatt, D.H.: Synthesis by rule of segmental durations in English sentences. In: Lindlom, B., Ohman, S. (eds.) Frontiers of Speech Communication Research, pp. 287–300. Academic Press, New York (1979)

    Google Scholar 

  3. Möbius, B., Santen, P.H.J.: Modeling Segmental duration in German Text-to-Speech Synthesis. In: 4th International Conference on Spoken Language Processing (ICSLP), pp. 2395–2398 (1996)

    Google Scholar 

  4. Takeda, K., Sagisaka, Y., Kuwabara, H.: On sentence-level factors governing segmental duration in Japanese. Journal of Acoustic Society of America 6(86), 2081–2087 (1989)

    Article  Google Scholar 

  5. Santen, J.P.H.: Contextual effects on vowel durations. Speech Communication 11, 513–546 (1992)

    Article  Google Scholar 

  6. Campbell, W.N.: Syllable based segment duration. In: Bailly, G., Benoit, C., Sawallis, T.R. (eds.) Talking Machines: Theories, Models and Designs, pp. 211–224. Elsevier, Amsterdam (1992)

    Google Scholar 

  7. Goubanova, O., King, S.: Bayesian network for phone duration prediction. Speech Communication 50, 301–311 (2008)

    Article  Google Scholar 

  8. Lazaridis, A., Zervas, P., Kokkinakis, G.: Segmental Duration Modeling for Greek Speech Synthesis. In: 19th IEEE International Conference of Tools with Artificial Intelligence (ICTAI), pp. 518–521 (2007)

    Google Scholar 

  9. Jiang, D.N., Zhang, W., Shen, L., Cai, L.H.: Prosody Analysis and Modeling for Emotional Speech Synthesis. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), pp. 281–284 (2005)

    Google Scholar 

  10. Inanoglu, Z., Young, S.: Data-driven emotion conversion in spoken English. Speech Communication 51, 268–283 (2009)

    Article  Google Scholar 

  11. Kira, K., Rendell, L.A.: A practical approach to feature selection. In: 9th International Conference on Machine Learning (ICML), pp. 249–256 (1992)

    Google Scholar 

  12. Witten, H.I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Publishing, San Francisco (2005)

    MATH  Google Scholar 

  13. Wang, Y., Witten, I.H.: Induction of model trees for predicting continuous classes. In: 9th European Conf. on Machine Learning, University of Economics, Faculty of Informatics and Statistics, pp. 128–137 (1997)

    Google Scholar 

  14. Quinlan, R.J.: Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence, pp. 343–348 (1992)

    Google Scholar 

  15. Kääriäinen, M., Malinen, T.: Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees. Journal of Machine Learning Research 5, 1107–1126 (2004)

    Google Scholar 

  16. Aha, D., Kibler, D., Albert, M.: Instance-based learning algorithms. Journal of Machine Learning 6, 37–66 (1991)

    Google Scholar 

  17. Atkeson, C.G., Moorey, A.W., Schaal, S.: Locally Weighted Learning. Artificial Intelligence Review 11, 11–73 (1996)

    Article  Google Scholar 

  18. Friedman, J.H.: Stochastic gradient boosting. Comput. Statist. Data Anal. 4(38), 367–378 (2002)

    Article  Google Scholar 

  19. Breiman, L.: Bagging Predictors. Journal of Machine Learning 2(24), 123–140 (1996)

    Google Scholar 

  20. Hall, M.A.: Correlation-based Feature Subset Selection for Machine Learning. PhD thesis, Department of Computer Science, University of Waikato, Waikato, New Zealand (1999)

    Google Scholar 

  21. Goldberg, D.E.: Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading (1989)

    MATH  Google Scholar 

  22. Wang, L., Zhao, Y., Chu, M., Zhou, J., Cao, Z.: Refining segmental boundaries for TTS database using fine contextual-dependent boundary models. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP 2004), pp. 641–644 (2004)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lazaridis, A., Ganchev, T., Mporas, I., Kostoulas, T., Fakotakis, N. (2010). Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech. In: Konstantopoulos, S., Perantonis, S., Karkaletsis, V., Spyropoulos, C.D., Vouros, G. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2010. Lecture Notes in Computer Science(), vol 6040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12842-4_43

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-12842-4_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12841-7

  • Online ISBN: 978-3-642-12842-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics