Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech

Alexandros Lazaridis, Todor Ganchev, Iosif Mporas, Theodoros Kostoulas & Nikos Fakotakis

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6040)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence


Abstract

In the present work we address the problem of phone duration modeling for emotional speech synthesis. Specifically, using ten well-known machine learning techniques, we investigate the practical usefulness of two feature selection algorithms, Relief and Correlation-based Feature Selection (CFS), for improving the accuracy of phone duration modeling. Feature selection is performed over a large set of phonetic, morphological and syntactic features. In the experiments, we employed phone duration models based on decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms, trained on a Modern Greek database of emotional speech covering five categories: anger, fear, joy, neutral and sadness. The experimental results demonstrated that feature selection significantly improves the accuracy of phone duration modeling, regardless of the type of machine learning algorithm used.
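To make the CFS idea concrete, here is a minimal sketch, not the authors' implementation. CFS rewards subsets whose features correlate strongly with the target but weakly with each other, via the merit k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-target correlation and r_ff the mean feature-feature correlation. The abstract does not state the correlation measure, search strategy or toolkit used. The sketch therefore assumes absolute Pearson correlation with a numeric duration target and uses a simple greedy forward search as a stand-in. The function names, the `min_gain` threshold and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[int], columns: List[List[float]], target: List[float]) -> float:
    """CFS merit of a feature subset: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|).

    Assumption: absolute Pearson correlation is used as the correlation measure.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Relevance: mean absolute feature-target correlation.
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    # Redundancy: mean absolute feature-feature correlation over all pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_cfs(
    columns: List[List[float]], target: List[float], min_gain: float = 1e-9
) -> Tuple[List[int], float]:
    """Greedy forward search (hypothetical stand-in for the actual search strategy)."""
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(columns)))
    while remaining:
        # Score every one-feature extension; max() keeps the first best candidate.
        cand, score = max(
            ((f, cfs_merit(selected + [f], columns, target)) for f in remaining),
            key=lambda item: item[1],
        )
        if score <= best + min_gain:
            break  # no candidate improves the merit enough
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


if __name__ == "__main__":
    # Toy data (invented): phone durations in ms and three hypothetical features.
    target = [120.0, 70.0, 115.0, 65.0, 125.0, 72.0]
    columns = [
        [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # f0: e.g. a hypothetical "stressed" flag
        [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # f1: exact copy of f0 (fully redundant)
        [3.0, 1.0, 2.0, 2.0, 1.0, 3.0],  # f2: nearly uncorrelated with the target
    ]
    print(forward_cfs(columns, target))  # selected subset is [0]
```

Because f1 duplicates f0, adding it leaves the merit unchanged, so the redundancy term halts the search. This is what separates a subset evaluator such as CFS from simply ranking features by their individual relevance.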

Author information

Authors and Affiliations

Artificial Intelligence Group, Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, Rion-Patras, 26500, Greece
Alexandros Lazaridis, Todor Ganchev, Iosif Mporas, Theodoros Kostoulas & Nikos Fakotakis

Editor information

Editors and Affiliations

Institute of Informatics and Telecommunications, NCSR Demokritos, Ag. Paraskevi, 15310, Athens, Greece
Stasinos Konstantopoulos, Stavros Perantonis, Vangelis Karkaletsis & Constantine D. Spyropoulos
Department of Information and Communication Systems Engineering, University of the Aegean, 83200, Karlovassi, Samos, Greece
George Vouros

About this paper

Cite this paper

Lazaridis, A., Ganchev, T., Mporas, I., Kostoulas, T., Fakotakis, N. (2010). Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech. In: Konstantopoulos, S., Perantonis, S., Karkaletsis, V., Spyropoulos, C.D., Vouros, G. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2010. Lecture Notes in Computer Science, vol 6040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12842-4_43

DOI: https://doi.org/10.1007/978-3-642-12842-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12841-7
Online ISBN: 978-3-642-12842-4
eBook Packages: Computer Science, Computer Science (R0)
