Abstract
This work addresses phone duration modeling for emotional speech synthesis. Using ten well-known machine learning techniques, we investigate the practical usefulness of two feature selection algorithms, Relief and Correlation-based Feature Selection (CFS), for improving the accuracy of phone duration modeling. Feature selection is performed over a large set of phonetic, morphological and syntactic features. In the experiments, we employed phone duration models based on decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms, trained on a Modern Greek database of emotional speech covering five categories: anger, fear, joy, neutral and sadness. The experimental results demonstrated that feature selection significantly improves the accuracy of phone duration modeling, regardless of the machine learning algorithm used.
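To make the CFS idea concrete, the following is a minimal sketch, not the authors' implementation. It scores a feature subset with the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-target correlation and r_ff the mean feature-feature correlation. Two assumptions are made here for illustration: Pearson correlation is used as the correlation measure, and a greedy forward search picks the subset (the abstract does not state which search strategy was used). The names `pearson`, `cfs_merit` and `cfs_forward_select` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)  # assumes neither sequence is constant

def cfs_merit(columns, target, subset):
    """CFS merit of the feature indices in `subset` (non-empty list)."""
    k = len(subset)
    # Mean absolute feature-target correlation.
    r_cf = sum(abs(pearson(columns[j], target)) for j in subset) / k
    # Mean absolute feature-feature correlation over all unordered pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(columns, target):
    """Greedy forward search (an assumption): add features while merit rises."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, j = max((cfs_merit(columns, target, selected + [j]), j)
                       for j in remaining)
        if merit <= best:
            break  # no candidate improves the subset any further
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

Here `columns` is a list of feature columns (one list of numbers per feature) and `target` is the list of phone durations. The merit rewards features that correlate with the target and penalizes features that correlate with each other, so a near-duplicate of an already selected feature lowers the score instead of raising it.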
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lazaridis, A., Ganchev, T., Mporas, I., Kostoulas, T., Fakotakis, N. (2010). Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech. In: Konstantopoulos, S., Perantonis, S., Karkaletsis, V., Spyropoulos, C.D., Vouros, G. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2010. Lecture Notes in Computer Science, vol 6040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12842-4_43
DOI: https://doi.org/10.1007/978-3-642-12842-4_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12841-7
Online ISBN: 978-3-642-12842-4
eBook Packages: Computer Science (R0)