Abstract
Language-independent, alignment-free phonological and phonemic features were applied to automatic age estimation based on voice and speech properties. A total of 110 speakers (mean age: 75.7 years) read the German version of the text “The North Wind and the Sun”. For comparison with the automatic approach, five listeners estimated the speakers’ ages perceptually. Support Vector Regression combined with feature selection was used to find the best model of aging. The resulting model uses the following features: (a) the percentage of voiced frames, (b) eight phonological features representing vowel height, nasality in consonants, turbulence, and lip position, and (c) seven phonemic features. The phonemic features may be relevant because dentures can alter articulation. The mean absolute error (MAE) between computed and chronological age was 5.2 years (RMSE: 7.0 years), compared with 7.7 years (RMSE: 9.6 years) for an optimistic trivial estimator and 10.5 years (RMSE: 11.9 years) for the average listener.
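The abstract compares the model with two references using MAE and RMSE. The Python sketch below shows how those two error measures are computed and how a trivial baseline can be scored. The ages are toy values, not data from the study. The baseline is an assumption: the abstract does not define the "optimistic trivial estimator", so here it is taken to predict the mean age of the evaluated speakers for everyone, which is optimistic because it uses the evaluation set's own mean. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    # zip() would silently truncate mismatched inputs, so validate first
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between chronological and estimated age (years)."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error (years); weights large misses more than MAE does."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def trivial_estimate(y_true: Sequence[float]) -> List[float]:
    """Hypothetical baseline: predict the evaluation set's mean age for every speaker.

    It is 'optimistic' because it uses the mean of the very ages it is scored against.
    """
    n = _check(y_true, y_true)
    m = sum(y_true) / n
    return [m] * n


if __name__ == "__main__":
    ages = [70, 75, 80, 85]       # toy chronological ages, not study data
    predicted = [72, 74, 83, 81]  # toy model outputs
    print(mae(ages, predicted), rmse(ages, predicted))
    baseline = trivial_estimate(ages)
    print(mae(ages, baseline), rmse(ages, baseline))
```

For the toy data, the model scores MAE 2.5 and RMSE about 2.74. The mean-age baseline scores MAE 5.0 and RMSE about 5.59. RMSE is never smaller than MAE, which matches each pair of figures reported in the abstract.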
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Haderlein, T. et al. (2015). Language-Independent Age Estimation from Speech Using Phonological and Phonemic Features. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol. 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_19
DOI: https://doi.org/10.1007/978-3-319-24033-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)