Abstract
The estimation of the age of a speaker from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners are able to estimate the age of a speaker to within 10 years on average, while recent machine age estimation systems appear to perform better, with average errors as low as 6 years. However, the machine studies have used highly non-uniform test sets, for which knowledge of the age distribution offers the system a considerable advantage. In this study we compare human and machine performance on the same test data, chosen to be uniformly distributed in age. We show that in this case human and machine accuracy are more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to a value closer to 7.5 years. Both humans and machines have difficulty in accurately predicting the ages of older speakers.
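The effect of test-set age distribution on average error can be illustrated with a minimal sketch. It is not the method used in the paper: the two age lists are invented for illustration, and the "system" is a hypothetical baseline that ignores the voice entirely and always guesses the median age of the set it is scored on. The function names `mean_absolute_error` and `constant_baseline_mae` are likewise assumptions introduced here.

```python
from statistics import mean, median


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between true and predicted ages."""
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


def constant_baseline_mae(ages):
    """Error of a hypothetical baseline that always guesses the median age.

    The baseline uses no acoustic information at all; it only exploits
    knowledge of the age distribution.
    """
    guess = median(ages)
    return mean_absolute_error(ages, [guess] * len(ages))


# Invented, illustrative data (NOT from the paper):
# a test set concentrated around age 30, with few older speakers ...
skewed_ages = [20] * 2 + [25] * 6 + [30] * 8 + [35] * 6 + [40] * 3 + [50] * 2 + [60, 70, 80]
# ... and a test set spread evenly between 20 and 80.
uniform_ages = list(range(20, 81, 2))

skewed_mae = constant_baseline_mae(skewed_ages)
uniform_mae = constant_baseline_mae(uniform_ages)

print(f"Baseline error on skewed set:  {skewed_mae:.1f} years")
print(f"Baseline error on uniform set: {uniform_mae:.1f} years")
```

On these made-up lists the baseline scores a lower average error on the skewed set than on the uniform one, even though it never listens to a speaker. This is the sense in which a non-uniform test set can flatter a system that has learned the age distribution.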
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Huckvale, M., Webb, A. (2015). A Comparison of Human and Machine Estimation of Speaker Age. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_11
DOI: https://doi.org/10.1007/978-3-319-25789-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science (R0)