Abstract
In previous work, we confirmed the performance gains obtainable in speaker recognition by splitting the (clean) wide-band speech signal into several subbands, employing a separate pattern classifier for each subband, and then using multiple classifier fusion (‘recombination’) techniques to produce a final decision. However, that earlier work used a fairly rudimentary recognition technique (dynamic time warping), only sum or product fusion rules, and only the spoken word seven. The question then arises: can subband processing still deliver performance gains with state-of-the-art recognition techniques, more sophisticated recombination, and different spoken digits? To answer this, we have applied hidden Markov modelling and artificial neural network (ANN) recombination to text-dependent speaker identification for the spoken digits seven and nine. We find that ANN recombination performs about as well as the sum rule operating in log probability space, but the ANN results are not unique: they depend critically on user-specified parameters, initialisation, etc. On clean speech, all classifiers achieve close to 100% identification. Subband techniques offer advantages when the speech signal is significantly degraded by noise.
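As a rough illustration of the fixed fusion rules mentioned in the abstract, the sketch below combines per-subband scores for each candidate speaker and picks the best one. It is not the authors' implementation: the function names, the softmax normalisation used for the linear-domain sum rule, and the example scores are all assumptions made for illustration. Note that summing log scores across subbands is equivalent to multiplying the corresponding probabilities.

```python
import math

def fuse_sum_log(log_scores):
    """Sum the per-subband log scores for each speaker.

    log_scores: list of subbands, each a list of log scores (one per speaker).
    Returns one fused log score per speaker.
    """
    n_speakers = len(log_scores[0])
    return [sum(band[k] for band in log_scores) for k in range(n_speakers)]

def fuse_sum_posterior(log_scores):
    """Sum rule on per-subband posteriors (assumed: softmax over speakers)."""
    n_speakers = len(log_scores[0])
    fused = [0.0] * n_speakers
    for band in log_scores:
        peak = max(band)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in band]
        total = sum(exps)
        for k in range(n_speakers):
            fused[k] += exps[k] / total
    return fused

def identify(fused):
    """Return the index of the speaker with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])

# Hypothetical log-likelihoods: 3 subbands x 4 enrolled speakers.
example = [
    [-10.0, -12.0, -11.0, -15.0],
    [-9.0, -8.5, -13.0, -14.0],
    [-20.0, -11.0, -12.0, -13.0],
]

print(identify(fuse_sum_log(example)))
print(identify(fuse_sum_posterior(example)))
```

In this made-up example, a single poorly matching subband (the third row for the first speaker) pulls that speaker's fused log score down sharply, which is the kind of behaviour that makes the choice of fusion rule matter on degraded speech.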
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Higgins, J.E., Dodd, T.J., Damper, R.I. (2001). Application of Multiple Classifier Techniques to Subband Speaker Identification with an HMM/ANN System. In: Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2001. Lecture Notes in Computer Science, vol 2096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48219-9_37
DOI: https://doi.org/10.1007/3-540-48219-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42284-6
Online ISBN: 978-3-540-48219-2
eBook Packages: Springer Book Archive