Abstract
In this paper, we summarize the main achievements made in the 4-year PUMS project during 2003-2007. The emphasis is on the practical implementations, how we have moved from Matlab and Praat scripting to C/C++ implemented applications in Windows, UNIX, Linux and Symbian environments, with the motivation to enhance technology transfer. We summarize how the baseline methods have been implemented in practice, how the results are utilized in forensic applications, and compare recognition results to the state-ofart and existing commercial products such as ASIS, FreeSpeech and VoiceNet.
The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-642-02312-5_25
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Auckenthaler, R., Carey, M., Lloyd-Thomas, H.: Score normalization for text-independent speaker verification systems. Digital Signal Processing 10(1-3), 42–54 (2000)
Brummer, N., Burget, L., Cernocky, J., Glembek, O., Grezl, F., Karafiat, M., van Leeuwen, D.A., Matejka, P., Schwarz, P., Strasheim, A.: Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006. IEEE Trans. Audio, Speech and Language Processing 15(7), 2072–2084 (2007)
Burget, L., Matejka, P., Schwarz, P., Glembek, O., Cernocky, J.H.: Analysis of Feature Extraction and Channel Compensation in a GMM Speaker Recognition System. IEEE Trans. Audio, Speech and Language Processing 15(7), 1979–1986 (2007)
Campbell, W.M., Campbell, J.P., Reynolds, D.A., Singer, E., Torres-Carrasquillo, P.A.: Support vector machines for speaker and language recognition. Computer Speech and Language 20(2-3), 210–229 (2006)
ETSI, Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) Speech Traffic Channels, ETSI EN 301 708 Recommendation (1999)
Fränti, P., Saastamoinen, J., Kärkkäinen, I., Kinnunen, T., Hautamäki, V., Sidoroff, I.: Implementing speaker recognition system: from Matlab to practice. Research Report A-2007-4, Dept. of Comp. Science, Univ. of Joensuu, Finland (November 2007), http://cs.joensuu.fi/sipu/pub.htm
Hautamäki, V., Kinnunen, T., Kärkkäinen, I., Saastamoinen, J., Tuononen, M., Fränti, P.: Maximum a posteriori adaptation of the centroid model for speaker verification. IEEE Signal Processing Letters 15, 162–165 (2008)
Hautamäki, V., Tuononen, M., Niemi-Laitinen, T., Fränti, P.: Improving speaker verification by periodicity based voice activity detection. In: Int. Conf. on Speech and Computer (SPECOM 2007), Moscow, Russia, vol. 2, pp. 645–650 (2007)
ITU, A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to Recommendation V.70, ITU-T Recommendation G.729-Annex B (1996)
Kay, S.M.: Fundamentals of Statistical Signal Processing, Detection Theory, vol. 2. Prentice Hall, Englewood Cliffs (1998)
Kenny, P., Ouellet, P., Dehak, N., Gupta, V., Dumouchel, P.: A study of inter-speaker variability in speaker verification. IEEE Transactions on Audio, Speech and Language Processing 16(5), 980–988 (2008)
Kinnunen, T., Gonzalez-Hautamäki, R.: Long-Term F0 Modeling for Text-Independent Speaker Recognition. In: Int. Conf. on Speech and Computer (SPECOM 2005), Patras, Greece, pp. 567–570 (October 2005)
Kinnunen, T., Karpov, E., Fränti, P.: Real-time speaker identification and verification. IEEE Trans. on Audio, Speech and Language Processing 14(1), 277–288 (2006)
Kinnunen, T., Hautamäki, V., Fränti, P.: On the use of long-term average spectrum in automatic speaker recognition. In: Huo, Q., Ma, B., Chng, E.-S., Li, H. (eds.) ISCSLP 2006. LNCS, vol. 4274, pp. 559–567. Springer, Heidelberg (2006)
Kinnunen, T., Chernenko, E., Tuononen, M., Fränti, P., Li, H.: Voice activity detection using MFCC features and support vector machine. In: Int. Conf. on Speech and Computer (SPECOM 2007), Moscow, Russia, vol. 2, pp. 556–561 (2007)
Kinnunen, T., Saastamoinen, J., Hautamäki, V., Vinni, M., Fränti, P.: Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification. Pattern Recognition Letters (accepted)
Kumar, N., Andreou, A.G.: Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition. Speech Communication 26(4), 283–297 (1998)
Ma, B., Zhu, D., Tong, R., Li, H.: Speaker Cluster based GMM tokenization for speaker recognition. In: Proc. Interspeech 2006, Pittsburg, USA, pp. 505–508 (September 2006)
Niemi-Laitinen, T., Saastamoinen, J., Kinnunen, T., Fränti, P.: Applying MFCC-based automatic speaker recognition to GSM and forensic data. In: 2nd Baltic Conf. on Human Language Technologies (HLT 2005), Tallinn, Estonia, pp. 317–322 (April 2005)
Ramirez, J., Segura, J.C., Benitez, C., de la Torre, A., Rubio, A.: Efficient voice activity detection algorithms using long-term speech information. Speech Communications 42(34), 271–287 (2004)
Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing 10(1), 19–41 (2000)
Saastamoinen, J., Karpov, E., Hautamäki, V., Fränti, P.: Accuracy of MFCC based speaker recognition in series 60 device. Journal of Applied Signal Processing (17), 2816–2827 (2005)
Saastamoinen, J., Fiedler, Z., Kinnunen, T., Fränti, P.: On factors affecting MFCC-based speaker recognition accuracy. In: Int. Conf. on Speech and Computer (SPECOM 2005), Patras, Greece, pp. 503–506 (October 2005)
Tong, R., Ma, B., Lee, K.A., You, C.H., Zhou, D.L., Kinnunen, T., Sun, H.W., Dong, M.H., Ching, E.S., Li, H.Z.: Fusion of acoustic and tokenization features for speaker recognition. In: 5th In. Symp. on Chinese Spoken Language Proc., Singapore, pp. 566–577 (2006)
Tuononen, M., González Hautamäki, R., Fränti, P.: Automatic voice activity detection in different speech applications. In: Int. Conf. on Forensic Applications and Techniques in Telecommunications, Information and Multimedia (e-Forensics 2008), Adelaide, Australia, Article No.12 (January 2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Fränti, P., Saastamoinen, J., Kärkkäinen, I., Kinnunen, T., Hautamäki, V., Sidoroff, I. (2009). Developing Speaker Recognition System: From Prototype to Practical Application. In: Sorell, M. (eds) Forensics in Telecommunications, Information and Multimedia. e-Forensics 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 8. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02312-5_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-02312-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02311-8
Online ISBN: 978-3-642-02312-5
eBook Packages: Computer ScienceComputer Science (R0)