Developing Speaker Recognition System: From Prototype to Practical Application

  • Conference paper
Forensics in Telecommunications, Information and Multimedia (e-Forensics 2009)

Abstract

In this paper, we summarize the main achievements of the four-year PUMS project (2003-2007). The emphasis is on practical implementation: how we moved from Matlab and Praat scripts to applications implemented in C/C++ for Windows, UNIX, Linux and Symbian environments, with the aim of facilitating technology transfer. We describe how the baseline methods have been implemented in practice and how the results are utilized in forensic applications, and we compare recognition results with the state of the art and with existing commercial products such as ASIS, FreeSpeech and VoiceNet.

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-642-02312-5_25

Copyright information

© 2009 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Fränti, P., Saastamoinen, J., Kärkkäinen, I., Kinnunen, T., Hautamäki, V., Sidoroff, I. (2009). Developing Speaker Recognition System: From Prototype to Practical Application. In: Sorell, M. (ed.) Forensics in Telecommunications, Information and Multimedia. e-Forensics 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 8. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02312-5_12

  • DOI: https://doi.org/10.1007/978-3-642-02312-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02311-8

  • Online ISBN: 978-3-642-02312-5

  • eBook Packages: Computer Science, Computer Science (R0)
