Linear prediction residual features for automatic speaker verification anti-spoofing

Published: 01 July 2018

Abstract

Automatic speaker verification (ASV) systems are highly vulnerable to spoofing attacks. Anti-spoofing, that is, determining whether a speech signal is genuine or spoofed, is essential for improving the reliability of ASV systems. Spoofing attacks that use speech generated by speech synthesis and voice conversion have recently received great interest owing to the 2015 edition of the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2015). In this paper, we propose linear prediction (LP) residual based features for anti-spoofing. Three different features extracted from the LP residual signal are compared on the ASVspoof 2015 database. Experimental results indicate that LP residual phase cepstral coefficients (LPRPC) and LP residual Hilbert envelope cepstral coefficients (LPRHEC), both obtained from the analytic signal of the LP residual, yield promising results for anti-spoofing. The proposed features outperform standard Mel-frequency cepstral coefficients (MFCC) and cosine phase (CosPhase) features: LPRPC and LPRHEC give the smallest equal error rates (EER) for eight of the ten spoofing attacks when compared with MFCC and CosPhase features.
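To make the signal chain named in the abstract concrete, the following is a minimal sketch of how an LP residual, its Hilbert envelope, and its residual phase could be computed for one speech frame. It is not the authors' implementation: the function names, the LP order of 12, the autocorrelation method, and the small regularization constants are all assumptions chosen for illustration, and the cepstral processing that turns these signals into LPRPC and LPRHEC features is not shown because the abstract does not specify it.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import hilbert, lfilter


def lp_coefficients(frame, order=12):
    """Autocorrelation-method LP: a[1..p] with s[n] ~ sum_k a[k] * s[n-k]."""
    n = len(frame)
    # Autocorrelation at lags 0..order (lag 0 sits at index n-1 of the full output).
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    r[0] *= 1.0 + 1e-9  # tiny diagonal loading for stability (assumption)
    # Solve the symmetric Toeplitz normal equations R a = r[1..p].
    return solve_toeplitz(r[:order], r[1:order + 1])


def lp_residual(frame, order=12):
    """Inverse-filter the frame with A(z) = 1 - sum_k a[k] z^-k."""
    a = lp_coefficients(frame, order)
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)


def residual_envelope_and_phase(residual):
    """Hilbert envelope and cosine of the analytic-signal phase."""
    analytic = hilbert(residual)       # residual + j * HilbertTransform(residual)
    envelope = np.abs(analytic)        # Hilbert envelope
    # cos(phase) = Re(analytic) / |analytic|; the floor avoids division by zero.
    cos_phase = residual / np.maximum(envelope, 1e-12)
    return envelope, cos_phase
```

In this sketch the envelope carries the magnitude information of the excitation, while the cosine of the phase discards it and keeps only the phase information, which is why the two are treated as separate feature sources.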

Cited By

  • (2022) A nonlinear prediction model for Chinese speech signal based on RBF neural network. Multimedia Tools and Applications 81:4 (5033-5049). DOI: 10.1007/s11042-021-11612-6. Online publication date: 1-Feb-2022
  • (2020) Voice liveness detection under feature fusion and cross-environment scenario. Multimedia Tools and Applications 79:37-38 (26951-26967). DOI: 10.1007/s11042-020-09281-y. Online publication date: 19-Jul-2020
  • (2019) Combining evidences from Hilbert envelope and residual phase for detecting replay attacks. International Journal of Speech Technology 22:2 (313-326). DOI: 10.1007/s10772-019-09604-x. Online publication date: 1-Jun-2019

Published In

Multimedia Tools and Applications, Volume 77, Issue 13
July 2018
1477 pages

Publisher

Kluwer Academic Publishers
United States

Author Tags

1. Anti-spoofing
2. Countermeasure
3. Linear prediction residual
4. Speaker verification

Qualifiers

• Article
