Abstract
This paper proposes novel features based on linear prediction of temporal phase (LPTP) for the speaker recognition task. The proposed LPTC feature vector consists of Discrete Cosine Transform (DCT) coefficients of the LP spectrum derived from the temporal phase of the speech signal; the DCT is applied for energy compaction and decorrelation. Results are reported on the standard NIST 2002 SRE using the GMM-UBM (Gaussian Mixture Model-Universal Background Model) approach. A recently proposed supervised score-level fusion method is used to combine evidence from Mel Frequency Cepstral Coefficients (MFCC) and the proposed feature set. The performance of the proposed feature set is compared with that of state-of-the-art MFCC features. The results show that the proposed features give a 4% improvement in identification rate and a 2% reduction in equal error rate (EER) relative to standard MFCC alone. In addition, when the supervised score-level fusion is used, the identification rate improves by 8% and the EER decreases by 2%, indicating that the proposed features capture information complementary to MFCC.
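The two figures of merit quoted in the abstract, identification rate and EER, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the score conventions (higher score means a better match, accept when the score is at or above the threshold), and the toy numbers are assumptions made purely for illustration.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumed convention: a trial is accepted when score >= threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # One extra threshold above the maximum covers the "reject everything" case.
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)
    # False acceptance rate: impostor trials that are accepted.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # False rejection rate: genuine trials that are rejected.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return float(0.5 * (far[idx] + frr[idx]))

def identification_rate(score_matrix, true_speaker):
    """Fraction of test utterances whose top-scoring model is the true speaker.

    score_matrix[i, j] is the score of test utterance i against speaker model j.
    """
    scores = np.asarray(score_matrix, dtype=float)
    predicted = scores.argmax(axis=1)
    return float((predicted == np.asarray(true_speaker)).mean())

# Toy data, invented for this example only.
eer = equal_error_rate([0.1, 0.6, 0.8, 0.9], [0.2, 0.3, 0.4, 0.7])
rate = identification_rate([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], [0, 1, 1])
```

On real evaluation data the EER is usually read off a DET curve with interpolation between operating points; the threshold sweep above is only the simplest approximation.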
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Gandhi, A., Patil, H.A. (2017). Novel Linear Prediction Temporal Phase Based Features for Speaker Recognition. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_56
DOI: https://doi.org/10.1007/978-3-319-66429-3_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science (R0)