Novel Linear Prediction Temporal Phase Based Features for Speaker Recognition

Ami Gandhi & Hemant A. Patil

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

This paper proposes novel features based on linear prediction of temporal phase (LPTP) for the speaker recognition task. The proposed LPTC feature vector consists of the Discrete Cosine Transform (DCT) coefficients of the LP spectrum derived from the temporal phase of the speech signal; the DCT is applied for energy compaction and decorrelation. Results are reported on the standard NIST 2002 SRE database using a GMM-UBM (Gaussian Mixture Model-Universal Background Model) system. A recently proposed supervised score-level fusion method is used to combine the evidence from Mel Frequency Cepstral Coefficients (MFCC) and the proposed feature set. The performance of the proposed feature set is compared with that of state-of-the-art MFCC features. The results show that the proposed features give a 4% improvement in % identification rate and a 2% reduction in % equal error rate (EER) relative to standard MFCC alone. In addition, with supervised score-level fusion, the identification rate improves by 8% and the EER decreases by 2%, indicating that the proposed features capture information complementary to MFCC.
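The abstract summarizes the LPTC pipeline in a single sentence: take the temporal phase of the speech signal, compute an LP spectrum from it, and keep DCT coefficients of that spectrum. The Python/NumPy sketch below is one plausible reading of that pipeline, not the authors' implementation. It assumes that the temporal phase is the cosine of the instantaneous phase of the analytic (Hilbert) signal, and it uses illustrative settings (20 ms Hamming frames, 10 ms hop, LP order 12, 512-point FFT, 13 retained coefficients) that are not taken from the paper. All function names are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal x + j*H{x} via the FFT-based Hilbert transform."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def temporal_phase(x):
    # ASSUMPTION: "temporal phase" is read here as cos(phi[n]), where phi[n]
    # is the instantaneous phase of the analytic signal; the paper may differ.
    return np.cos(np.angle(analytic_signal(x)))


def lpc(frame, order):
    """Autocorrelation-method LP: returns a[0..order] with a[0] = 1 and the
    final prediction-error power, via the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # degenerate (e.g. silent) frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # i-th reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def dct_ii(v, n_coeffs):
    """First n_coeffs coefficients of the orthonormal DCT-II of v."""
    N = len(v)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(N) + 1) / (2 * N))
    c = (basis @ v) * np.sqrt(2.0 / N)
    c[0] /= np.sqrt(2.0)
    return c


def lptc(x, fs, order=12, n_ceps=13, frame_ms=20, hop_ms=10, nfft=512):
    """Hypothetical LPTC-style extractor: one n_ceps-dim vector per frame."""
    tp = temporal_phase(np.asarray(x, dtype=float))
    flen = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    win = np.hamming(flen)
    feats = []
    for start in range(0, len(tp) - flen + 1, hop):
        a, err = lpc(tp[start:start + flen] * win, order)
        # Log LP power spectrum: log(err) - log|A(e^jw)|^2 on nfft//2 + 1 bins
        A = np.fft.rfft(a, nfft)
        log_spec = np.log(max(err, 1e-12)) - np.log(np.abs(A) ** 2 + 1e-12)
        feats.append(dct_ii(log_spec, n_ceps))  # DCT: compaction + decorrelation
    return np.array(feats)
```

Called as `lptc(signal, fs=8000)` on one second of audio, this sketch returns an array of shape `(99, 13)`: one feature vector per 10 ms hop, which is the kind of frame-level input a GMM-UBM back end expects. The DCT is written out explicitly only to keep the example dependency-free; a library orthonormal DCT-II would give the same coefficients.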

Author information

Authors and Affiliations

Infinium Solutionz Pvt Ltd, Ahmedabad, India
Ami Gandhi
Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India
Hemant A. Patil

Corresponding author

Correspondence to Ami Gandhi.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Gandhi, A., Patil, H.A. (2017). Novel Linear Prediction Temporal Phase Based Features for Speaker Recognition. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_56

DOI: https://doi.org/10.1007/978-3-319-66429-3_56
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)