Spectro-temporal Power Spectrum Features for Noise Robust ASR

Circuits, Systems, and Signal Processing

Abstract

In this paper, we present a new technique for extracting a noise-robust representation of speech signals, called the spectro-temporal power spectrum. The technique applies a simple 2-D filter to the speech spectrogram to highlight the movements of spectral peaks. Because speech spectral peaks constitute the high-SNR (signal-to-noise ratio) regions of the speech spectrogram, we expect the filter to improve recognition performance. In addition, the 2-D filter encodes the spectro-temporal information around each frequency component into the frequency representation of the speech signal. This information helps the recognizer identify the true state to which each frame should be allocated. Experimental results on the Aurora 2 task show error rate improvements of about 40% and 35% for test sets A and B, respectively, in comparison with the baseline system when combined with cepstral mean and variance normalization. Further improvement was achieved when the proposed features were extracted from enhanced spectra obtained by applying the advanced front-end routine. Moreover, a phone recognition task evaluated on the TIMIT database showed that the proposed method outperforms the baseline methods. These improvements are obtained with a very simple, easy-to-implement routine, which makes the method suitable for practical systems.
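The two operations named in the abstract, 2-D filtering of a spectrogram and mean and variance normalization, can be illustrated with a minimal sketch. The abstract does not give the filter coefficients, so the kernel below (`EXAMPLE_KERNEL`) is a hypothetical unit-sum, center-surround kernel chosen only for illustration; the function names, the edge-replication padding, and the output flooring are likewise assumptions and not the authors' method. The normalization is written for generic per-frame feature vectors; in the paper's setting it would be applied to cepstral features.

```python
import math

# Hypothetical 3x3 kernel (rows = time offsets, columns = frequency offsets).
# It is NOT the filter from the paper: it simply boosts a bin relative to
# its four nearest time/frequency neighbours and sums to one.
EXAMPLE_KERNEL = [
    [0.0, -0.25, 0.0],
    [-0.25, 2.0, -0.25],
    [0.0, -0.25, 0.0],
]


def filter_spectrogram(spec, kernel, floor=1e-10):
    """Correlate a power spectrogram (frames x bins) with a small 2-D kernel.

    Edges are handled by replicating the nearest frame or bin. Outputs are
    floored at `floor` so a later log stays defined (an assumption here).
    """
    n_frames, n_bins = len(spec), len(spec[0])
    kt, kf = len(kernel) // 2, len(kernel[0]) // 2
    out = []
    for t in range(n_frames):
        row = []
        for f in range(n_bins):
            acc = 0.0
            for dt in range(-kt, kt + 1):
                tt = min(max(t + dt, 0), n_frames - 1)  # clamp time index
                for df in range(-kf, kf + 1):
                    ff = min(max(f + df, 0), n_bins - 1)  # clamp bin index
                    acc += kernel[dt + kt][df + kf] * spec[tt][ff]
            row.append(max(acc, floor))
        out.append(row)
    return out


def cmvn(features, eps=1e-12):
    """Normalize each feature dimension to zero mean and unit variance
    across frames (mean and variance normalization)."""
    n, dims = len(features), len(features[0])
    means = [sum(fr[d] for fr in features) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((fr[d] - means[d]) ** 2 for fr in features) / n) + eps
        for d in range(dims)
    ]
    return [[(fr[d] - means[d]) / stds[d] for d in range(dims)] for fr in features]


if __name__ == "__main__":
    # Toy spectrogram: flat floor of 1.0 with one peak rising a bin per frame.
    spec = [[1.0] * 5 for _ in range(4)]
    for t in range(4):
        spec[t][t + 1] = 10.0
    filtered = filter_spectrogram(spec, EXAMPLE_KERNEL)
    log_spec = [[math.log(v) for v in frame] for frame in filtered]
    normalized = cmvn(log_spec)
    print(len(normalized), len(normalized[0]))
```

With a unit-sum kernel, flat regions of the spectrogram pass through unchanged while an isolated peak is amplified relative to its neighbours, which is the qualitative behaviour the abstract describes. A real front end would use the paper's actual filter and a full cepstral pipeline.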

Acknowledgements

This work was in part supported by a grant from the Iran Telecommunication Research Center (ITRC).

Author information

Corresponding author

Correspondence to Hamed Riazati Seresht.

Cite this article

Riazati Seresht, H., Ahadi, S.M. & Seyedin, S. Spectro-temporal Power Spectrum Features for Noise Robust ASR. Circuits Syst Signal Process 36, 3222–3242 (2017). https://doi.org/10.1007/s00034-016-0434-0

  • DOI: https://doi.org/10.1007/s00034-016-0434-0
