Bark scaled oversampled WPT based speech recognition enhancement in noisy environments

Navneet Upadhyay¹,² & Hamurabi Gamboa Rosales¹

Abstract

The performance of a speech recognition system degrades significantly in real-world environments because of the acoustic mismatch between training and operating conditions. This paper presents a two-stage approach to make a speech recognition system robust, i.e., immune to additive, uncorrelated background noise. In the first stage, an oversampled wavelet packet transform (WPT) decomposes the noisy input speech into seventeen nonlinearly spaced frequency subbands that approximate the Bark scale of the human auditory system, and spectral subtraction based on adaptive noise estimation filters each noisy subband signal. The oversampled WPT is linear, and it overcomes the shift-variance problem of the decimated transform by removing the decimation that follows filtering at each decomposition level. In the second stage, a nonparametric approach extracts features from the filtered speech, and these features are compared with those extracted from speech signals stored as templates to recognize the utterance. A series of experiments evaluates the proposed two-stage system in a variety of real environments, with and without the first stage. Recognition accuracy is evaluated at the word level over a wide range of SNRs for various types of noisy environments. The experimental results show a significant improvement in recognition performance at low SNR with the proposed system.
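The shift-invariance property attributed to the oversampled WPT can be illustrated with a minimal sketch. It is not the authors' filterbank: it assumes Haar filters, circular boundary handling, a uniform three-level tree (eight subbands rather than the seventeen Bark-like subbands), and filter taps of 1/2 so that the subbands simply sum back to the input. The names `rotate` and `uwpt_haar` are hypothetical.

```python
def rotate(seq, k):
    """Circularly delay a list by k samples (y[n] = x[n - k])."""
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else list(seq)


def uwpt_haar(x, levels):
    """Undecimated (oversampled) Haar wavelet packet transform.

    Assumptions for this sketch: circular boundaries and taps of 1/2.
    No decimation is applied, so every subband keeps the input length.
    Subbands are returned in natural (filtering) order, not sorted by
    frequency.
    """
    bands = [[float(v) for v in x]]
    for level in range(levels):
        step = 2 ** level  # "a trous": filter taps spaced 2**level apart
        next_bands = []
        for band in bands:
            delayed = rotate(band, step)
            # Lowpass (average) and highpass (difference) branches.
            next_bands.append([(a + b) / 2.0 for a, b in zip(band, delayed)])
            next_bands.append([(a - b) / 2.0 for a, b in zip(band, delayed)])
        bands = next_bands
    return bands


# Arbitrary test signal (not speech data).
signal = [float((7 * n * n + 3 * n) % 11) for n in range(32)]
subbands = uwpt_haar(signal, levels=3)

# Shifting the input only shifts every subband by the same amount.
shift = 5
shifted_subbands = uwpt_haar(rotate(signal, shift), levels=3)
shift_invariant = all(
    s == rotate(b, shift) for s, b in zip(shifted_subbands, subbands)
)

# With taps of 1/2, summing the subbands recovers the input.
reconstructed = [sum(samples) for samples in zip(*subbands)]
max_error = max(abs(r - v) for r, v in zip(reconstructed, signal))

print(len(subbands), shift_invariant, max_error)
```

Because no subband is decimated, a per-subband enhancement step such as spectral subtraction would operate on signals that are all aligned in time with the input. The price is redundancy, since the total number of coefficients grows with the number of subbands.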

Author information

Authors and Affiliations

Department of Signal Processing and Acoustics, Faculty of Electrical Engineering, Autonomous University of Zacatecas, 98000, Zacatecas, Mexico
Navneet Upadhyay & Hamurabi Gamboa Rosales
Department of Electronics and Communication Engineering, The LNM Institute of Information Technology, Jaipur, 302 031, India
Navneet Upadhyay

Corresponding author

Correspondence to Navneet Upadhyay.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Upadhyay, N., Rosales, H.G. Bark scaled oversampled WPT based speech recognition enhancement in noisy environments. Int J Speech Technol 23, 1–12 (2020). https://doi.org/10.1007/s10772-019-09657-y

Received: 27 August 2019
Accepted: 11 November 2019
Published: 26 November 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10772-019-09657-y
