Abstract
Real-world noise is non-stationary and is typically a mixture of several non-stationary noise sources. Most conventional speech enhancement algorithms (SEAs) focus on speech corrupted by a single noise source, which is far from real-world conditions. In this article, we address speech enhancement in real-world environments with a new speech feature. The novelty of this article is three-fold: (1) the proposed model is analyzed in real-world environments; (2) the proposed model uses discrete wavelet transform (DWT) coefficients as input features; (3) the proposed deep denoising autoencoder (DDAE) is designed experimentally. The proposed feature is compared with conventional speech features such as FFT amplitude, log-magnitude spectra, Mel-frequency cepstral coefficients (MFCCs), and Gammatone filter cepstral coefficients (GFCCs), and the performance of the proposed method is compared with conventional speech enhancement methods. The enhanced signal is evaluated with speech quality measures, namely the perceptual evaluation of speech quality (PESQ), weighted spectral slope (WSS), and log-likelihood ratio (LLR); speech intelligibility is measured with short-time objective intelligibility (STOI). The results show that the proposed SEA model with the DWT feature improves quality and intelligibility under all real-world signal-to-noise ratio (SNR) conditions.
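To make the processing chain concrete, the following is a minimal sketch, assuming a db4 wavelet, a three-level decomposition, a 256-sample frame, and arbitrary layer sizes (none of these choices are taken from the paper), and using the PyWavelets and PyTorch libraries: noisy DWT coefficients are fed to a fully connected DDAE that estimates the clean coefficients, which are then inverted back to a waveform.

```python
# Minimal sketch of a DWT-feature DDAE pipeline. Wavelet family, decomposition
# level, frame length, and layer sizes are illustrative assumptions only.
import numpy as np
import pywt                     # PyWavelets
import torch
import torch.nn as nn

def dwt_features(frame, wavelet="db4", level=3):
    """Flatten the multi-level DWT of one speech frame into a feature vector."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)   # [cA_L, cD_L, ..., cD_1]
    return np.concatenate(coeffs).astype(np.float32)

def inverse_dwt(vector, frame_len, wavelet="db4", level=3):
    """Split an enhanced feature vector back into DWT bands and reconstruct."""
    template = pywt.wavedec(np.zeros(frame_len), wavelet, level=level)
    splits = np.cumsum([len(c) for c in template])[:-1]
    coeffs = list(np.split(vector, splits))
    return pywt.waverec(coeffs, wavelet)[:frame_len]

class DDAE(nn.Module):
    """Noisy DWT features in, estimated clean DWT features out (MSE-trained)."""
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )
    def forward(self, x):
        return self.net(x)

# Usage on a single stand-in frame (training on noisy/clean pairs is omitted).
frame_len = 256
noisy_frame = np.random.randn(frame_len)
feat = dwt_features(noisy_frame)
model = DDAE(dim=feat.shape[0])
with torch.no_grad():
    enhanced_feat = model(torch.from_numpy(feat)).numpy()
enhanced_frame = inverse_dwt(enhanced_feat, frame_len)
```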
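The objective scores named in the abstract can likewise be computed with third-party tools. Below is a hedged sketch using the `pesq` and `pystoi` packages; these are assumptions for illustration, as the paper does not specify an implementation, and WSS and LLR are omitted.

```python
# Sketch of PESQ/STOI scoring with assumed third-party packages
# (pip install pesq pystoi); signals here are synthetic stand-ins.
import numpy as np
from pesq import pesq
from pystoi import stoi

fs = 16000                                      # PESQ 'wb' mode requires 16 kHz
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220.0 * t)           # stand-in for a clean utterance
enhanced = clean + 0.05 * np.random.randn(fs)   # stand-in for the DDAE output

print("PESQ:", pesq(fs, clean, enhanced, "wb"))  # wideband PESQ (ITU-T P.862.2)
print("STOI:", stoi(clean, enhanced, fs))        # intelligibility in ~[0, 1]
```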
Cite this article
Chiluveru, S.R., Tripathy, M. A real-world noise removal with wavelet speech feature. Int J Speech Technol 23, 683–693 (2020). https://doi.org/10.1007/s10772-020-09748-1