Abstract
In this paper we propose a voice activity detection (VAD) algorithm for improving speech recognition performance in noisy environments. The approach is based on statistical tests applied to multiple observation window based on the determination of the speech/non-speech bispectra by means of third order auto-cumulants. This algorithm differs from many others in the way the decision rule is formulated (detection tests) and the domain used in this approach (bispectrum). It is shown that application of statistical detection test leads to a better separation of the speech and noise distributions, thus allowing a more effective discrimination and a tradeoff between complexity and performance. The experimental analysis carried out on the AURORA databases and tasks provides an extensive performance evaluation together with an exhaustive comparison to the standard VADs such as ITU G.729, GSM AMR and ETSI AFE for distributed speech recognition (DSR), and other recently reported VADs. Clear improvements in Speech Recognition are obtained when the proposed VAD is used as a part of a ASR system.
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11550907_163 .
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
ETSI. Voice activity detector (VAD) for Adaptive Multi-Rate (AMR) speech traffic channels. ETSI EN 301 708 Recommendation (1999)
ETSI. Speech processing, transmission and quality aspects (stq); distributed speech recognition; front-end feature extraction algorithm; compression algorithms. ETSI ES 201 108 Recommendation (2000)
Gustafsson, S., Martin, R., Jax, P., Vary, P.: A psychoacoustic approach to combined acoustic echo cancellation and noise reduction. IEEE Transactions on Speech and Audio Processing 10(5), 245–256 (2002)
Hinich, J.R.: Testing for gaussianity and linearity of a stationary time series. Journal of Time Series Analisys 3, 169–176 (1982)
ITU. A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70. ITU-T Recommendation G.729-Annex B (1996)
Karray, L., Martin, A.: Towards improving speech detection robustness for speech recognition in adverse environments. Speech Communitation (3), 261–276 (2003)
Li, Q., Zheng, J., Tsai, A., Zhou, Q.: Robust endpoint detection and energy normalization for real-time speech and speaker recognition. IEEE Transactions on Speech and Audio Processing 10(3), 146–157 (2002)
Marzinzik, M., Kollmeier, B.: Speech pause detection for noise spectrum estimation by tracking power envelope dynamics. IEEE Transactions on Speech and Audio Processing 10(6), 341–351 (2002)
Moreno, A., Borge, L., Christoph, D., Gael, R., Khalid, C., Stephan, E., Jeffrey, A.: SpeechDat-Car: A Large Speech Database for Automotive Environments. In: Proceedings of the II LREC Conference (2000)
Nemer, E., Goubran, R., Mahmoud, S.: Robust voice activity detection using higher-order statistics in the lpc residual domain. IEEE Trans. Speech and Audio Processing 9(3), 217–231 (2001)
Ramírez, J., Segura, J.C., Benítez, C., delaTorre, A., Rubio, A.: An effective subband osf-based vad with noise reduction for robust speech recognition. In press IEEE Transactions on Speech and Audio Processing 13(6), 1119–1129 (2004)
Sohn, J., Kim, N.S., Sung, W.: A statistical model-based voice activity detection. IEEE Signal Processing Letters 16(1), 1–3 (1999)
Subba-Rao, T.: A test for linearity of stationary time series. Journal of Time Series Analisys 1, 145–158 (1982)
Woo, K., Yang, T., Park, K., Lee, C.: Robust voice activity detection algorithm for estimating noise spectrum. Electronics Letters 36(2), 180–181 (2000)
Young, S., Odell, J., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book. Cambridge University, Cambridge (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Górriz, J.M., Ramírez, J., Puntonet, C.G., Theis, F., Lang, E.W. (2005). Bispectrum-Based Statistical Tests for VAD. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds) Artificial Neural Networks: Formal Models and Their Applications – ICANN 2005. ICANN 2005. Lecture Notes in Computer Science, vol 3697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550907_85
Download citation
DOI: https://doi.org/10.1007/11550907_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28755-1
Online ISBN: 978-3-540-28756-8
eBook Packages: Computer ScienceComputer Science (R0)