Bispectrum-Based Statistical Tests for VAD

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3697)


Abstract

In this paper we propose a voice activity detection (VAD) algorithm for improving speech recognition performance in noisy environments. The approach applies statistical tests over a multiple-observation window, using speech/non-speech bispectra estimated from third-order auto-cumulants. The algorithm differs from many others both in how the decision rule is formulated (statistical detection tests) and in the domain in which it operates (the bispectrum). It is shown that applying statistical detection tests yields a better separation of the speech and noise distributions, which allows more effective discrimination and a tradeoff between complexity and performance. The experimental analysis on the AURORA databases and tasks provides an extensive performance evaluation, together with an exhaustive comparison against standard VADs such as ITU G.729, GSM AMR and ETSI AFE for distributed speech recognition (DSR), and against other recently reported VADs. Clear improvements in speech recognition are obtained when the proposed VAD is used as part of an ASR system.
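The abstract does not spell out the estimator or the test, so the following is only a hypothetical sketch of the general pipeline it names, written in Python with the standard library. It estimates third-order auto-cumulants per frame and turns them into a bispectrum estimate with the common indirect method (a 2-D Fourier transform of lag-windowed cumulants, here with an assumed triangular lag window and an assumed 16-point frequency grid). It then reduces each frame to a variance-normalized mean bispectral magnitude and averages that value over a window of 2m+1 frames before comparing it with a threshold eta. The names `frame_statistic`, `vad_decisions`, `eta`, `m`, `max_lag` and `nfft` are illustrative assumptions, and the plain threshold is a stand-in for the paper's statistical detection tests, which are not reproduced here. The sketch relies on the fact that the bispectrum of Gaussian noise is zero, so a clearly non-zero estimate points to a non-Gaussian source.

```python
import cmath
import math
import random  # used only by the toy demo at the bottom


def third_order_cumulant(x, max_lag):
    """Biased estimate of c3(t1, t2) = E[x(n) x(n+t1) x(n+t2)] for |t1|, |t2| <= max_lag.

    The frame is mean-removed first, so this third-order moment equals the
    third-order auto-cumulant."""
    n = len(x)
    mu = sum(x) / n
    x = [v - mu for v in x]
    c3 = {}
    for t1 in range(-max_lag, max_lag + 1):
        for t2 in range(-max_lag, max_lag + 1):
            lo = max(0, -t1, -t2)        # keeps k + t1 and k + t2 >= 0
            hi = min(n, n - t1, n - t2)  # keeps k + t1 and k + t2 < n
            c3[(t1, t2)] = sum(x[k] * x[k + t1] * x[k + t2] for k in range(lo, hi)) / n
    return c3


def lag_window(t, max_lag):
    """Assumed triangular 1-D lag window d(t); zero for |t| >= max_lag + 1."""
    return max(0.0, 1.0 - abs(t) / (max_lag + 1))


def bispectrum(x, max_lag=3, nfft=16):
    """Indirect bispectrum estimate on the principal domain 0 <= f2 <= f1, f1 + f2 <= 1/2.

    B(f1, f2) = sum_{t1,t2} w(t1, t2) c3(t1, t2) exp(-j 2 pi (f1 t1 + f2 t2)),
    with the hexagonal window w(t1, t2) = d(t1) d(t2) d(t1 - t2)."""
    c3 = third_order_cumulant(x, max_lag)
    half = nfft // 2
    result = {}
    for i in range(half + 1):
        for j in range(min(i, half - i) + 1):
            f1, f2 = i / nfft, j / nfft
            acc = 0j
            for (t1, t2), c in c3.items():
                w = lag_window(t1, max_lag) * lag_window(t2, max_lag) * lag_window(t1 - t2, max_lag)
                acc += w * c * cmath.exp(-2j * math.pi * (f1 * t1 + f2 * t2))
            result[(i, j)] = acc
    return result


def frame_statistic(x, max_lag=3, nfft=16):
    """Mean bispectral magnitude divided by variance**1.5, so it is scale invariant."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    if var == 0.0:
        return 0.0  # a constant frame carries no third-order information
    b = bispectrum(x, max_lag, nfft)
    return sum(abs(v) for v in b.values()) / (len(b) * var ** 1.5)


def vad_decisions(frames, eta, m=2, max_lag=3, nfft=16):
    """Label frame l as speech (True) when the statistic averaged over the
    multiple-observation window l-m .. l+m (clipped at the edges) exceeds eta."""
    stats = [frame_statistic(f, max_lag, nfft) for f in frames]
    decisions = []
    for l in range(len(stats)):
        window = stats[max(0, l - m): l + m + 1]
        decisions.append(sum(window) / len(window) > eta)
    return decisions


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(512)] for _ in range(6)]
    skewed = [[rng.expovariate(1.0) for _ in range(512)] for _ in range(6)]
    print("noise: ", vad_decisions(noise, eta=0.8))
    print("skewed:", vad_decisions(skewed, eta=0.8))
```

In this toy demo, frames of Gaussian noise should come out as non-speech and frames of exponentially distributed (hence skewed) samples as speech with eta=0.8. The skewed signal is only a synthetic stand-in for speech; on real recordings eta would have to be calibrated, for example on noise-only frames. Averaging over the observation window trades some time resolution for a lower-variance statistic, which is the same kind of complexity/performance tradeoff the abstract mentions.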

An erratum to this chapter is available at http://dx.doi.org/10.1007/11550907_163.


References

  1. ETSI: Voice activity detector (VAD) for Adaptive Multi-Rate (AMR) speech traffic channels. ETSI EN 301 708 Recommendation (1999)

  2. ETSI: Speech processing, transmission and quality aspects (STQ); distributed speech recognition; front-end feature extraction algorithm; compression algorithms. ETSI ES 201 108 Recommendation (2000)

  3. Gustafsson, S., Martin, R., Jax, P., Vary, P.: A psychoacoustic approach to combined acoustic echo cancellation and noise reduction. IEEE Transactions on Speech and Audio Processing 10(5), 245–256 (2002)

  4. Hinich, J.R.: Testing for Gaussianity and linearity of a stationary time series. Journal of Time Series Analysis 3, 169–176 (1982)

  5. ITU: A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70. ITU-T Recommendation G.729-Annex B (1996)

  6. Karray, L., Martin, A.: Towards improving speech detection robustness for speech recognition in adverse environments. Speech Communication (3), 261–276 (2003)

  7. Li, Q., Zheng, J., Tsai, A., Zhou, Q.: Robust endpoint detection and energy normalization for real-time speech and speaker recognition. IEEE Transactions on Speech and Audio Processing 10(3), 146–157 (2002)

  8. Marzinzik, M., Kollmeier, B.: Speech pause detection for noise spectrum estimation by tracking power envelope dynamics. IEEE Transactions on Speech and Audio Processing 10(6), 341–351 (2002)

  9. Moreno, A., Borge, L., Christoph, D., Gael, R., Khalid, C., Stephan, E., Jeffrey, A.: SpeechDat-Car: A Large Speech Database for Automotive Environments. In: Proceedings of the II LREC Conference (2000)

  10. Nemer, E., Goubran, R., Mahmoud, S.: Robust voice activity detection using higher-order statistics in the LPC residual domain. IEEE Transactions on Speech and Audio Processing 9(3), 217–231 (2001)

  11. Ramírez, J., Segura, J.C., Benítez, C., de la Torre, A., Rubio, A.: An effective subband OSF-based VAD with noise reduction for robust speech recognition. IEEE Transactions on Speech and Audio Processing 13(6), 1119–1129 (2004), in press

  12. Sohn, J., Kim, N.S., Sung, W.: A statistical model-based voice activity detection. IEEE Signal Processing Letters 6(1), 1–3 (1999)

  13. Subba-Rao, T.: A test for linearity of stationary time series. Journal of Time Series Analysis 1, 145–158 (1982)

  14. Woo, K., Yang, T., Park, K., Lee, C.: Robust voice activity detection algorithm for estimating noise spectrum. Electronics Letters 36(2), 180–181 (2000)

  15. Young, S., Odell, J., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book. Cambridge University, Cambridge (1997)


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Górriz, J.M., Ramírez, J., Puntonet, C.G., Theis, F., Lang, E.W. (2005). Bispectrum-Based Statistical Tests for VAD. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds) Artificial Neural Networks: Formal Models and Their Applications – ICANN 2005. ICANN 2005. Lecture Notes in Computer Science, vol 3697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550907_85


  • DOI: https://doi.org/10.1007/11550907_85

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28755-1

  • Online ISBN: 978-3-540-28756-8

  • eBook Packages: Computer Science, Computer Science (R0)
