A Non-Linear VAD for Noisy Environments

Jordi Solé-Casals¹ &
Vladimir Zaiats¹

Abstract

This paper studies non-linear transformations for improving the performance of an entropy-based voice activity detector (VAD). Non-linear transformations have already been applied to speech linear prediction (linear predictive coding) through source separation techniques, where a score function is added to the classical equations to take into account the true distribution of the signal. We explore the possibility of estimating the entropy of each frame after applying its score function, instead of estimating it from the original frame. We observe that if the signal is clean, the estimated entropy is essentially the same; if the signal is noisy, however, the frames transformed by the score function may yield an entropy that differs between voiced and unvoiced frames. Experimental evidence shows that this difference enables voice activity detection under high noise, where the simple entropy method fails.
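The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, since the abstract does not specify their estimators. The sketch assumes a histogram-based Shannon entropy estimate, a Gaussian-kernel estimate of the score function ψ(x) = −p′(x)/p(x) with a normal-reference bandwidth, and hypothetical helper names (`frame_signal`, `histogram_entropy`, `score_function`, `entropy_tracks`) and default parameters chosen only for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (rows)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def histogram_entropy(values, n_bins=50):
    """Shannon entropy (in nats) of a histogram of the sample values.
    Assumption: a plain histogram estimator, chosen for simplicity."""
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def score_function(frame, bandwidth=None):
    """Kernel estimate of psi(x) = -p'(x)/p(x) at each sample of the frame.
    Assumption: Gaussian kernel; bandwidth from the normal-reference rule."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if bandwidth is None:
        bandwidth = 1.06 * np.std(frame) * n ** (-0.2) + 1e-12
    d = (frame[:, None] - frame[None, :]) / bandwidth  # scaled pairwise differences
    k = np.exp(-0.5 * d ** 2)                          # unnormalised Gaussian kernel
    p = k.sum(axis=1)                                  # proportional to p(x_i)
    dp = (-d / bandwidth * k).sum(axis=1)              # proportional to p'(x_i)
    return -dp / p                                     # normalising constants cancel

def entropy_tracks(x, frame_len=256, hop=128, n_bins=50):
    """Per-frame entropy of the raw frames and of the score-transformed frames."""
    frames = frame_signal(x, frame_len, hop)
    raw = np.array([histogram_entropy(f, n_bins) for f in frames])
    transformed = np.array(
        [histogram_entropy(score_function(f), n_bins) for f in frames]
    )
    return raw, transformed
```

A detector along these lines would compare either entropy track with a threshold to label frames as speech or non-speech. How that threshold is chosen is not stated in the abstract, so it is left out of the sketch.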

Acknowledgments

This work has been supported by the University of Vic under grants R0904 and R0912, and by the Ministry of Science and Innovation of Spain (MICINN) under grant TEC2008-02717-E/TEC. The authors thank two anonymous referees for helpful comments that led to improvements in the paper.

Author information

Authors and Affiliations

Digital Technologies Group, University of Vic, Sagrada Família 7, 08500, Vic, Spain
Jordi Solé-Casals & Vladimir Zaiats

Corresponding author

Correspondence to Jordi Solé-Casals.

About this article

Cite this article

Solé-Casals, J., Zaiats, V. A Non-Linear VAD for Noisy Environments. Cogn Comput 2, 191–198 (2010). https://doi.org/10.1007/s12559-010-9037-4

Published: 10 April 2010
Issue Date: September 2010
DOI: https://doi.org/10.1007/s12559-010-9037-4
