Abstract
This paper deals with non-linear transformations for improving the performance of an entropy-based voice activity detector (VAD). The idea of using a non-linear transformation has already been applied to speech linear prediction (linear predictive coding) through source separation techniques, where a score function is added to the classical equations in order to take into account the true distribution of the signal. We explore the possibility of estimating the entropy of frames after transforming them with their score function, instead of using the original frames. We observe that if the signal is clean, the entropy estimated from the transformed frames is essentially the same as that of the original frames; if the signal is noisy, however, the entropy of the score-transformed frames may differ between voiced and unvoiced frames. Experimental evidence is given showing that this property enables voice activity detection under high noise, where the simple entropy method fails.
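The transformation described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The score function is taken as psi(u) = d/du log p(u) = p'(u)/p(u) (some authors use the opposite sign, which does not change the entropy, since h(-X) = h(X)); it is estimated at each sample of a frame with a Gaussian kernel density estimate using Silverman's rule-of-thumb bandwidth, and entropy is measured with a simple histogram plug-in estimator. The helper names (`frame_signal`, `kde_score`, `histogram_entropy`, `frame_entropies`), the frame length, hop size, bin count and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def kde_score(frame, bandwidth=None):
    """Estimate psi(u) = p'(u)/p(u) at every sample using a Gaussian KDE.

    Assumption: Gaussian kernel with Silverman's rule-of-thumb bandwidth;
    the paper's own density/score estimator may differ.
    """
    u = np.asarray(frame, dtype=float)
    n = u.size
    if bandwidth is None:
        bandwidth = 1.06 * u.std() * n ** (-1 / 5) + 1e-12
    d = (u[:, None] - u[None, :]) / bandwidth  # pairwise scaled differences
    k = np.exp(-0.5 * d ** 2)                  # unnormalised Gaussian kernel
    p = k.sum(axis=1)                          # proportional to p(u_i)
    dp = (-d / bandwidth * k).sum(axis=1)      # proportional to p'(u_i)
    return dp / p                              # normalising constants cancel


def histogram_entropy(v, n_bins=32):
    """Plug-in differential entropy estimate (nats) from a histogram."""
    counts, edges = np.histogram(v, bins=n_bins)
    width = edges[1] - edges[0]
    p = counts[counts > 0] / counts.sum()
    # h ~= -sum p_i log(p_i / width) = -sum p_i log p_i + log(width)
    return float(-(p * np.log(p)).sum() + np.log(width))


def frame_entropies(x, frame_len=256, hop=128, use_score=True):
    """Entropy of each frame, optionally after the score-function transform."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    out = []
    for f in frames:
        v = kde_score(f) if use_score else f
        out.append(histogram_entropy(v))
    return np.array(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    t = np.arange(fs) / fs
    # Hypothetical test signal: a 200 Hz tone in the middle third, plus white noise.
    tone = np.where((t > 1 / 3) & (t < 2 / 3), np.sin(2 * np.pi * 200 * t), 0.0)
    x = tone + 0.5 * rng.standard_normal(t.size)
    h_raw = frame_entropies(x, use_score=False)
    h_score = frame_entropies(x, use_score=True)
    for i, (a, b) in enumerate(zip(h_raw, h_score)):
        print(f"frame {i:2d}: H(x)={a:6.3f}  H(psi(x))={b:6.3f}")
```

Comparing the two entropy tracks frame by frame is what the abstract's argument rests on. The final decision step (comparing each frame's entropy with a threshold) is left out of the sketch, since no threshold rule is stated here.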
Acknowledgments
This work has been supported by the University of Vic under grants R0904 and R0912, and by the Ministry of Science and Innovation of Spain (MICINN) under grant TEC2008-02717-E/TEC. The authors thank two anonymous referees for helpful comments that have led to improvements in the paper.
Cite this article
Solé-Casals, J., Zaiats, V. A Non-Linear VAD for Noisy Environments. Cogn Comput 2, 191–198 (2010). https://doi.org/10.1007/s12559-010-9037-4
DOI: https://doi.org/10.1007/s12559-010-9037-4