DOI: 10.5555/1768226.1768237
Chapter

SVMs for automatic speech recognition: a survey

January 2007
Pages 190 - 216
Published: 01 January 2007

Abstract

Hidden Markov Models (HMMs) are undoubtedly the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, Markov models remain predominant today.
During the last decade, however, a new tool has appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
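For orientation, the maximum-margin and guaranteed-convergence properties can be read off the standard soft-margin SVM training problem. The formulation below uses common textbook notation, which is an assumption here and is not taken from the chapter: $w$ and $b$ define the separating hyperplane, $\xi_i$ are slack variables, $C$ is the regularization constant, and $\phi$ is the feature map induced by the kernel.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

Minimizing $\lVert w \rVert^{2}$ maximizes the margin $2 / \lVert w \rVert$, and because the problem is a convex quadratic program, any local minimum is also the global one.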
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed by the original SVMs. Afterwards we explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the dynamic time-alignment kernel (DTAK), simple and effective, which revives an older speech recognition technique: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches that use SVMs to tackle more complex tasks such as connected digit recognition and continuous speech recognition. Finally, we draw some conclusions and outline several ongoing lines of research.
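To make the idea of a kernel over sequences of different lengths more concrete, here is a minimal sketch of a DTW-style time-alignment kernel. It is an illustration under stated assumptions, not the chapter's own formulation: the function names `rbf` and `time_alignment_kernel`, the `gamma` parameter, the choice of an RBF frame-level kernel, and the symmetric step weights (1, 2, 1) with normalization by the summed sequence lengths are all choices made for this example.

```python
import numpy as np


def rbf(x, v, gamma=1.0):
    """Frame-level RBF kernel between two feature vectors (illustrative choice)."""
    diff = x - v
    return float(np.exp(-gamma * np.dot(diff, diff)))


def time_alignment_kernel(X, V, gamma=1.0):
    """DTW-style kernel between X (I frames x d) and V (J frames x d).

    Dynamic programming finds the alignment path that maximizes the
    accumulated frame-level similarity. Step weights (1, 2, 1) and the
    normalization by I + J are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    I, J = len(X), len(V)
    # G[i, j] holds the best accumulated similarity for the prefixes
    # X[:i] and V[:j]; -inf marks cells that no path can reach.
    G = np.full((I + 1, J + 1), -np.inf)
    G[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            k = rbf(X[i - 1], V[j - 1], gamma)
            G[i, j] = max(
                G[i - 1, j] + k,            # advance in X only
                G[i - 1, j - 1] + 2.0 * k,  # advance in both sequences
                G[i, j - 1] + k,            # advance in V only
            )
    return G[I, J] / (I + J)


# Two toy "utterances": V is a time-stretched copy of X.
X = np.array([[0.0], [1.0], [2.0]])
V = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])
print(time_alignment_kernel(X, X), time_alignment_kernel(X, V))
```

Because the alignment absorbs repeated frames, the stretched copy scores as highly as the original sequence does against itself. A Gram matrix built from such values could be passed to an SVM implementation that accepts precomputed kernels, with the caveat that alignment-based kernels of this kind may not be positive semi-definite in general.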

Cited By

  • (2018) Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers. Neural Computing and Applications 29:3 (637-651). DOI: 10.1007/s00521-016-2470-x. Online publication date: 1-Feb-2018
  • (2017) HMM/SVM segmentation and labelling of Arabic speech for speech recognition applications. International Journal of Speech Technology 20:3 (563-573). DOI: 10.5555/3135535.3135559. Online publication date: 1-Sep-2017
  • (2009) Single-class support vector machine for an out-of-vocabulary rejection of isolated words. Proceedings of the 2009 international conference on Robotics and biomimetics (1376-1380). DOI: 10.5555/1819998.1820275. Online publication date: 19-Dec-2009

Published In

Progress in nonlinear speech processing
January 2007
269 pages
ISBN: 9783540715030
Editors: Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito

Publisher

Springer-Verlag

Berlin, Heidelberg

