Abstract
This paper describes an audio-visual speech recognition system based on wavelets and Random Forests. Wavelet multiresolution analysis is used to represent the sequences of acoustic and visual input parameters in a compact form. Recognition is then performed with a Random Forests classifier that takes the wavelet-based features as inputs. The proposed speech recognition scheme is evaluated on two audio-visual databases under noisy acoustic conditions. Experimental results show that the proposed system performs well and outperforms traditional Hidden Markov Model-based approaches. The proposed system has only one tuning parameter, and the experimental results also show that this parameter can be selected within a small range without significantly changing the recognition results.
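As a rough illustration of the compact representation step, the sketch below turns a variable-length parameter trajectory into a fixed-length vector by keeping only coarse wavelet approximation coefficients. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the wavelet family, the decomposition depth, or which coefficients are retained, so the Haar wavelet, the linear resampling step, and the names resample, haar_step, wavelet_features and utterance_descriptor are all hypothetical choices made for this example.

```python
import math


def resample(seq, length):
    """Linearly interpolate a 1-D sequence to a fixed length.

    Assumed preprocessing (not stated in the abstract) so that utterances
    of different durations yield descriptors of the same size.
    """
    if len(seq) == 1:
        return [float(seq[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(seq) - 1) / (length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1.0 - frac) + seq[hi] * frac)
    return out


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(seq, length=16, levels=2):
    """Compact descriptor: the approximation coefficients after `levels` steps.

    `length` must be divisible by 2**levels (both values are assumptions).
    """
    approx = resample(seq, length)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


def utterance_descriptor(tracks, length=16, levels=2):
    """Concatenate the descriptors of several parameter trajectories,
    for example acoustic and visual ones, into a single feature vector."""
    features = []
    for track in tracks:
        features.extend(wavelet_features(track, length, levels))
    return features
```

Vectors produced this way have the same length for every utterance, so they could be passed to any off-the-shelf Random Forest implementation (for instance scikit-learn's RandomForestClassifier) for the classification stage.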
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Terissi, L.D., Sad, G.D., Gómez, J.C., Parodi, M. (2015). Audio-Visual Speech Recognition Scheme Based on Wavelets and Random Forests Classification. In: Pardo, A., Kittler, J. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2015. Lecture Notes in Computer Science, vol 9423. Springer, Cham. https://doi.org/10.1007/978-3-319-25751-8_68
DOI: https://doi.org/10.1007/978-3-319-25751-8_68
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25750-1
Online ISBN: 978-3-319-25751-8
eBook Packages: Computer Science, Computer Science (R0)