Audio-Visual Speech Recognition Scheme Based on Wavelets and Random Forests Classification

Lucas Daniel Terissi, Gonzalo D. Sad, Juan Carlos Gómez & Marianela Parodi

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9423)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

This paper describes an audio-visual speech recognition system based on wavelets and Random Forests. Wavelet multiresolution analysis is used to represent the sequences of acoustic and visual input parameters in a compact form. Recognition is then performed by a Random Forests classifier that takes the wavelet-based features as inputs. The performance of the proposed scheme is evaluated on two audio-visual databases under noisy acoustic conditions. Experimental results show that the proposed system achieves good performance and outperforms traditional Hidden Markov Model-based approaches. The system has only one tuning parameter, and the experiments also show that this parameter can be selected within a small range without significantly changing the recognition results.
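
The compact representation step can be illustrated with a minimal sketch. The abstract does not state which wavelet family, decomposition depth, or resampling strategy the authors use, so everything below is an assumption made for illustration: the Haar wavelet, linear resampling to a fixed length, and the hypothetical names `resample`, `haar_step`, and `wavelet_features`. The idea shown is only that a parameter sequence of arbitrary duration can be reduced to a short, fixed-length vector of approximation coefficients.

```python
import math


def resample(seq, length):
    """Linearly interpolate a 1-D sequence to a fixed length (assumed preprocessing)."""
    if len(seq) == 1:
        return [float(seq[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(seq) - 1) / (length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append((1 - frac) * seq[lo] + frac * seq[hi])
    return out


def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(seq, length=32, levels=3):
    """Fixed-length descriptor: approximation coefficients after `levels` Haar steps.

    `length` and `levels` are illustrative defaults, not values from the paper.
    """
    if length < 2 or length % (2 ** levels) != 0:
        raise ValueError("length must be >= 2 and divisible by 2**levels")
    approx = resample(seq, length)
    for _ in range(levels):
        approx, _detail = haar_step(approx)  # keep the coarse part, drop the detail
    return approx


# Two sequences of different durations map to vectors of the same size.
short_utterance = [math.sin(0.3 * t) for t in range(20)]
long_utterance = [math.sin(0.1 * t) for t in range(75)]
features_short = wavelet_features(short_utterance)
features_long = wavelet_features(long_utterance)
```

With several acoustic or visual parameters per frame, one plausible arrangement is to apply `wavelet_features` to each parameter track and concatenate the results into a single vector, which could then be passed to a Random Forests classifier such as scikit-learn's `RandomForestClassifier`. That pairing is likewise an assumption about how the pieces fit together, not a description of the authors' implementation.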

Author information

Authors and Affiliations

Laboratory for System Dynamics and Signal Processing, Universidad Nacional de Rosario CIFASIS-CONICET, Rosario, Argentina
Lucas Daniel Terissi, Gonzalo D. Sad, Juan Carlos Gómez & Marianela Parodi

Corresponding author

Correspondence to Lucas Daniel Terissi.

Editor information

Editors and Affiliations

Univ. Católica del Uruguay, Montevideo, Uruguay
Alvaro Pardo
University of Surrey, Guildford, United Kingdom
Josef Kittler

About this paper

Cite this paper

Terissi, L.D., Sad, G.D., Gómez, J.C., Parodi, M. (2015). Audio-Visual Speech Recognition Scheme Based on Wavelets and Random Forests Classification. In: Pardo, A., Kittler, J. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2015. Lecture Notes in Computer Science, vol 9423. Springer, Cham. https://doi.org/10.1007/978-3-319-25751-8_68

DOI: https://doi.org/10.1007/978-3-319-25751-8_68
Published: 25 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25750-1
Online ISBN: 978-3-319-25751-8
eBook Packages: Computer Science, Computer Science (R0)
