Abstract
In this paper, a novel subword lip-reading system using continuous Hidden Markov Models (HMMs) is presented. The constituent HMMs are configured according to the statistical features of lip motion and trained with the Baum-Welch method. The performance of the proposed system in identifying the fourteen visemes defined in the MPEG-4 standard is evaluated. Experimental results show that the proposed system achieves an average recognition accuracy above 80%.
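The abstract describes recognition with one continuous HMM per viseme. Below is a minimal sketch of the scoring step under assumptions that are not taken from the paper: a left-to-right topology, a single diagonal Gaussian per state (rather than a Gaussian mixture), and hand-set parameters standing in for Baum-Welch training. Each candidate model scores a sequence of lip-feature vectors with the standard forward algorithm, and the label with the highest log-likelihood wins. The names `GaussianHMM`, `classify`, and the labels `viseme_A`/`viseme_B` are hypothetical illustrations.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def safe_log(p: float) -> float:
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, var)
    )


@dataclass
class GaussianHMM:
    """Continuous-density HMM with one diagonal Gaussian per state (an assumption)."""
    start: List[float]            # initial state probabilities pi_i
    trans: List[List[float]]      # transition probabilities a_ij
    means: List[List[float]]      # per-state mean feature vector
    variances: List[List[float]]  # per-state diagonal variances

    def log_likelihood(self, obs: Sequence[Sequence[float]]) -> float:
        """Return log P(obs | model) using the forward algorithm in the log domain."""
        n = len(self.start)
        log_a = [[safe_log(p) for p in row] for row in self.trans]
        # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
        alpha = [
            safe_log(self.start[i]) + log_gauss_diag(obs[0], self.means[i], self.variances[i])
            for i in range(n)
        ]
        # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
        for x in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + log_a[i][j] for i in range(n)])
                + log_gauss_diag(x, self.means[j], self.variances[j])
                for j in range(n)
            ]
        # Termination: P(obs | model) = sum_i alpha_T(i)
        return logsumexp(alpha)


def classify(obs: Sequence[Sequence[float]], models: Dict[str, GaussianHMM]) -> str:
    """Pick the label whose HMM assigns the highest log-likelihood to obs."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


# Toy usage: two hypothetical 2-state left-to-right models over a 1-D lip feature.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "viseme_A": GaussianHMM([1.0, 0.0], left_to_right, [[0.0], [1.0]], [[0.1], [0.1]]),
    "viseme_B": GaussianHMM([1.0, 0.0], left_to_right, [[2.0], [3.0]], [[0.1], [0.1]]),
}
print(classify([[0.1], [0.2], [0.9], [1.1]], models))  # prints viseme_A
```

Working in the log domain avoids the underflow that the raw forward recursion suffers on longer frame sequences. A complete system would estimate `start`, `trans`, `means`, and `variances` from labelled training clips with Baum-Welch re-estimation instead of fixing them by hand.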
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Foo, S.W., Dong, L. (2002). Recognition of Visual Speech Elements Using Hidden Markov Models. In: Chen, YC., Chang, LW., Hsu, CT. (eds) Advances in Multimedia Information Processing — PCM 2002. PCM 2002. Lecture Notes in Computer Science, vol 2532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36228-2_75
DOI: https://doi.org/10.1007/3-540-36228-2_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00262-8
Online ISBN: 978-3-540-36228-9
eBook Packages: Springer Book Archive