Recognition of Visual Speech Elements Using Hidden Markov Models

Say Wei Foo &
Liang Dong

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2532)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

In this paper, a novel subword lip reading system using continuous Hidden Markov Models (HMMs) is presented. The constituent HMMs are configured according to the statistical features of lip motion and trained with the Baum-Welch method. The performance of the proposed system in identifying the fourteen visemes defined in the MPEG-4 standard is evaluated. Experimental results show that an average accuracy above 80% can be achieved using the proposed system.
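
The recognition step that such a system implies (score an observation sequence against one HMM per viseme and pick the most likely model) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature is a single hypothetical scalar (for example, a normalized mouth opening), the emissions are one Gaussian per state, the topology is a simple left-to-right chain, and all parameters are set by hand rather than trained with Baum-Welch. The names `make_left_right`, `log_likelihood`, `classify`, `viseme_A` and `viseme_B` are invented for this example.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def make_left_right(means, variances, stay=0.6):
    """Build a left-to-right HMM (hypothetical, hand-set parameters).

    Each state either stays put with probability `stay` or moves to the
    next state; the last state is absorbing. The chain starts in state 0.
    """
    n = len(means)
    log_pi = [0.0] + [-math.inf] * (n - 1)
    log_a = [[-math.inf] * n for _ in range(n)]
    for i in range(n):
        if i == n - 1:
            log_a[i][i] = 0.0
        else:
            log_a[i][i] = math.log(stay)
            log_a[i][i + 1] = math.log(1 - stay)
    return log_pi, log_a, means, variances


def log_likelihood(obs, model):
    """Forward algorithm in the log domain: returns log P(obs | model)."""
    log_pi, log_a, means, variances = model
    n = len(log_pi)
    alpha = [log_pi[i] + log_gauss(obs[0], means[i], variances[i])
             for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_a[i][j] for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify(obs, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))


# Two toy models: a wide and a narrow open-then-close mouth trajectory.
models = {
    "viseme_A": make_left_right([0.2, 0.8, 0.2], [0.01, 0.01, 0.01]),
    "viseme_B": make_left_right([0.2, 0.4, 0.2], [0.01, 0.01, 0.01]),
}

wide = [0.2, 0.25, 0.8, 0.75, 0.2]
narrow = [0.2, 0.4, 0.4, 0.2]
print(classify(wide, models), classify(narrow, models))
```

In a trained system the means, variances and transition probabilities would come from Baum-Welch re-estimation on labelled lip-motion sequences, and the observations would typically be multi-dimensional feature vectors rather than a single scalar.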

Author information

Authors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore
Say Wei Foo
Department of Electrical and Computer Engineering, National University of Singapore, 119260, Singapore
Liang Dong

Editor information

Editors and Affiliations

Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
Yung-Chang Chen
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
Long-Wen Chang & Chiou-Ting Hsu

About this paper

Cite this paper

Foo, S.W., Dong, L. (2002). Recognition of Visual Speech Elements Using Hidden Markov Models. In: Chen, YC., Chang, LW., Hsu, CT. (eds) Advances in Multimedia Information Processing — PCM 2002. PCM 2002. Lecture Notes in Computer Science, vol 2532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36228-2_75

DOI: https://doi.org/10.1007/3-540-36228-2_75
Published: 16 December 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00262-8
Online ISBN: 978-3-540-36228-9
eBook Packages: Springer Book Archive
