Shape Feature Analysis for Visual Speech and Speaker Recognition

Jiaping Gui & Shilin Wang

Part of the book series: Communications in Computer and Information Science (CCIS, volume 226)

Included in the following conference series:

International Conference on Applied Informatics and Communication


Abstract

Visual information is often used as a complementary source to help understand what a speaker is saying, especially in noisy environments. This paper investigates different lip features for visual speech and speaker recognition and analyzes in depth their robustness to different speaking habits. Five candidate features extracted from the lip shape are tested and compared on a multi-speaker visual speech recognition task of isolated English digits (0 to 9). Our experimental results show that the rotational angle caused by head pose is highly correlated with the individual speaker but independent of the speech content. The best shape features for speech and speaker recognition are found to be those that provide “dynamic” information, such as rotation and lip motion.
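The abstract does not spell out the five candidate features, so the following is an illustration only, not the authors' method. It assumes four hypothetical lip landmarks per frame (left and right mouth corners, upper and lower lip midpoints) and derives static geometry (width, height), an in-plane rotation angle from the line joining the mouth corners, and first-order frame differences as a simple stand-in for "dynamic" information. The function names and landmark layout are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


def lip_geometry(left: Point, right: Point, top: Point, bottom: Point) -> Tuple[float, float, float]:
    """Hypothetical per-frame shape features: (width, height, rotation in degrees).

    The rotation angle is the tilt of the corner-to-corner line relative to
    the image x-axis, used here as a rough proxy for in-plane head roll.
    """
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    width = math.hypot(dx, dy)                                  # mouth-corner distance
    height = math.hypot(bottom[0] - top[0], bottom[1] - top[1])  # lip opening
    angle = math.degrees(math.atan2(dy, dx))                    # 0 when corners are level
    return width, height, angle


def delta_features(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """First-order differences between consecutive feature vectors.

    The first frame has no predecessor, so it is assigned a zero vector.
    """
    deltas: List[List[float]] = []
    for i, cur in enumerate(frames):
        if i == 0:
            deltas.append([0.0] * len(cur))
        else:
            prev = frames[i - 1]
            deltas.append([c - p for c, p in zip(cur, prev)])
    return deltas


if __name__ == "__main__":
    # Two synthetic frames: the mouth opens wider while the corners stay level.
    f0 = lip_geometry((0, 0), (40, 0), (20, -10), (20, 10))
    f1 = lip_geometry((0, 0), (40, 0), (20, -12), (20, 12))
    print(f0, f1)
    print(delta_features([list(f0), list(f1)]))
```

Because the rotation angle depends on head pose rather than on what is being said, a sketch like this would keep it as a separate feature so its speaker-dependent behaviour can be examined independently of the width and height trajectories.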


Designing Advanced Geometric Features for Automatic Russian Visual Speech Recognition

Lip-Geometry Feature-based Visual Digit Recognition

Visual Speech Recognition with Selected Boundary Descriptors


Author information

Authors and Affiliations

School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Jiaping Gui & Shilin Wang


Editor information

Editors and Affiliations

Suzhou University, No. 50 Donghuan Road, 215021, China
Jianwei Zhang


About this paper

Cite this paper

Gui, J., Wang, S. (2011). Shape Feature Analysis for Visual Speech and Speaker Recognition. In: Zhang, J. (eds) Applied Informatics and Communication. ICAIC 2011. Communications in Computer and Information Science, vol 226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23235-0_22


DOI: https://doi.org/10.1007/978-3-642-23235-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23234-3
Online ISBN: 978-3-642-23235-0
eBook Packages: Computer Science, Computer Science (R0)
