Abstract
Visual information is commonly used as a complementary source to improve the understanding of what a speaker is saying, especially in noisy environments. This paper investigates different lip features for visual speech and speaker recognition and analyzes in depth their robustness to different speaking habits. Five candidate features extracted from lip shape are tested and compared on a multispeaker visual speech recognition task of isolated English digits (0–9). Our experimental results demonstrate that the rotational angle caused by head pose is highly correlated with the individual speaker but independent of the speech content. The best shape features for speech and speaker recognition are considered to be those that provide “dynamic” information, such as rotation and lip motion.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gui, J., Wang, S. (2011). Shape Feature Analysis for Visual Speech and Speaker Recognition. In: Zhang, J. (eds) Applied Informatics and Communication. ICAIC 2011. Communications in Computer and Information Science, vol 226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23235-0_22
DOI: https://doi.org/10.1007/978-3-642-23235-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23234-3
Online ISBN: 978-3-642-23235-0
eBook Packages: Computer Science, Computer Science (R0)