Abstract
In the age of speech and voice recognition technologies, sign language recognition is an essential part of ensuring equal access for deaf people. To date, sign language recognition research has mostly ignored facial expressions that arise as part of natural sign language discourse, even though they carry important grammatical and prosodic information. One reason is that tracking the motion and dynamics of expressions in human faces from video is a hard task, especially given the many occlusions caused by the signers’ hands. This paper presents a 3D deformable model tracking system to address this problem, and applies it to sequences of native signers, taken from the National Center for Sign Language and Gesture Resources (NCSLGR), with a special emphasis on outlier rejection methods to handle occlusions. The experiments conducted in this paper validate the output of the face tracker against expert human annotations of the NCSLGR corpus, demonstrate the promise of the proposed face tracking framework for sign language data, and reveal that the tracking framework picks up properties that ideally complement human annotations for linguistic research.
Notes
Glosses are representations of the signs by their closest English equivalent in all capital letters.
The exact number is dependent on the dimension of the parameter space; higher dimensions reduce the percentage.
Carol Neidle, personal communication.
Acknowledgments
The research in this paper was supported by NSF CNS-0427267, research scientist funds by the Gallaudet Research Institute, NASA Cooperative Agreements 9-58 with the National Space Biomedical Research Institute, CNPq PQ-301278/2004-0, FAEPEX-Unicamp 1679/04, and FAPESP. Carol Neidle provided helpful advice and discussion on the NCSLGR annotations vis-a-vis the tracking results. Lana Cook, Ben Bahan, and Mike Schlang were the subjects in the video sequences discussed in this paper.
Cite this article
Vogler, C., Goldenstein, S. Facial movement analysis in ASL. Univ Access Inf Soc 6, 363–374 (2008). https://doi.org/10.1007/s10209-007-0096-6
DOI: https://doi.org/10.1007/s10209-007-0096-6