Abstract
In this paper, dynamic motions of external and internal articulators are integrated into a low-cost, data-driven three-dimensional talking head. External and internal articulations are defined and calibrated from video streams and videofluoroscopy onto a generic 3D talking head model. Three deformation modes, corresponding to the pronunciation characteristics of the muscular soft tissue of the lips and tongue, the up-and-down movement of the chin, and the relatively fixed articulators, are set up and integrated. Shape-blending functions between the segmented phonemes of natural speech input are synthesized into an utterance. Animations of confusable phonemes and minimal pairs were shown to English teachers and learners in a perception test. The results show that the proposed method realistically reflects actual phonetic pronunciation.
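The shape-blending step described above can be illustrated with a minimal sketch. The Python code below is not the authors' implementation: the phoneme labels, the per-phoneme key-shape representation (one target vertex array per phoneme), and the cosine blending profile are all illustrative assumptions about how blending across a segmented phoneme timeline might look.

```python
# Minimal sketch of phoneme-level shape blending (illustrative, not the
# paper's method): each phoneme maps to a key mesh, and an utterance is
# synthesized by blending consecutive key shapes over the segmentation.
import numpy as np

def blend_weight(t, t0, t1):
    """Smooth 0 -> 1 transition over [t0, t1] (cosine ease-in/out)."""
    s = np.clip((t - t0) / (t1 - t0), 0.0, 1.0)
    return 0.5 - 0.5 * np.cos(np.pi * s)

def synthesize_frame(t, segments, key_shapes):
    """Interpolate the mesh at time t from the phoneme segmentation.

    segments   -- list of (phoneme, start_time, end_time), contiguous
    key_shapes -- dict: phoneme -> (n_vertices, 3) target vertex array
    """
    for i, (ph, t0, t1) in enumerate(segments):
        if t0 <= t < t1:
            cur = key_shapes[ph]
            if i + 1 < len(segments):
                nxt = key_shapes[segments[i + 1][0]]
                w = blend_weight(t, t0, t1)  # progress toward next target
                return (1.0 - w) * cur + w * nxt
            return cur
    return key_shapes[segments[-1][0]]

# Example: a two-phoneme utterance on a toy 3-vertex "mesh".
key_shapes = {"AA": np.zeros((3, 3)), "IY": np.ones((3, 3))}
segments = [("AA", 0.0, 0.12), ("IY", 0.12, 0.25)]
mesh_at_t = synthesize_frame(0.06, segments, key_shapes)  # halfway blend
```

In practice the blending function would differ per deformation mode (soft-tissue lips and tongue versus the rigid chin rotation), but the per-frame interpolation structure stays the same.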
Cite this article
Chen, H., Wang, L., Liu, W. et al. Combined X-ray and facial videos for phoneme-level articulator dynamics. Vis Comput 26, 477–486 (2010). https://doi.org/10.1007/s00371-010-0434-1