Abstract
In this paper, dynamic motions of external and internal articulators are integrated into a low-cost, data-driven three-dimensional talking head. External and internal articulations are defined and calibrated from video streams and videofluoroscopy onto a generic 3D talking head model. Three deformation modes, corresponding to the pronunciation characteristics of the muscular soft tissue of the lips and tongue, the up-and-down movement of the chin, and the relatively fixed articulators, are set up and integrated. Shape-blending functions between the segmented phonemes of natural speech input are synthesized into an utterance. Animations of confusable phonemes and minimal pairs were shown to English teachers and learners in a perception test. The results show that the proposed method realistically reflects actual phonetic pronunciation.
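The shape-blending step described above can be illustrated with a minimal sketch. The Python code below is not the authors' implementation: the phoneme labels, the per-phoneme key-shape representation (one target vertex array per phoneme), and the cosine blending profile are all illustrative assumptions about how blending across a segmented phoneme timeline might look.

```python
# Minimal sketch of phoneme-level shape blending (illustrative, not the
# paper's method): each phoneme maps to a key mesh, and an utterance is
# synthesized by blending consecutive key shapes over the segmentation.
import numpy as np

def blend_weight(t, t0, t1):
    """Smooth 0 -> 1 transition over [t0, t1] (cosine ease-in/out)."""
    s = np.clip((t - t0) / (t1 - t0), 0.0, 1.0)
    return 0.5 - 0.5 * np.cos(np.pi * s)

def synthesize_frame(t, segments, key_shapes):
    """Interpolate the mesh at time t from the phoneme segmentation.

    segments   -- list of (phoneme, start_time, end_time), contiguous
    key_shapes -- dict: phoneme -> (n_vertices, 3) target vertex array
    """
    for i, (ph, t0, t1) in enumerate(segments):
        if t0 <= t < t1:
            cur = key_shapes[ph]
            if i + 1 < len(segments):
                nxt = key_shapes[segments[i + 1][0]]
                w = blend_weight(t, t0, t1)  # progress toward next target
                return (1.0 - w) * cur + w * nxt
            return cur
    return key_shapes[segments[-1][0]]

# Example: a two-phoneme utterance on a toy 3-vertex "mesh".
key_shapes = {"AA": np.zeros((3, 3)), "IY": np.ones((3, 3))}
segments = [("AA", 0.0, 0.12), ("IY", 0.12, 0.25)]
mesh_at_t = synthesize_frame(0.06, segments, key_shapes)  # halfway blend
```

In practice the blending function would differ per deformation mode (soft-tissue lips and tongue versus the rigid chin rotation), but the per-frame interpolation structure stays the same.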
Cite this article
Chen, H., Wang, L., Liu, W. et al. Combined X-ray and facial videos for phoneme-level articulator dynamics. Vis Comput 26, 477–486 (2010). https://doi.org/10.1007/s00371-010-0434-1