Vision Based Speech Animation Transferring with Underlying Anatomical Structure

Yuru Pei & Hongbin Zha

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3851)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

We present a novel method for transferring speech animation recorded in low-resolution videos onto realistic 3D facial models. Unsupervised learning is applied to a speech video corpus to find the underlying manifold of facial configurations. K-means clustering in the resulting low-dimensional space then identifies key speech-related facial shapes. Using a small set of laser-scanned 3D models that correspond to the cluster centroids, the facial animation in the 2D videos is transferred onto 3D shapes. In particular, a weak perspective projection model is used to recover the underlying mandible rotation from the videos, which in turn drives the movement of the 3D skull. The adaptation of a generic skull to the facial models is guided by a 2D image, the Tissue Map. With parsimonious data requirements, our system achieves animation transfer and realistic rendering that accounts for the underlying anatomical structure.
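
The clustering step in the abstract can be illustrated with a minimal sketch. It assumes each video frame has already been mapped to a point in a low-dimensional space; the abstract does not say which embedding method, how many clusters, or which initialisation the authors use, so the farthest-first initialisation and the names `sq_dists`, `kmeans` and `key_frames` are hypothetical choices made here for illustration only.

```python
import numpy as np

def sq_dists(a, b):
    """Squared Euclidean distances between rows of a (n, d) and b (k, d)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    # Farthest-first initialisation: an assumption made here so that the
    # sketch is deterministic, not something stated in the abstract.
    chosen = [0]
    for _ in range(1, k):
        nearest = sq_dists(points, points[chosen]).min(axis=1)
        chosen.append(int(nearest.argmax()))
    centroids = points[chosen].astype(float)

    for _ in range(n_iter):
        # Assign every point to its closest centroid.
        labels = sq_dists(points, centroids).argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated

    # Final assignment against the centroids that are returned.
    labels = sq_dists(points, centroids).argmin(axis=1)
    return centroids, labels

def key_frames(embedding, k):
    """Index of the frame closest to each centroid, plus per-frame labels."""
    centroids, labels = kmeans(embedding, k)
    nearest_frame = sq_dists(embedding, centroids).argmin(axis=0)
    return nearest_frame, labels
```

In a pipeline of the kind the abstract describes, the frames returned by `key_frames` would be the candidates to pair with the small set of scanned 3D models; how that pairing is actually done is not specified in the abstract.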

Author information

Authors and Affiliations

National Laboratory on Machine Perception, Peking University, Beijing, P.R. China
Yuru Pei & Hongbin Zha

Editor information

Editors and Affiliations

International Institute of Information Technology, Center for Visual Information Technology, Hyderabad, India
P. J. Narayanan
Department of Computer Science, Columbia University, 500 West 120th Street, 10027, New York, NY, USA
Shree K. Nayar
Microsoft Research Asia, Beijing, P.R. China
Heung-Yeung Shum

About this paper

Cite this paper

Pei, Y., Zha, H. (2006). Vision Based Speech Animation Transferring with Underlying Anatomical Structure. In: Narayanan, P.J., Nayar, S.K., Shum, HY. (eds) Computer Vision – ACCV 2006. ACCV 2006. Lecture Notes in Computer Science, vol 3851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11612032_60

DOI: https://doi.org/10.1007/11612032_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31219-2
Online ISBN: 978-3-540-32433-1
eBook Packages: Computer Science (R0)
