DOI: 10.1145/3099564.3099581
Production-level facial performance capture using deep convolutional neural networks

Published: 28 July 2017

Abstract

We present a real-time deep learning framework for video-based facial performance capture: the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins by accurately capturing a subject with a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5–10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films that involve realistic digital doubles of actors and potentially hours of animated dialogue per character. We compare our results with several state-of-the-art monocular real-time facial capture techniques and demonstrate compelling animation inference in challenging areas such as eyes and lips.
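The abstract describes training a convolutional network to map a single monocular frame directly to a dense set of 3D vertex positions, supervised by ground truth from a multi-view capture pipeline. As a rough illustration of that input-to-output mapping only (this is not the authors' architecture; the layer count, mesh resolution, and weights below are invented for demonstration), here is a minimal NumPy sketch of a strided-convolution feature extractor feeding a linear head that regresses per-vertex coordinates, scored with a per-vertex L2 loss:

```python
# Illustrative sketch of monocular-frame -> dense-vertex regression.
# NOT the paper's network: sizes and weights are toy assumptions.
import numpy as np

NUM_VERTICES = 500  # assumed; production face meshes are far denser
rng = np.random.default_rng(0)

def conv2d_relu(img, kernels, stride=2):
    """Valid-mode strided 2D convolution over a single-channel image,
    followed by ReLU. kernels has shape (channels, k, k)."""
    k = kernels.shape[-1]
    out_h = (img.shape[0] - k) // stride + 1
    out_w = (img.shape[1] - k) // stride + 1
    out = np.empty((kernels.shape[0], out_h, out_w))
    for c, ker in enumerate(kernels):
        for i in range(out_h):
            for j in range(out_w):
                patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
                out[c, i, j] = np.sum(patch * ker)
    return np.maximum(out, 0.0)

def forward(frame, kernels, W):
    """Map one grayscale face crop to (NUM_VERTICES, 3) positions."""
    feat = conv2d_relu(frame, kernels)   # (C, H', W') feature maps
    pooled = feat.mean(axis=(1, 2))      # global average pooling -> (C,)
    return (pooled @ W).reshape(NUM_VERTICES, 3)

def vertex_loss(pred, gt):
    """Mean squared error over all vertex coordinates, as used for
    supervision against the multi-view-captured ground truth."""
    return float(np.mean((pred - gt) ** 2))

# Toy weights, one fake input frame, and fake ground-truth vertices.
kernels = rng.standard_normal((8, 3, 3)) * 0.1
W = rng.standard_normal((8, NUM_VERTICES * 3)) * 0.01
frame = rng.standard_normal((24, 32))
gt = rng.standard_normal((NUM_VERTICES, 3))

pred = forward(frame, kernels, W)
loss = vertex_loss(pred, gt)
print(pred.shape)  # (500, 3)
```

In a real system the convolutional weights would be trained (e.g. by gradient descent on `vertex_loss`) on the 5–10 minutes of captured footage per actor, rather than drawn at random.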

Supplementary Material

ZIP File (a10-laine.zip)
Supplemental files.




Published In

SCA '17: Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation
July 2017, 212 pages
ISBN: 9781450350914
DOI: 10.1145/3099564

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. deep learning
  2. facial animation

Qualifiers

  • Research-article

Conference

SCA '17

Acceptance Rates

Overall Acceptance Rate: 183 of 487 submissions, 38%


Cited By

  • (2024)SPARK: Self-supervised Personalized Real-time Monocular Face CaptureSIGGRAPH Asia 2024 Conference Papers10.1145/3680528.3687704(1-12)Online publication date: 3-Dec-2024
  • (2024)Automatic Gaze Analysis: A Survey of Deep Learning Based ApproachesIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2023.332133746:1(61-84)Online publication date: Jan-2024
  • (2024)Character Animation Pipeline based on Latent Diffusion and Large Language Models2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR)10.1109/AIxVR59861.2024.00067(398-405)Online publication date: 17-Jan-2024
  • (2024)GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar2024 International Conference on 3D Vision (3DV)10.1109/3DV62453.2024.00058(882-892)Online publication date: 18-Mar-2024
  • (2024)3D facial animation driven by speech-video dual-modal signalsComplex & Intelligent Systems10.1007/s40747-024-01481-510:5(5951-5964)Online publication date: 23-May-2024
  • (2024)Inclusive Deaf Education Enabled by Artificial Intelligence: The Path to a SolutionInternational Journal of Artificial Intelligence in Education10.1007/s40593-024-00419-9Online publication date: 24-Jul-2024
  • (2023)Audiovisual Inputs for Learning Robust, Real-time Facial Animation with Lip SyncProceedings of the 16th ACM SIGGRAPH Conference on Motion, Interaction and Games10.1145/3623264.3624451(1-12)Online publication date: 15-Nov-2023
  • (2023)DreamFace: Progressive Generation of Animatable 3D Faces under Text GuidanceACM Transactions on Graphics10.1145/359209442:4(1-16)Online publication date: 26-Jul-2023
  • (2023)HACK: Learning a Parametric Head and Neck Model for High-fidelity AnimationACM Transactions on Graphics10.1145/359209342:4(1-20)Online publication date: 26-Jul-2023
  • (2023)Monocular Facial Performance Capture Via Deep Expression MatchingComputer Graphics Forum10.1111/cgf.1463941:8(243-254)Online publication date: 20-Mar-2023
