DOI: 10.1145/3099564.3099581
Production-level facial performance capture using deep convolutional neural networks

Published: 28 July 2017

Abstract

We present a real-time deep learning framework for video-based facial performance capture: the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins by accurately capturing a subject with a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5–10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films that involve realistic digital doubles of actors and potentially hours of animated dialogue per character. We compare our results with several state-of-the-art monocular real-time facial capture techniques and demonstrate compelling animation inference in challenging areas such as eyes and lips.
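The abstract describes training a convolutional network to map a single monocular frame directly to a dense set of 3D vertex positions, supervised by ground truth from a multi-view capture pipeline. As a rough illustration of that input-to-output mapping only (this is not the authors' architecture; the layer count, mesh resolution, and weights below are invented for demonstration), here is a minimal NumPy sketch of a strided-convolution feature extractor feeding a linear head that regresses per-vertex coordinates, scored with a per-vertex L2 loss:

```python
# Illustrative sketch of monocular-frame -> dense-vertex regression.
# NOT the paper's network: sizes and weights are toy assumptions.
import numpy as np

NUM_VERTICES = 500  # assumed; production face meshes are far denser
rng = np.random.default_rng(0)

def conv2d_relu(img, kernels, stride=2):
    """Valid-mode strided 2D convolution over a single-channel image,
    followed by ReLU. kernels has shape (channels, k, k)."""
    k = kernels.shape[-1]
    out_h = (img.shape[0] - k) // stride + 1
    out_w = (img.shape[1] - k) // stride + 1
    out = np.empty((kernels.shape[0], out_h, out_w))
    for c, ker in enumerate(kernels):
        for i in range(out_h):
            for j in range(out_w):
                patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
                out[c, i, j] = np.sum(patch * ker)
    return np.maximum(out, 0.0)

def forward(frame, kernels, W):
    """Map one grayscale face crop to (NUM_VERTICES, 3) positions."""
    feat = conv2d_relu(frame, kernels)   # (C, H', W') feature maps
    pooled = feat.mean(axis=(1, 2))      # global average pooling -> (C,)
    return (pooled @ W).reshape(NUM_VERTICES, 3)

def vertex_loss(pred, gt):
    """Mean squared error over all vertex coordinates, as used for
    supervision against the multi-view-captured ground truth."""
    return float(np.mean((pred - gt) ** 2))

# Toy weights, one fake input frame, and fake ground-truth vertices.
kernels = rng.standard_normal((8, 3, 3)) * 0.1
W = rng.standard_normal((8, NUM_VERTICES * 3)) * 0.01
frame = rng.standard_normal((24, 32))
gt = rng.standard_normal((NUM_VERTICES, 3))

pred = forward(frame, kernels, W)
loss = vertex_loss(pred, gt)
print(pred.shape)  # (500, 3)
```

In a real system the convolutional weights would be trained (e.g. by gradient descent on `vertex_loss`) on the 5–10 minutes of captured footage per actor, rather than drawn at random.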

Supplementary Material

ZIP File (a10-laine.zip)
Supplemental files.




Published In

SCA '17: Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation
July 2017, 212 pages
ISBN: 9781450350914
DOI: 10.1145/3099564

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. deep learning
  2. facial animation

Qualifiers

  • Research-article

Conference

SCA '17

Acceptance Rates

Overall Acceptance Rate: 183 of 487 submissions, 38%


Cited By

  • (2024)SPARK: Self-supervised Personalized Real-time Monocular Face CaptureSIGGRAPH Asia 2024 Conference Papers10.1145/3680528.3687704(1-12)Online publication date: 3-Dec-2024
  • (2024)Automatic Gaze Analysis: A Survey of Deep Learning Based ApproachesIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2023.332133746:1(61-84)Online publication date: Jan-2024
  • (2024)Character Animation Pipeline based on Latent Diffusion and Large Language Models2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR)10.1109/AIxVR59861.2024.00067(398-405)Online publication date: 17-Jan-2024
  • (2024)GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar2024 International Conference on 3D Vision (3DV)10.1109/3DV62453.2024.00058(882-892)Online publication date: 18-Mar-2024
  • (2024)3D facial animation driven by speech-video dual-modal signalsComplex & Intelligent Systems10.1007/s40747-024-01481-510:5(5951-5964)Online publication date: 23-May-2024
  • (2024)Inclusive Deaf Education Enabled by Artificial Intelligence: The Path to a SolutionInternational Journal of Artificial Intelligence in Education10.1007/s40593-024-00419-9Online publication date: 24-Jul-2024
  • (2023)Audiovisual Inputs for Learning Robust, Real-time Facial Animation with Lip SyncProceedings of the 16th ACM SIGGRAPH Conference on Motion, Interaction and Games10.1145/3623264.3624451(1-12)Online publication date: 15-Nov-2023
  • (2023)DreamFace: Progressive Generation of Animatable 3D Faces under Text GuidanceACM Transactions on Graphics10.1145/359209442:4(1-16)Online publication date: 26-Jul-2023
  • (2023)HACK: Learning a Parametric Head and Neck Model for High-fidelity AnimationACM Transactions on Graphics10.1145/359209342:4(1-20)Online publication date: 26-Jul-2023
  • (2023)Monocular Facial Performance Capture Via Deep Expression MatchingComputer Graphics Forum10.1111/cgf.1463941:8(243-254)Online publication date: 20-Mar-2023
