Production-level facial performance capture using deep convolutional neural networks

Published: 28 July 2017


We present a real-time deep learning framework for video-based facial performance capture: the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist-enhanced animations. With 5 to 10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films involving realistic digital doubles of actors and potentially hours of animated dialogue per character. We compare our results with several state-of-the-art monocular real-time facial capture techniques and demonstrate compelling animation inference in challenging areas such as eyes and lips.



SCA '17: Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation
July 2017


Author Tags

  deep learning
  facial animation


