
Stabilized real-time face tracking via a learned dynamic rigidity prior

Published: 04 December 2018

Abstract

Despite the popularity of real-time monocular face tracking systems in many successful applications, one overlooked problem with these systems is rigid instability. It occurs when the input facial motion can be explained by either a head pose change or a facial expression change, creating ambiguities that often lead to jittery and unstable rigid head poses under large expressions. Existing rigid stabilization methods either employ heavy, anatomically motivated approaches that are unsuitable for real-time applications, or rely on heuristic rules that can fail under certain expressions. We propose the first rigid stabilization method for real-time monocular face tracking that uses a dynamic rigidity prior learned from realistic datasets. The prior is defined on a region-based face model and provides dynamic, region-based adaptivity for rigid pose optimization during real-time performance. We introduce an effective offline training scheme that learns the dynamic rigidity prior by optimizing the convergence of the rigid pose optimization toward the ground-truth poses in the training data. Our real-time face tracking system is an optimization framework that alternates between rigid pose optimization and expression optimization. To ensure tracking accuracy, we combine both robust, drift-free facial landmarks and dense optical flow in the optimization objectives. We evaluate our system extensively against state-of-the-art monocular face tracking systems and achieve significant improvements in tracking accuracy on a high-quality face tracking benchmark. Our system improves facial-performance-based applications such as facial animation retargeting and virtual face makeup with accurate expressions and stable poses. We further validate the dynamic rigidity prior by comparing it against other variants in terms of tracking accuracy.
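The alternating scheme described above (a rigid pose step followed by an expression step, with per-region rigidity weights biasing the rigid fit toward stable regions) can be illustrated with a toy sketch. This is not the authors' implementation: the blendshape model, the fixed region weights, and all numbers below are invented for illustration, the paper's learned prior predicts the weights dynamically per frame, and its real objectives also include 2D landmark and dense optical-flow terms rather than the noiseless 3D correspondences used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy blendshape face: a neutral shape plus two expression basis shapes.
n_verts = 12
neutral = rng.normal(size=(n_verts, 3))
basis = 0.1 * rng.normal(size=(2, n_verts, 3))

# Per-vertex rigidity weights. In the paper these come from a learned,
# per-frame dynamic prior over face regions; here they are simply fixed,
# down-weighting a pretend "mobile" region (the first four vertices).
w = np.ones(n_verts)
w[:4] = 0.2
wr = np.repeat(np.sqrt(w), 3)          # per-residual-row weights (x, y, z per vertex)

def assemble(expr):
    """Blendshape evaluation: neutral + sum_k expr[k] * basis[k]."""
    return neutral + np.tensordot(expr, basis, axes=1)

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic noiseless observations from a known pose and expression.
theta_gt, t_gt, expr_gt = 0.3, np.array([0.5, -0.2, 0.1]), np.array([0.7, -0.4])
obs = assemble(expr_gt) @ rot_z(theta_gt).T + t_gt

def weighted_err(R, t, expr):
    r = assemble(expr) @ R.T + t - obs
    return float(np.sum(w[:, None] * r * r))

R, t, expr = np.eye(3), np.zeros(3), np.zeros(2)
errs = []
for _ in range(20):
    # Rigid step: weighted Procrustes (Kabsch) fit with the expression fixed.
    src = assemble(expr)
    wn = w / w.sum()
    mu_s, mu_o = wn @ src, wn @ obs
    H = (src - mu_s).T @ (wn[:, None] * (obs - mu_o))   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_s
    # Expression step: weighted least squares with the pose fixed.
    target = (obs - t) @ R                 # rotate observations back to model space
    A = basis.reshape(2, -1).T             # (3 * n_verts, 2) design matrix
    b = (target - neutral).ravel()
    expr, *_ = np.linalg.lstsq(A * wr[:, None], b * wr, rcond=None)
    errs.append(weighted_err(R, t, expr))

print(f"final weighted fitting error: {errs[-1]:.2e}")
```

Because each step solves its subproblem exactly given the other, the weighted error is monotonically non-increasing, and on this noiseless toy problem the loop recovers the ground-truth pose; the learned dynamic weights in the paper play the role of the fixed `w` here, telling the rigid step which regions to trust per frame.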

Supplementary Material

MP4 File (a233-cao.mp4)





    Published In

    ACM Transactions on Graphics  Volume 37, Issue 6
    December 2018
    1401 pages
    ISSN:0730-0301
    EISSN:1557-7368
    DOI:10.1145/3272127

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. real-time monocular face tracking
    2. rigid stabilization for facial animation

    Qualifiers

    • Research-article


    Cited By

    • (2024) Learning to Stabilize Faces. Computer Graphics Forum 43:2. DOI: 10.1111/cgf.15038. Online publication date: 24-Apr-2024.
    • (2024) Appearance-Preserved Portrait-to-Anime Translation via Proxy-Guided Domain Adaptation. IEEE Transactions on Visualization and Computer Graphics 30:7 (3104-3120). DOI: 10.1109/TVCG.2022.3228707. Online publication date: 1-Jul-2024.
    • (2024) Learning 3D Face Reconstruction From the Cycle-Consistency of Dynamic Faces. IEEE Transactions on Multimedia 26 (3663-3675). DOI: 10.1109/TMM.2023.3322895. Online publication date: 2024.
    • (2024) Weakly Supervised Exaggeration Transfer for Caricature Generation With Cross-Modal Knowledge Distillation. IEEE Computer Graphics and Applications 44:4 (98-112). DOI: 10.1109/MCG.2024.3390121. Online publication date: 17-Apr-2024.
    • (2024) 3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (1227-1237). DOI: 10.1109/CVPR52733.2024.00123. Online publication date: 16-Jun-2024.
    • (2024) Improved 3D human face reconstruction from 2D images using blended hard edges. Neural Computing and Applications 36:24 (14967-14987). DOI: 10.1007/s00521-024-09868-8. Online publication date: 1-Aug-2024.
    • (2023) MIPS-Fusion: Multi-Implicit-Submaps for Scalable and Robust Online Neural RGB-D Reconstruction. ACM Transactions on Graphics 42:6 (1-16). DOI: 10.1145/3618363. Online publication date: 5-Dec-2023.
    • (2023) Parsing-Conditioned Anime Translation: A New Dataset and Method. ACM Transactions on Graphics 42:3 (1-14). DOI: 10.1145/3585002. Online publication date: 10-Apr-2023.
    • (2023) Exemplar-Based 3D Portrait Stylization. IEEE Transactions on Visualization and Computer Graphics 29:2 (1371-1383). DOI: 10.1109/TVCG.2021.3114308. Online publication date: 1-Feb-2023.
    • (2023) Video-Based Stabilized 3D Face Alignment Using Temporal Multi-Discrimination. 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP) (1-6). DOI: 10.1109/MMSP59012.2023.10337645. Online publication date: 27-Sep-2023.
