research-article

High-Quality 3D Face Reconstruction with Affine Convolutional Networks

Authors:

Zhengxia Zou

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 2495 - 2503

https://doi.org/10.1145/3503161.3548421

Published: 10 October 2022

Abstract

Recent works based on convolutional encoder-decoder architectures and 3DMM parameterization have shown great potential for canonical-view reconstruction from a single input image. Conventional CNN architectures benefit from exploiting the spatial correspondence between input and output pixels. However, in 3D face reconstruction, the spatial misalignment between the input image (e.g., a face) and the canonical/UV output makes the feature encoding-decoding process quite challenging. To tackle this problem, we propose a new network architecture, Affine Convolutional Networks, which enables CNN-based approaches to handle spatially non-corresponding input and output images while maintaining high-fidelity output. In our method, the affine convolution layer learns an affine transformation matrix for each spatial location of the feature maps. In addition, we represent 3D human heads in UV space with multiple components: diffuse maps for texture representation, position maps for geometry representation, and light maps for recovering more complex real-world lighting conditions. All components can be trained without any manual annotations. Our method is parametric-free and can generate high-quality UV maps at a resolution of 512 x 512 pixels, whereas previous approaches normally generate maps of 256 x 256 pixels or smaller. Our code will be released once the paper is accepted.
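To make the idea of "one affine transformation matrix per spatial location" concrete, the following is a minimal NumPy sketch of a per-location affine warp of a feature map. It is an illustration under stated assumptions, not the authors' layer: the function names (`bilinear_sample`, `per_location_affine_warp`), the 2x3 matrix layout, the pixel-coordinate convention, and the border clamping are all hypothetical choices. In a real network the `theta` tensor would be predicted by a learned layer rather than supplied by hand.

```python
import numpy as np

def bilinear_sample(feat, xs, ys):
    """Bilinearly sample feat (H, W, C) at float pixel coordinates xs, ys.

    Coordinates outside the map are clamped to the border (an assumption
    made for this sketch).
    """
    H, W, _ = feat.shape
    xs = np.clip(xs, 0.0, W - 1.0)
    ys = np.clip(ys, 0.0, H - 1.0)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (xs - x0)[..., None]  # horizontal interpolation weight
    wy = (ys - y0)[..., None]  # vertical interpolation weight
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def per_location_affine_warp(feat, theta):
    """Warp feat (H, W, C) with one 2x3 affine matrix per output location.

    theta has shape (H, W, 2, 3). For each output pixel (x, y), the matrix
    maps the homogeneous coordinate (x, y, 1) to the input location that
    is sampled for that pixel.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H, dtype=float),
                         np.arange(W, dtype=float), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # (H, W, 3)
    src = np.einsum("hwij,hwj->hwi", theta, coords)          # (H, W, 2)
    return bilinear_sample(feat, src[..., 0], src[..., 1])

# Usage: identity matrices leave the map unchanged; a per-location
# translation of +1 in x reads each output pixel from its right neighbour.
H, W = 4, 5
feat = np.arange(H * W * 2, dtype=float).reshape(H, W, 2)
identity = np.tile(np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), (H, W, 1, 1))
shift = identity.copy()
shift[..., 0, 2] = 1.0
unchanged = per_location_affine_warp(feat, identity)
shifted = per_location_affine_warp(feat, shift)
```

Because every location carries its own matrix, the warp can differ across the map, which is what lets an output pixel draw on an input region that is not spatially aligned with it.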

Supplementary Material

MP4 File (MM22-fp3159.mp4)

Presentation video

10.86 MB

Index Terms

High-Quality 3D Face Reconstruction with Affine Convolutional Networks
1. Computing methodologies
  1. Computer graphics
    1. Shape modeling
      1. Mesh geometry models

Information & Contributors

Information

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 6
Total Downloads: 245
Downloads (last 12 months): 50
Downloads (last 6 weeks): 5

Reflects downloads up to 14 Nov 2024
