DOI: 10.1145/3503161.3548421
research-article

High-Quality 3D Face Reconstruction with Affine Convolutional Networks

Published: 10 October 2022

Abstract

Recent work based on convolutional encoder-decoder architectures and 3DMM parameterization has shown great potential for canonical-view reconstruction from a single input image. Conventional CNN architectures benefit from exploiting the spatial correspondence between input and output pixels. In 3D face reconstruction, however, the spatial misalignment between the input image (e.g., a face) and the canonical/UV output makes the feature encoding-decoding process challenging. To tackle this problem, we propose a new network architecture, the Affine Convolutional Network, which enables CNN-based approaches to handle spatially non-corresponding input and output images while maintaining high-fidelity output. In our method, an affine transformation matrix is learned by the affine convolution layer for each spatial location of the feature maps. In addition, we represent 3D human heads in UV space with multiple components: diffuse maps for texture, position maps for geometry, and light maps for recovering more complex real-world lighting conditions. All components can be trained without any manual annotations. Our method is parametric-free and can generate high-quality UV maps at a resolution of 512 x 512 pixels, whereas previous approaches normally generate maps of 256 x 256 pixels or smaller. Our code will be released once the paper is accepted.
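
The idea of one affine transformation per spatial location can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not specify the layer's details. The sketch assumes the 2x3 matrices are already given (in the paper they are learned by the affine convolution layer), works in pixel coordinates on a single-channel feature map, and uses bilinear sampling with zero padding. The names `bilinear_sample` and `per_location_affine_warp` are hypothetical.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly sample a 2D list `feat` at real-valued (x, y).

    Locations outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(x - xi)) * (1.0 - abs(y - yi))
                value += weight * feat[yi][xi]
    return value


def per_location_affine_warp(feat, thetas):
    """Resample `feat` using one 2x3 affine matrix per output location.

    thetas[y][x] = ((a, b, tx), (c, d, ty)) maps output pixel (x, y)
    to the source coordinate (a*x + b*y + tx, c*x + d*y + ty).
    Assumption: the matrices are supplied by the caller; a network
    would predict them instead.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            (a, b, tx), (c, d, ty) = thetas[y][x]
            src_x = a * x + b * y + tx
            src_y = c * x + d * y + ty
            out[y][x] = bilinear_sample(feat, src_x, src_y)
    return out


feat = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
shift_right = ((1.0, 0.0, 1.0), (0.0, 1.0, 0.0))  # read one pixel to the right

# Same transform everywhere: behaves like an ordinary global warp.
same = per_location_affine_warp(feat, [[identity] * 3 for _ in range(3)])

# Different transform per location: only the first row is shifted.
mixed = per_location_affine_warp(
    feat, [[shift_right] * 3, [identity] * 3, [identity] * 3])
```

Because each output location carries its own matrix, the input and output no longer need to be pixel-aligned, which is the property the abstract attributes to the affine convolution layer. A single global matrix, as in a spatial transformer, is the special case where every entry of `thetas` is identical.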

Supplementary Material

MP4 File (MM22-fp3159.mp4)
Presentation video


    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. affine convolution layer
    2. differentiable rendering
    3. high-quality 3d face reconstruction
    4. representing in uv space

    Qualifiers

    • Research-article

    Conference

    MM '22
    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2024) CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting. Proceedings of the 32nd ACM International Conference on Multimedia, 2936-2944. DOI: 10.1145/3664647.3681468. Online publication date: 28-Oct-2024
    • (2024) Hands Up! Towards Machine Learning Based Virtual Reality Arm Generation. 2024 IEEE Gaming, Entertainment, and Media Conference (GEM), 1-6. DOI: 10.1109/GEM61861.2024.10585620. Online publication date: 5-Jun-2024
    • (2024) 3D Face Reconstruction Based on a Single Image: A Review. IEEE Access 12, 59450-59473. DOI: 10.1109/ACCESS.2024.3381975. Online publication date: 2024
    • (2024) UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model. Computer Vision – ECCV 2024, 204-221. DOI: 10.1007/978-3-031-72940-9_12. Online publication date: 17-Nov-2024
    • (2023) FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 362-371. DOI: 10.1109/CVPR52729.2023.00043. Online publication date: Jun-2023
    • (2023) Deep learning applications in games: a survey from a data perspective. Applied Intelligence 53:24, 31129-31164. DOI: 10.1007/s10489-023-05094-2. Online publication date: 4-Dec-2023
