DOI: 10.1145/3664647.3681455
research-article
Open access

FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing

Published: 28 October 2024

Abstract

4D facial expression synthesis is a critical problem in computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose FC-4DFS, a frequency-controlled 4D facial expression synthesis method. Specifically, we introduce a frequency-controlled LSTM network that generates a 4D facial expression sequence of a specified length, frame by frame, from a given neutral landmark. We also propose a temporal coherence loss to enhance the perception of temporal sequence motion and improve the accuracy of relative displacements. Furthermore, we design a Multi-level Identity-Aware Displacement Network, based on a cross-attention mechanism, to reconstruct 4D facial expression sequences from landmark sequences. FC-4DFS achieves flexible, state-of-the-art generation of 4D facial expression sequences of different lengths on the CoMA and Florence4D datasets. The code will be available on GitHub.
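
As a rough illustration of what a loss on relative displacements might look like, the sketch below compares the frame-to-frame landmark displacements of a predicted sequence with those of a ground-truth sequence. It is not the paper's formulation, which the abstract does not give: the function names, the mean-squared-error form, and the nested-list layout (frames x landmarks x 3 coordinates) are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed layout: a sequence is a list of frames, a frame is a list of
# landmarks, and a landmark is a list of 3 coordinates (x, y, z).
Landmark = Sequence[float]
Frame = Sequence[Landmark]


def frame_displacements(seq: Sequence[Frame]) -> List[List[List[float]]]:
    """Per-landmark displacement between each pair of consecutive frames."""
    return [
        [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]


def temporal_coherence_loss(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Hypothetical loss: mean squared error between the inter-frame
    displacements of the predicted and the ground-truth sequences."""
    total, count = 0.0, 0
    for fp, ft in zip(frame_displacements(pred), frame_displacements(target)):
        for vp, vt in zip(fp, ft):
            for a, b in zip(vp, vt):
                total += (a - b) ** 2
                count += 1
    # A single-frame sequence has no displacements to compare.
    return total / count if count else 0.0
```

Because only displacements are compared, a prediction that is offset from the ground truth by a constant but moves identically scores zero under this sketch, so such a term would presumably be combined with a per-frame position loss.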

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 4D face
    2. expression generation
    3. LSTM
    4. neutral landmark
    5. positional encoding

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
