research-article

Open access

FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing

Authors: Chuanqing Zhuang, Jun Xiao

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 10882 - 10890

https://doi.org/10.1145/3664647.3681455

Published: 28 October 2024

Abstract

4D facial expression synthesis is a critical problem in computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose FC-4DFS, a frequency-controlled 4D facial expression synthesis method. Specifically, we introduce a frequency-controlled LSTM network that generates a 4D facial expression sequence frame by frame from a given set of neutral landmarks and a specified sequence length. We also propose a temporal coherence loss that strengthens the model's perception of motion across the sequence and improves the accuracy of relative displacements. Furthermore, we design a Multi-level Identity-Aware Displacement Network, based on a cross-attention mechanism, to reconstruct 4D facial expression sequences from landmark sequences. FC-4DFS achieves flexible, state-of-the-art generation of 4D facial expression sequences of different lengths on the CoMA and Florence4D datasets. The code will be available on GitHub.
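The abstract does not give the exact form of the temporal coherence loss. As a hedged illustration only, one common way to penalize errors in relative motion is to compare the predicted and ground-truth displacement of each landmark between consecutive frames. The sketch below assumes that form; the function names, the Euclidean (L2) distance, and the averaging over frames and landmarks are assumptions, not the paper's definition.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: a frame is a sequence of N 3D landmarks.
Point = Tuple[float, float, float]
Frame = Sequence[Point]


def frame_displacements(seq: Sequence[Frame]) -> List[List[Point]]:
    """Return x[t+1] - x[t] for every landmark, for t = 0 .. T-2."""
    disps = []
    for prev, curr in zip(seq[:-1], seq[1:]):
        disps.append(
            [tuple(c - p for p, c in zip(p_pt, c_pt)) for p_pt, c_pt in zip(prev, curr)]
        )
    return disps


def temporal_coherence_loss(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Mean Euclidean distance between predicted and ground-truth
    inter-frame landmark displacements (an assumed form, not the paper's)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("sequences must have equal length of at least 2 frames")
    total, count = 0.0, 0
    for d_pred, d_true in zip(frame_displacements(pred), frame_displacements(target)):
        for v_pred, v_true in zip(d_pred, d_true):
            total += math.dist(v_pred, v_true)
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy sequence: 3 frames, 2 landmarks each.
    target = [[(0, 0, 0), (1, 0, 0)], [(0, 1, 0), (1, 1, 0)], [(0, 2, 0), (1, 3, 0)]]
    shifted = [[(x + 5, y + 5, z + 5) for (x, y, z) in f] for f in target]
    static = [target[0]] * 3
    print(temporal_coherence_loss(shifted, target))  # 0.0: same motion, offset position
    print(temporal_coherence_loss(static, target))   # 1.25: no motion predicted
```

Because a constant offset leaves every displacement unchanged, a term of this kind measures motion rather than absolute position; in such a setup it would presumably be combined with a per-frame landmark reconstruction loss.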


Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia)
Mohan Kankanhalli (NUS, Singapore)
Balakrishnan Prabhakaran (UT Dallas, USA)
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia)
Liang Zheng (Australian National University, Australia)
Vivek K. Singh (Rutgers University, USA)
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands)
Lexing Xie (Australian National University, Australia)
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Beijing Natural Science Foundation
State Key Laboratory of Robotics and Systems (HIT)
Open Projects Program of State Key Laboratory of Multimodal Artificial Intelligence Systems
China Postdoctoral Science Foundation
National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 0
Total Downloads: 13
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 13

Reflects downloads up to 12 Nov 2024
