DOI: 10.1145/3581783.3612832
Research article · Open access

REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge

Published: 27 October 2023

Abstract

The Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the first benchmark test set for multi-modal information processing, to foster collaboration among the audio, visual, and audio-visual behaviour analysis and behaviour generation (a.k.a. generative AI) communities, and to compare the relative merits of approaches to automatic appropriate facial reaction generation under different spontaneous dyadic interaction conditions. This paper presents: (i) the novelties, contributions and guidelines of the REACT2023 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of the baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline_react2023.


Cited By

  • (2024) Vector Quantized Diffusion Models for Multiple Appropriate Reactions Generation. 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. DOI: 10.1109/FG59268.2024.10581978. Online publication date: 27-May-2024.
  • (2024) Finite Scalar Quantization as Facial Tokenizer for Dyadic Reaction Generation. 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. DOI: 10.1109/FG59268.2024.10581929. Online publication date: 27-May-2024.
  • (2024) Avatar Reaction to Multimodal Human Behavior. Image Analysis and Processing - ICIAP 2023 Workshops, pp. 494-505. DOI: 10.1007/978-3-031-51023-6_41. Online publication date: 24-Jan-2024.

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023, 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. challenge event
  2. facial reaction generation
  3. multi-modal behaviour


Funding Sources

  • ICREA
  • EPSRC
  • Spanish project

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%


Article Metrics

  • Downloads (last 12 months): 448
  • Downloads (last 6 weeks): 34
Reflects downloads up to 02 Oct 2024

