Abstract
Facial Expression Recognition (FER) is an effortless task for humans, and such non-verbal communication is intricately tied to how we relate to others beyond the explicit content of our speech. Facial expressions can convey how we are feeling, as well as our intentions, and are thus a key element of multimodal social interaction. Recent computational advances, such as promising results from Convolutional Neural Networks (CNN), have drawn increasing attention to the potential of FER to enhance human–agent interaction (HAI) and human–robot interaction (HRI), but questions remain as to how "transferable" the learned knowledge is from one task environment to another. In this paper, we explore how FER can be deployed in HAI cooperative game paradigms, where a human subject interacts with a virtual avatar in a goal-oriented environment in which they must cooperate to survive. The primary question was whether transfer learning (TL) would offer an advantage for FER over pre-trained models trained on a similar (but not identical) task environment. TL achieved significantly improved results (94.3% accuracy) without the need for an extensive task-specific corpus. We discuss how such approaches could be used to flexibly create more life-like robots and avatars, capable of fluid social interactions within cooperative multimodal environments.
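The core idea behind the transfer learning approach described above — keep pre-trained feature-extraction layers frozen and fine-tune only a small task-specific classification head on limited data — can be sketched in toy form. Everything below is illustrative: the random-projection "backbone" stands in for pre-trained convolutional layers, and the synthetic 30-sample corpus, dimensions, and learning rate are assumptions for the sketch, not the authors' actual CNN pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes, n_pixels, n_features = 3, 64, 16

# Frozen "pre-trained" backbone: a fixed projection standing in for
# the convolutional layers of a pre-trained FER network. It receives
# no gradient updates during fine-tuning.
W_frozen = rng.normal(size=(n_pixels, n_features)) / np.sqrt(n_pixels)

def extract(x):
    """Frozen feature extractor (ReLU activation, no weight updates)."""
    return np.maximum(x @ W_frozen, 0.0)

# Small task-specific corpus: 30 synthetic samples whose pixel means
# depend on the class label (a stand-in for a few labeled game frames).
y = rng.integers(0, n_classes, size=30)
X = rng.normal(loc=y[:, None], size=(30, n_pixels))

F = extract(X)                              # features from frozen layers
W_head = np.zeros((n_features, n_classes))  # trainable classification head

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

onehot = np.eye(n_classes)[y]
losses = []
for _ in range(200):                        # fine-tune only the head
    p = softmax(F @ W_head)
    losses.append(-np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1)))
    W_head -= 0.01 * F.T @ (p - onehot) / len(y)  # gradient step on head only

print(losses[0], "->", losses[-1])  # cross-entropy loss before vs. after fine-tuning
```

Because only the small head is trained, the few task-specific samples suffice — the same rationale that lets TL avoid collecting an extensive task-specific corpus.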
Data availability
The datasets generated and/or analyzed during the current study are not publicly available because the data comprise video and audio recordings of identifiable human subjects during gameplay. However, extracted de-identified data may be made available from the corresponding author upon reasonable request.
Funding
This work was supported through funding by a Grant from the National Research Foundation of Korea (NRF Grant# 2021R1G1A1003801).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Ethical approval
This study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Hanyang University (protocol #HYU2021-138) for studies involving humans. Informed consent was obtained from all subjects involved in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sánchez, P.C., Bennett, C.C. Facial expression recognition via transfer learning in cooperative game paradigms for enhanced social AI. J Multimodal User Interfaces 17, 187–201 (2023). https://doi.org/10.1007/s12193-023-00410-z