Abstract
Affective states are an important aspect of being human. For human-machine interaction to be natural, and for machines to understand people, it is therefore becoming necessary to recognize a person's emotional state. Non-verbal behavioral cues such as facial expressions and hand gestures provide a firm basis for inferring a person's affective state. In this paper, we propose a novel real-time framework that extracts the dynamic information of multiple modalities from video to recognize a person's affective state. In the first step, we detect the face and hands of the person in the video and create motion history images (MHIs) of both the face and the gesturing hands to encode the temporal dynamics of these two modalities. In the second step, features are extracted from both the face and hand MHIs using the deep residual network ResNet-101 and concatenated into a single feature vector. We use these integrated features to create subspaces that lie on a Grassmann manifold. We then compute the Geodesic Flow Kernel (GFK) on this manifold for domain adaptation and use it to adapt Grassmannian graph-embedding discriminant analysis (GGDA), so that a person's affective state is robustly recognized from the combined modalities. Accuracies of 93.4% on the FABO dataset (Gunes and Piccardi 2006) and 92.7% on our own dataset show that the integrated face and hand modalities outperform state-of-the-art methods for affective state recognition.
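The MHI encoding in the first step can be sketched in a few lines of Python with OpenCV. This is a minimal illustration, not the authors' code: the decay window, the motion threshold, and the input clip name are assumptions, and the paper applies the same update to the cropped face and hand regions rather than to the whole frame.

```python
import cv2
import numpy as np

MHI_DURATION = 15    # frames a motion trace persists (assumed value)
DIFF_THRESHOLD = 32  # grey-level change treated as motion (assumed value)

def update_mhi(mhi, prev_gray, gray, timestamp):
    """Motion history image update: pixels that moved in the current
    frame are stamped with the current time; stale traces are cleared."""
    moved = cv2.absdiff(gray, prev_gray) >= DIFF_THRESHOLD
    mhi[moved] = timestamp
    mhi[mhi < timestamp - MHI_DURATION] = 0
    return mhi

cap = cv2.VideoCapture("sequence.avi")  # hypothetical input clip
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
mhi = np.zeros(prev.shape, np.float32)
t = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    t += 1
    mhi = update_mhi(mhi, prev, gray, t)
    prev = gray

# Normalize recent motion to an 8-bit image; brighter pixels moved more
# recently. An image like this is what would be fed to ResNet-101.
vis = np.uint8(255 * np.clip((mhi - (t - MHI_DURATION)) / MHI_DURATION, 0, 1))
cv2.imwrite("mhi.png", vis)
```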
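Similarly, the closed-form GFK of Gong et al. (2012) between a source subspace Ps and a target subspace Pt (both with orthonormal columns) can be sketched as follows. This is a generic reconstruction of the published formula, not the authors' implementation, and the eps guards for near-coincident subspaces are a simplification.

```python
import numpy as np
from scipy.linalg import null_space, svd

def gfk(Ps, Pt, eps=1e-12):
    """Geodesic Flow Kernel (Gong et al., CVPR 2012): returns the D x D
    matrix G such that x @ G @ y integrates the inner products of x and y
    projected onto every subspace along the geodesic from Ps to Pt."""
    D, d = Ps.shape
    Rs = null_space(Ps.T)                  # orthogonal complement of Ps
    U1, gam, Vt = svd(Ps.T @ Pt)           # CS decomposition, top block
    V = Vt.T
    gam = np.clip(gam, 0.0, 1.0)
    theta = np.arccos(gam)                 # principal angles between Ps, Pt
    sig = np.sqrt(1.0 - gam ** 2)
    U2 = -(Rs.T @ Pt @ V) / np.maximum(sig, eps)  # bottom block, column-wise
    t2 = np.maximum(2.0 * theta, eps)
    l1 = 0.5 * (1.0 + np.sin(t2) / t2)
    l2 = 0.5 * (np.cos(t2) - 1.0) / t2
    l3 = 0.5 * (1.0 - np.sin(t2) / t2)
    PU, RU = Ps @ U1, Rs @ U2
    return (PU * l1) @ PU.T + (PU * l2) @ RU.T \
         + (RU * l2) @ PU.T + (RU * l3) @ RU.T

# Toy usage: two random 5-dimensional subspaces of R^50.
rng = np.random.default_rng(0)
Ps, _ = np.linalg.qr(rng.standard_normal((50, 5)))
Pt, _ = np.linalg.qr(rng.standard_normal((50, 5)))
G = gfk(Ps, Pt)
xs, xt = rng.standard_normal(50), rng.standard_normal(50)
print(xs @ G @ xt)                         # kernel value between xs and xt
```

In the paper, a kernel of this form is what adapts GGDA: the Euclidean inner products in the discriminant analysis are replaced by the GFK so that the method respects the geometry of the Grassmann manifold.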
References
Ambady N, Rosenthal R (1992) Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychol Bull 111:256
Ambady N, Bernieri F J, Richeson J A (2000) Toward a histology of social behavior: judgmental accuracy from thin slices of the behavioral stream. Adv Exp Social Psychol 32:201–271
Barros P, Jirak D, Weber C, Wermter S (2015) Multimodal emotional state recognition using sequence-dependent deep hierarchical features. Neural Netw 72:140–151
Barros P, Parisi G I, Weber C, Wermter S (2017) Emotion-modulated attention improves expression recognition: A deep learning model. Neurocomputing 253:104–114
Bastanfard A, Karam H, Takahashi H, Nakajima M (2002) Simulation of human facial aging and skin senility. In: ITE technical report, The Institute of Image Information and Television Engineers, vol 26(3), pp 1–7
Bastanfard A, Bastanfard O, Takahashi H, Nakajima M (2004) Toward anthropometrics simulation of face rejuvenation and skin cosmetic. Comput Animat Virtual Worlds 15(3):347–352
Bastanfard A, Takahashi H, Nakajima M (2004) Toward E-appearance of human face and hair by age, expression and rejuvenation. In: International conference on cyberworlds. IEEE, pp 306–311
Berthouze N, Valstar M, Williams A, Egede J, Olugbade T, Wang C, Meng H, Aung M, Lane N, Song S (2020) EmoPain challenge 2020: Multimodal pain evaluation from facial and bodily expressions. arXiv:2001.07739
Cai H, Qu Z, Li Z, Zhang Y, Hu X, Hu B (2020) Feature-level fusion approaches based on multimodal EEG data for depression recognition. Inf Fusion 59:127–138
Canziani A, Paszke A, Culurciello E (2016) An analysis of deep neural network models for practical applications. arXiv:1605.07678
Chen S, Tian Y, Liu Q, Metaxas D N (2013) Recognizing expressions from face and body gesture by temporal normalized motion and appearance features. Image Vis Comput 31(2):175–185
Chen L, Wu M, Wanjuan S U, Hirota K (2018) Multi-convolution neural networks-based deep learning model for emotion understanding. In: Proceedings of the 37th Chinese Control Conference (CCC), pp 9545–9549
Cohen I, Sebe N, Garg A, Chen L S, Huang T S (2003) Facial expression recognition from video sequences: Temporal and static modeling. Comput Vis Image Underst 91(1–2):160–187
Dellaert F, Polzin T, Waibel A (1996) Recognizing emotion in speech. In: Proceedings of 4th international conference on spoken language processing
Dibia V (2017) Real-time hand tracking using SSD on TensorFlow. https://github.com/victordibia/handtracking
Gadanho S C (2003) Learning behavior-selection by emotions and cognition in a multi-goal robot task. J Mach Learn Res 4:385–412
Gong B, Shi Y, Sha F, Grauman K (2012) Geodesic flow kernel for unsupervised domain adaptation. In: Proceedings of international conference on computer vision and pattern recognition (CVPR), pp 2066–2073
Gu Y, Mai X, Luo Y-J (2013) Do bodily expressions compete with facial expressions? Time course of integration of emotional signals from the face and the body. PLoS ONE 8(7):e66762
Gunes H, Piccardi M (2006) A bimodal face and body gesture database for automatic analysis of human nonverbal affective behavior. In: Proceedings of 18th international conference on pattern recognition, vol 1, pp 1148–1153
Gunes H, Piccardi M (2009) Automatic temporal segment detection and affect recognition from face and body display. IEEE Trans Syst Man Cybern 39(1):64–84
Harandi M T, Sanderson C, Shirazi S, Lovell B C (2011) Graph embedding discriminant analysis on Grassmannian manifold for improved image set matching. In: Proceedings of international conference on computer vision and pattern recognition (CVPR), pp 2705–2712
Kapoor A, Picard R W (2005) Multimodal affect recognition in learning environments. In: Proceedings of 13th annual ACM international conference on multimedia, pp 677–682
Kim J, André E (2008) Emotion recognition based on physiological changes in music listening. IEEE Trans Pattern Anal Mach Intell 30(12):2067–2083
Kim K H, Bang S W, Kim S R (2004) Emotion recognition system using short-term monitoring of physiological signals. Med Biol Eng Comput 42(3):419–427
Koelstra S, Muhl C, Soleymani M, Lee J S, Yazdani A, Ebrahimi T, Patras I (2012) Deap: A database for emotion analysis; using physiological signals. IEEE Trans Affect Comput 3(1):18–31
Kret M E, Roelofs K, Stekelenburg J J, de Gelder B (2013) Emotional signals from faces, bodies and scenes influence observers' face expressions, fixations and pupil size. Front Hum Neurosci 7:810
Lisetti C L, Nasoz F (2004) Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP J Appl Signal Process 11:1672–1687
Liu P, Yu H, Cang S (2019) Adaptive neural network tracking control for underactuated systems with matched and mismatched disturbances. Nonlinear Dyn 98(2):1447–1464
Mohammadi M R, Fatemizadeh E, Mahoor M H (2014) PCA-based dictionary building for accurate facial expression recognition via sparse representation. J Vis Commun Image Represent 25(5):1082–1092
Nguyen D, Nguyen K, Sridharan S, Dean D, Fookes C (2018) Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition. Comput Vis Image Underst 174:33–42
Pentland A (2008) Honest signals: How they shape our world. MIT Press
Picard R (1997) Affective computing. The MIT Press, Cambridge, Massachusetts
Psaltis A, Kaza K, Stefanidis K, Thermos S, Apostolakis K C, Dimitropoulos K, Daras P (2016) Multimodal affective state recognition in serious games applications. In: Proceedings of international conference on imaging systems and techniques (IST), pp 435–439
Richmond V P, McCroskey J C, Hickson M (2008) Nonverbal behavior in interpersonal relations. Allyn & Bacon
Roy D, Pentland A (1996) Automatic spoken affect classification and analysis. In: Proceedings of 2nd international conference on automatic face and gesture recognition, pp 363–367
Scherer K R, Ellgring H (2007) Multimodal expression of emotion: Affect programs or componential appraisal patterns? Emotion 7(1):158
Shan C, Gong S, McOwan P W (2007) Beyond facial expressions: learning human emotion from body gestures. In: Proceedings of the British Machine Vision Conference (BMVC), pp 1–10
Spexard T, Hanheide M, Sagerer G (2007) Human-oriented interaction with an anthropomorphic robot. IEEE Trans Robot 23:852–862
Sun L, Zhao C, Yan Z, Liu P, Duckett T, Stolkin R (2018) A novel weakly-supervised approach for RGB-D-based nuclear waste object detection. IEEE Sens J 19(9):3487–3500
Sun B, Cao S, He J, Yu L (2018) Affect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy. Neural Netw 105:36–51
Tang Z, Yu H, Lu C, Liu P, Jin X (2019) Single-trial classification of different movements on one arm based on ERD/ERS and corticomuscular coherence. IEEE Access 7:128185–128197
Tang Z C, Li C, Wu J F, Liu P C, Cheng S W (2019) Classification of EEG-based single-trial motor imagery tasks using a B-CSP method for BCI. Front Inf Technol Electron Eng 20(8):1087–1098
Tzirakis P, Trigeorgis G, Nicolaou M A, Schuller B W, Zafeiriou S (2017) End-to-end multimodal emotion recognition using Deep Neural Networks. IEEE J Sel Top Signal Process 11(8):1301–1309
Uddin M A, Joolee J B, Lee Y-K (2020) Depression level prediction using deep spatiotemporal features and multilayer Bi-LSTM. IEEE Trans Affect Comput
Vinciarelli A, Pantic M, Bourlard H (2009) Social signal processing: Survey of an emerging domain. Image Vis Comput 27:1743–1759
Viola P, Jones M J (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154
Yang Z, Narayanan S (2015) Modeling mutual influence of multimodal behavior in affective dyadic interactions. In: Proceedings of international conference on acoustics, speech and signal processing (ICASSP), pp 2234–2238
Yang Z, Gong B, Narayanan S (2017) Weighted geodesic flow kernel for interpersonal mutual influence modeling and emotion recognition in dyadic interactions. In: Proceedings of international conference on affective computing and intelligent interaction (ACII), pp 236–241
Cite this article
Verma, B., Choudhary, A. Affective state recognition from hand gestures and facial expressions using Grassmann manifolds. Multimed Tools Appl 80, 14019–14040 (2021). https://doi.org/10.1007/s11042-020-10341-6