
A Review and Meta-Analysis of Multimodal Affect Detection Systems

Published: 17 February 2015

Abstract

Affect detection is an important pattern recognition problem that has inspired researchers from several areas. The field is in need of a systematic review due to the recent influx of Multimodal (MM) affect detection systems that differ in several respects and sometimes yield incompatible results. This article provides such a survey via a quantitative review and meta-analysis of 90 peer-reviewed MM systems. The review indicated that the state of the art mainly consists of person-dependent models (62.2% of systems) that fuse audio and visual (55.6%) information to detect acted (52.2%) expressions of basic emotions and simple dimensions of arousal and valence (64.5%) with feature- (38.9%) and decision-level (35.6%) fusion techniques. However, there were also person-independent systems that considered additional modalities to detect nonbasic emotions and complex dimensions using model-level fusion techniques. The meta-analysis revealed that MM systems were consistently (85% of systems) more accurate than their best unimodal counterparts, with an average improvement of 9.83% (median of 6.60%). However, improvements were three times lower when systems were trained on natural (4.59%) versus acted data (12.7%). Importantly, MM accuracy could be accurately predicted (cross-validated R² of 0.803) from unimodal accuracies and two system-level factors. Theoretical and applied implications and recommendations are discussed.
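To make the abstract's two headline quantities concrete, the sketch below illustrates (a) the relative improvement of a multimodal (MM) detector over its best unimodal counterpart and (b) a cross-validated R² for a regression that predicts MM accuracy from unimodal accuracies plus system-level factors. This is a minimal illustration with invented numbers and variable names, not the authors' analysis pipeline; the exact improvement formula and predictors used in the meta-analysis are described in the full text.

# Minimal sketch (hypothetical data) of the two quantities reported in the abstract:
# (1) percent improvement of an MM detector over its best unimodal detector, and
# (2) cross-validated R^2 of a regression predicting MM accuracy from unimodal
#     accuracies and system-level factors. All values below are invented.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# (1) Relative improvement for one hypothetical system.
mm_accuracy = 0.82
unimodal_accuracies = [0.74, 0.77]          # e.g., audio-only and video-only accuracies
best_unimodal = max(unimodal_accuracies)
improvement_pct = 100.0 * (mm_accuracy - best_unimodal) / best_unimodal
print(f"MM improvement over best unimodal: {improvement_pct:.2f}%")

# (2) Cross-validated R^2 across a set of hypothetical systems.
rng = np.random.default_rng(0)
n_systems = 90
best_uni = rng.uniform(0.5, 0.9, n_systems)              # best unimodal accuracy per system
second_uni = best_uni - rng.uniform(0.0, 0.2, n_systems)  # second unimodal accuracy
natural_data = rng.integers(0, 2, n_systems)               # system-level factor: natural vs. acted data
model_level_fusion = rng.integers(0, 2, n_systems)         # system-level factor: model-level fusion used
X = np.column_stack([best_uni, second_uni, natural_data, model_level_fusion])
y = best_uni + 0.05 - 0.03 * natural_data + rng.normal(0.0, 0.02, n_systems)  # toy MM accuracies
r2_scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")
print(f"Cross-validated R^2: {r2_scores.mean():.3f}")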




    Published In

    ACM Computing Surveys, Volume 47, Issue 3
    April 2015
    602 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/2737799
    Editor: Sartaj Sahni
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 February 2015
    Accepted: 01 September 2014
    Revised: 01 April 2014
    Received: 01 June 2013
    Published in CSUR Volume 47, Issue 3


    Author Tags

    1. Affective computing
    2. evaluation
    3. human-centered computing
    4. methodology
    5. survey

    Qualifiers

    • Survey
    • Research
    • Refereed

    Funding Sources

    • Bill & Melinda Gates Foundation
    • NSF Graduate Research Fellowship under 1122374
    • National Science Foundation (NSF)


