
A Review and Meta-Analysis of Multimodal Affect Detection Systems

Published: 17 February 2015

Abstract

Affect detection is an important pattern recognition problem that has inspired researchers from several areas. The field is in need of a systematic review due to the recent influx of Multimodal (MM) affect detection systems that differ in several respects and sometimes yield incompatible results. This article provides such a survey via a quantitative review and meta-analysis of 90 peer-reviewed MM systems. The review indicated that the state of the art mainly consists of person-dependent models (62.2% of systems) that fuse audio and visual (55.6%) information to detect acted (52.2%) expressions of basic emotions and simple dimensions of arousal and valence (64.5%) with feature- (38.9%) and decision-level (35.6%) fusion techniques. However, there were also person-independent systems that considered additional modalities to detect nonbasic emotions and complex dimensions using model-level fusion techniques. The meta-analysis revealed that MM systems were consistently (85% of systems) more accurate than their best unimodal counterparts, with an average improvement of 9.83% (median of 6.60%). However, improvements were three times lower when systems were trained on natural (4.59%) versus acted data (12.7%). Importantly, MM accuracy could be accurately predicted (cross-validated R² of 0.803) from unimodal accuracies and two system-level factors. Theoretical and applied implications and recommendations are discussed.
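To make the abstract's two headline quantities concrete, the sketch below illustrates (a) the relative improvement of a multimodal (MM) detector over its best unimodal counterpart and (b) a cross-validated R² for a regression that predicts MM accuracy from unimodal accuracies plus system-level factors. This is a minimal illustration with invented numbers and variable names, not the authors' analysis pipeline; the exact improvement formula and predictors used in the meta-analysis are described in the full text.

# Minimal sketch (hypothetical data) of the two quantities reported in the abstract:
# (1) percent improvement of an MM detector over its best unimodal detector, and
# (2) cross-validated R^2 of a regression predicting MM accuracy from unimodal
#     accuracies and system-level factors. All values below are invented.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# (1) Relative improvement for one hypothetical system.
mm_accuracy = 0.82
unimodal_accuracies = [0.74, 0.77]          # e.g., audio-only and video-only accuracies
best_unimodal = max(unimodal_accuracies)
improvement_pct = 100.0 * (mm_accuracy - best_unimodal) / best_unimodal
print(f"MM improvement over best unimodal: {improvement_pct:.2f}%")

# (2) Cross-validated R^2 across a set of hypothetical systems.
rng = np.random.default_rng(0)
n_systems = 90
best_uni = rng.uniform(0.5, 0.9, n_systems)              # best unimodal accuracy per system
second_uni = best_uni - rng.uniform(0.0, 0.2, n_systems)  # second unimodal accuracy
natural_data = rng.integers(0, 2, n_systems)               # system-level factor: natural vs. acted data
model_level_fusion = rng.integers(0, 2, n_systems)         # system-level factor: model-level fusion used
X = np.column_stack([best_uni, second_uni, natural_data, model_level_fusion])
y = best_uni + 0.05 - 0.03 * natural_data + rng.normal(0.0, 0.02, n_systems)  # toy MM accuracies
r2_scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")
print(f"Cross-validated R^2: {r2_scores.mean():.3f}")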




    Published In

    ACM Computing Surveys, Volume 47, Issue 3
    April 2015
    602 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/2737799
    Editor: Sartaj Sahni
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 February 2015
    Accepted: 01 September 2014
    Revised: 01 April 2014
    Received: 01 June 2013
    Published in CSUR Volume 47, Issue 3


    Author Tags

    1. Affective computing
    2. evaluation
    3. human-centered computing
    4. methodology
    5. survey

    Qualifiers

    • Survey
    • Research
    • Refereed

    Funding Sources

    • Bill & Melinda Gates Foundation
    • NSF Graduate Research Fellowship under 1122374
    • National Science Foundation (NSF)


