Quantification of Cinematography Semiotics for Video-based Facial Emotion Recognition in the EmotiW 2015 Grand Challenge

Research article
Published: 09 November 2015
DOI: 10.1145/2818346.2830592

Abstract

The Emotion Recognition in the Wild challenge poses significant problems for state-of-the-art auditory and visual affect quantification systems. To overcome these challenges, we investigate supplementary meta features based on film semiotics. Movie scenes are often presented and arranged in such a way as to amplify the emotion interpreted by the viewing audience. This technique, referred to as mise en scene in the film industry, involves strict and intentional control of the color palette, the color of light sources, and the arrangement of actors and objects in the scene. To this end, two algorithms for extracting mise en scene information are proposed: rule-of-thirds-based motion history histograms, which detect motion along the rule-of-thirds guidelines, and rule-of-thirds color layout descriptors, which compactly describe a scene at the rule-of-thirds intersections. A comprehensive system is proposed that measures expression, emotion, vocalics, syntax, semantics, and film-based meta information. The proposed mise en scene features achieve a higher classification rate and ROC area than LBP-TOP features on the validation set of the EmotiW 2015 challenge. The complete system improves classification performance over the baseline algorithm by 3.17% on the testing set.
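
To make the first of the two proposed descriptors concrete, the following is a minimal sketch of the general idea, not the authors' implementation: it maintains a Davis-style motion history image (MHI) and histograms its values in narrow bands around the four rule-of-thirds guide lines. The band width, decay constant tau, difference threshold, and bin count here are illustrative assumptions.

import numpy as np

def thirds_lines(h, w):
    # Rule-of-thirds guide lines: two horizontal rows, two vertical columns.
    return [h // 3, 2 * h // 3], [w // 3, 2 * w // 3]

def update_mhi(mhi, prev_gray, gray, tau=15, diff_thresh=25):
    # Davis-style MHI update: moving pixels are set to tau, others decay by 1.
    moved = np.abs(gray.astype(np.int16) - prev_gray.astype(np.int16)) > diff_thresh
    return np.where(moved, tau, np.maximum(mhi - 1, 0))

def thirds_motion_histogram(mhi, band=8, bins=16, tau=15):
    # Histogram MHI values inside a narrow strip around each rule-of-thirds
    # line, then concatenate the four normalized histograms into one vector.
    h, w = mhi.shape
    rows, cols = thirds_lines(h, w)
    feats = []
    for r in rows:
        strip = mhi[max(0, r - band):r + band, :]
        hist, _ = np.histogram(strip, bins=bins, range=(0, tau))
        feats.append(hist / max(strip.size, 1))
    for c in cols:
        strip = mhi[:, max(0, c - band):c + band]
        hist, _ = np.histogram(strip, bins=bins, range=(0, tau))
        feats.append(hist / max(strip.size, 1))
    return np.concatenate(feats)

# Usage on a synthetic clip of (T, H, W) grayscale frames.
frames = (np.random.rand(10, 120, 160) * 255).astype(np.uint8)
mhi = np.zeros(frames.shape[1:], dtype=np.int16)
for t in range(1, len(frames)):
    mhi = update_mhi(mhi, frames[t - 1], frames[t])
print(thirds_motion_histogram(mhi).shape)  # (64,): 4 lines x 16 bins

A per-clip vector like this would then be fused with the other modalities (audio, expression, syntax and semantics) before classification; the exact pooling and fusion used by the paper are not reproduced here.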

Published In

ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
November 2015, 678 pages
ISBN: 9781450339124
DOI: 10.1145/2818346

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 November 2015

Author Tags

1. emotiw2015 challenge
2. mise en scene
3. syntax and semantics

Qualifiers

• Research-article

Conference

ICMI '15: International Conference on Multimodal Interaction
November 9-13, 2015
Seattle, Washington, USA

Acceptance Rates

ICMI '15 paper acceptance rate: 52 of 127 submissions (41%)
Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

• (2017) Emotion recognition with multimodal features and temporal models. Proceedings of the 19th ACM International Conference on Multimodal Interaction, pp. 598-602. DOI: 10.1145/3136755.3143016. Online publication date: 3-Nov-2017.
• (2017) Convolutional neural networks and feature fusion for bimodal emotion recognition on the EmotiW 2016 challenge. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1-5. DOI: 10.1109/CISP-BMEI.2017.8301997. Online publication date: Oct-2017.
• (2017) Fast parallel Hough transform linear features extracting method based on graphics processing units. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1-5. DOI: 10.1109/CISP-BMEI.2017.8301969. Online publication date: Oct-2017.
• (2017) Video Summarization for Expression Analysis of Motor Vehicle Operators. Universal Access in Human-Computer Interaction. Design and Development Approaches and Methods, pp. 313-323. DOI: 10.1007/978-3-319-58706-6_25. Online publication date: 16-May-2017.
• (2016) Video emotion recognition in the wild based on fusion of multimodal features. Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 494-500. DOI: 10.1145/2993148.2997629. Online publication date: 31-Oct-2016.