Quantification of Cinematography Semiotics for Video-based Facial Emotion Recognition in the EmotiW 2015 Grand Challenge

Research article
Published: 09 November 2015
DOI: 10.1145/2818346.2830592

Abstract

The Emotion Recognition in the Wild challenge poses significant problems for state-of-the-art auditory and visual affect quantification systems. To overcome these challenges, we investigate supplementary meta features based on film semiotics. Movie scenes are often presented and arranged in such a way as to amplify the emotion interpreted by the viewing audience. This technique, referred to as mise en scene in the film industry, involves strict and intentional control of the color palette, the color of light sources, and the arrangement of actors and objects in the scene. To this end, two algorithms for extracting mise en scene information are proposed: rule-of-thirds-based motion history histograms, which detect motion along the rule-of-thirds guidelines, and rule-of-thirds color layout descriptors, which compactly describe a scene at the rule-of-thirds intersections. A comprehensive system is proposed that measures expression, emotion, vocalics, syntax, semantics, and film-based meta information. The proposed mise en scene features achieve a higher classification rate and ROC area than LBP-TOP features on the validation set of the EmotiW 2015 challenge. The complete system improves classification performance over the baseline algorithm by 3.17% on the testing set.
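
To make the first of the two proposed descriptors concrete, the following is a minimal sketch of the general idea, not the authors' implementation: it maintains a Davis-style motion history image (MHI) and histograms its values in narrow bands around the four rule-of-thirds guide lines. The band width, decay constant tau, difference threshold, and bin count here are illustrative assumptions.

import numpy as np

def thirds_lines(h, w):
    # Rule-of-thirds guide lines: two horizontal rows, two vertical columns.
    return [h // 3, 2 * h // 3], [w // 3, 2 * w // 3]

def update_mhi(mhi, prev_gray, gray, tau=15, diff_thresh=25):
    # Davis-style MHI update: moving pixels are set to tau, others decay by 1.
    moved = np.abs(gray.astype(np.int16) - prev_gray.astype(np.int16)) > diff_thresh
    return np.where(moved, tau, np.maximum(mhi - 1, 0))

def thirds_motion_histogram(mhi, band=8, bins=16, tau=15):
    # Histogram MHI values inside a narrow strip around each rule-of-thirds
    # line, then concatenate the four normalized histograms into one vector.
    h, w = mhi.shape
    rows, cols = thirds_lines(h, w)
    feats = []
    for r in rows:
        strip = mhi[max(0, r - band):r + band, :]
        hist, _ = np.histogram(strip, bins=bins, range=(0, tau))
        feats.append(hist / max(strip.size, 1))
    for c in cols:
        strip = mhi[:, max(0, c - band):c + band]
        hist, _ = np.histogram(strip, bins=bins, range=(0, tau))
        feats.append(hist / max(strip.size, 1))
    return np.concatenate(feats)

# Usage on a synthetic clip of (T, H, W) grayscale frames.
frames = (np.random.rand(10, 120, 160) * 255).astype(np.uint8)
mhi = np.zeros(frames.shape[1:], dtype=np.int16)
for t in range(1, len(frames)):
    mhi = update_mhi(mhi, frames[t - 1], frames[t])
print(thirds_motion_histogram(mhi).shape)  # (64,): 4 lines x 16 bins

A per-clip vector like this would then be fused with the other modalities (audio, expression, syntax and semantics) before classification; the exact pooling and fusion used by the paper are not reproduced here.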

Published In

ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
November 2015, 678 pages
ISBN: 9781450339124
DOI: 10.1145/2818346

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 November 2015

Author Tags

1. emotiw2015 challenge
2. mise en scene
3. syntax and semantics

Qualifiers

• Research-article

Conference

ICMI '15: International Conference on Multimodal Interaction
November 9-13, 2015
Seattle, Washington, USA

Acceptance Rates

ICMI '15 paper acceptance rate: 52 of 127 submissions (41%)
Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

• (2017) Emotion recognition with multimodal features and temporal models. Proceedings of the 19th ACM International Conference on Multimodal Interaction, pp. 598-602. DOI: 10.1145/3136755.3143016. Online publication date: 3-Nov-2017.
• (2017) Convolutional neural networks and feature fusion for bimodal emotion recognition on the EmotiW 2016 challenge. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1-5. DOI: 10.1109/CISP-BMEI.2017.8301997. Online publication date: Oct-2017.
• (2017) Fast parallel Hough transform linear features extracting method based on graphics processing units. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1-5. DOI: 10.1109/CISP-BMEI.2017.8301969. Online publication date: Oct-2017.
• (2017) Video Summarization for Expression Analysis of Motor Vehicle Operators. Universal Access in Human-Computer Interaction. Design and Development Approaches and Methods, pp. 313-323. DOI: 10.1007/978-3-319-58706-6_25. Online publication date: 16-May-2017.
• (2016) Video emotion recognition in the wild based on fusion of multimodal features. Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 494-500. DOI: 10.1145/2993148.2997629. Online publication date: 31-Oct-2016.