DOI: 10.1145/3242969.3264981
short-paper

Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction

Published: 02 October 2018

Abstract

This paper describes the winning approach for engagement intensity prediction in the EmotiW Challenge 2018. The task is to predict the engagement level of a subject watching an educational video under diverse conditions and in different environments. Our approach formulates the prediction task as a multi-instance regression problem. We divide an input video sequence into segments and compute temporal and spatial features for each segment, from which the intensity is regressed. Subject engagement, which is intuitively related to changes in the body and face over time, can be characterized by a long short-term memory (LSTM) network. Hence, we build a multi-modal regression model based on a multi-instance mechanism together with LSTM. To make full use of the training and validation data, we train different models on different data splits and finally ensemble them. Experimental results show that our method achieves a mean squared error (MSE) of 0.0717 on the validation set, which improves on the baseline result by 28%. Our method ultimately won the challenge with an MSE of 0.0626 on the test set.
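
The pipeline outlined in the abstract (segment the video, compute a feature per segment, regress one intensity per video, ensemble several models, score with MSE) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. All function names are hypothetical. Equal-length segmentation, mean pooling of frame features, averaging of per-segment predictions, and averaging across ensemble members are assumptions made for the sketch. The LSTM is replaced by an arbitrary callable that scores one segment.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Frame = Sequence[float]   # one feature vector per frame (assumed layout)
Segment = List[Frame]     # a contiguous run of frames
InstanceModel = Callable[[List[float]], float]  # scores one segment feature


def split_into_segments(frames: Sequence[Frame], num_segments: int) -> List[Segment]:
    """Split a frame sequence into contiguous, near-equal, non-empty segments."""
    if not 1 <= num_segments <= len(frames):
        raise ValueError("num_segments must be between 1 and len(frames)")
    bounds = [i * len(frames) // num_segments for i in range(num_segments + 1)]
    return [list(frames[a:b]) for a, b in zip(bounds, bounds[1:])]


def segment_feature(segment: Segment) -> List[float]:
    """Mean-pool frame features over time (stand-in for real descriptors)."""
    return [fmean(dim) for dim in zip(*segment)]


def predict_bag(frames: Sequence[Frame], num_segments: int,
                model: InstanceModel) -> float:
    """Multi-instance regression: score each segment, then average the scores."""
    features = [segment_feature(s) for s in split_into_segments(frames, num_segments)]
    return fmean([model(f) for f in features])


def ensemble_predict(frames: Sequence[Frame], num_segments: int,
                     models: Sequence[InstanceModel]) -> float:
    """Average the video-level predictions of several models."""
    return fmean([predict_bag(frames, num_segments, m) for m in models])


def mean_squared_error(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """MSE between ground-truth and predicted engagement intensities."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return fmean([(t - p) ** 2 for t, p in zip(targets, predictions)])


if __name__ == "__main__":
    # Toy video: 8 frames with 2-dimensional features (made-up numbers).
    video = [[0.0, 1.0]] * 4 + [[1.0, 1.0]] * 4
    toy_models = [lambda f: sum(f) / len(f), lambda f: f[0]]
    print(ensemble_predict(video, num_segments=2, models=toy_models))
```

In a real system each toy model would be a trained recurrent network operating on the segment features, and the ensemble members would come from different train/validation splits, as the abstract describes.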

    Published In

    ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
    October 2018
    687 pages
    ISBN:9781450356923
    DOI:10.1145/3242969
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Sponsors

    • SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. engagement intensity prediction
    2. multiple instance learning

    Qualifiers

    • Short-paper

    Funding Sources

    • Shenzhen Basic Research Program
    • International Partnership Program of Chinese Academy of Sciences
    • National Natural Science Foundation of China

    Conference

    ICMI '18
    Sponsor:
    • SIGCHI

    Acceptance Rates

    ICMI '18 Paper Acceptance Rate 63 of 149 submissions, 42%;
    Overall Acceptance Rate 453 of 1,080 submissions, 42%
