short-paper

Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction

Authors:

Xiaojiang Peng,

Yu Qiao

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction

Pages 594 - 598

https://doi.org/10.1145/3242969.3264981

Published: 02 October 2018

Abstract

This paper describes the winning approach for engagement intensity prediction in the EmotiW Challenge 2018. The task is to predict the engagement level of a subject watching an educational video under diverse conditions and in different environments. Our approach formulates the prediction task as a multi-instance regression problem: we divide an input video sequence into segments and compute temporal and spatial features for each segment, from which the intensity is regressed. Subject engagement, which is intuitively related to changes in body and face over time, can be modeled by a long short-term memory (LSTM) network. Hence, we build a multi-modal regression model based on a multi-instance mechanism and LSTM. To make full use of the training and validation data, we train different models on different data splits and ensemble them. Experimental results show that our method achieves a mean squared error (MSE) of 0.0717 on the validation set, a 28% improvement over the baseline result. Our method won the challenge with an MSE of 0.0626 on the test set.
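The multi-instance formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the segment count, or how per-segment outputs are combined. The sketch assumes mean pooling of frame features within each segment, averaging of per-segment predictions, and a hypothetical `regressor` callable standing in for the LSTM-based model. All function names are illustrative.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one feature vector per frame (hypothetical layout)


def split_into_segments(frames: Sequence[Frame], num_segments: int) -> List[List[Frame]]:
    """Split a frame sequence into contiguous, nearly equal segments (the instances)."""
    if not 1 <= num_segments <= len(frames):
        raise ValueError("num_segments must be between 1 and the number of frames")
    # Integer boundaries guarantee every segment holds at least one frame.
    bounds = [i * len(frames) // num_segments for i in range(num_segments + 1)]
    return [list(frames[bounds[i]:bounds[i + 1]]) for i in range(num_segments)]


def segment_feature(segment: Sequence[Frame]) -> List[float]:
    """Assumed pooling: average each feature dimension over the segment's frames."""
    return [fmean(dim) for dim in zip(*segment)]


def predict_video(frames: Sequence[Frame],
                  regressor: Callable[[List[float]], float],
                  num_segments: int) -> float:
    """Score each segment with the stand-in regressor, then average the scores."""
    instances = [segment_feature(s) for s in split_into_segments(frames, num_segments)]
    return fmean([regressor(x) for x in instances])


def ensemble(predictions_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-video predictions of several models (simple ensemble)."""
    return [fmean(column) for column in zip(*predictions_per_model)]


def mean_squared_error(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """MSE, the metric reported in the abstract."""
    return fmean([(p - t) ** 2 for p, t in zip(predictions, targets)])


if __name__ == "__main__":
    # Toy data: four frames with two features each, and a toy linear regressor.
    toy_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    toy_regressor = lambda x: 0.1 * x[0]
    print(predict_video(toy_frames, toy_regressor, num_segments=2))
```

Averaging is only one possible way to combine instance-level outputs. A recurrent model such as the LSTM named in the abstract would instead consume the segment features in temporal order.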



Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision tasks
        Activity recognition and understanding



Information & Contributors

Information

Published In


ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction

October 2018

687 pages

ISBN: 9781450356923

DOI: 10.1145/3242969

General Chairs:
Sidney K. D'Mello, University of Illinois, USA
Panayiotis (Panos) Georgiou, University of Southern California, USA
Stefan Scherer, University of Southern California, USA

Program Chairs:
Emily Mower Provost, University of Michigan, USA
Mohammad Soleymani, University of Southern California, USA
Marcelo Worsley, Northwestern University, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Short-paper

Funding Sources

Shenzhen Basic Research Program
International Partnership Program of Chinese Academy of Sciences
National Natural Science Foundation of China

Conference

ICMI '18

Sponsor:

SIGCHI

ICMI '18: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION

October 16 - 20, 2018

Boulder, CO, USA

Acceptance Rates

ICMI '18 Paper Acceptance Rate: 63 of 149 submissions, 42%

Overall Acceptance Rate: 453 of 1,080 submissions, 42%


