DOI: 10.1145/3340555.3355711
Research article · ICMI-MLMI Conference Proceedings

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression

Published: 14 October 2019

Abstract

This paper presents our approach to the engagement intensity regression task of EmotiW 2019. The task is to predict the engagement intensity of a student watching an online MOOC video under various conditions. Building on our winning solution from last year, we mainly explore head and body features together with a bootstrap strategy and two novel loss functions. We retain the framework of multi-instance learning with a long short-term memory (LSTM) network and make three contributions. First, in addition to gaze and head-pose features, we incorporate facial landmark features into our framework. Second, motivated by the fact that engagement intensity values are ordered, we design a rank loss as a regularizer that enforces a distance margin between the features of distant category pairs and those of adjacent category pairs. Third, we use the classical bootstrap aggregation method for model ensembling: we randomly sample subsets of the training data several times, train a model on each subset, and average the model predictions. We evaluate our method and discuss the influence of each component on the validation set. Our method finally wins 3rd place with an MSE of 0.0626 on the test set. Code: https://github.com/kaiwang960112/EmotiW_2019_engagement_regression
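The multi-instance view described in the abstract treats a video as a bag of temporal segments. A minimal sketch of that idea, assuming fixed-length segments and mean pooling (the paper's actual instance scorer is an LSTM; `score_fn` here stands in for it):

```python
def mil_bag_predict(frame_feats, seg_len, score_fn):
    """Bag-level engagement prediction from per-frame features."""
    # Split the per-frame features of one video into fixed-length
    # segments -- the "instances" of the bag.
    segments = [frame_feats[i:i + seg_len]
                for i in range(0, len(frame_feats), seg_len)]
    # Score each instance, then pool to a single bag-level value.
    scores = [score_fn(seg) for seg in segments]
    return sum(scores) / len(scores)
```

For example, with six scalar "features" split into two segments of three and a mean-valued scorer, the bag prediction is the mean of the two segment means.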
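The rank loss described above can be sketched as a hinge-style margin on feature distances. This is an illustrative reconstruction, not the authors' exact formulation: it assumes Euclidean distance and penalizes the case where a distant-category pair is not at least `margin` farther apart in feature space than an adjacent-category pair:

```python
import numpy as np

def rank_margin_loss(anchor, adjacent, distant, margin=1.0):
    """Margin-based rank regularizer on feature distances (sketch)."""
    # Feature distance to a sample from an adjacent engagement category.
    d_adj = np.linalg.norm(anchor - adjacent)
    # Feature distance to a sample from a distant engagement category.
    d_far = np.linalg.norm(anchor - distant)
    # Hinge penalty: zero once the distant pair is at least `margin`
    # farther apart than the adjacent pair.
    return max(0.0, d_adj - d_far + margin)
```

Added to the regression objective, this term pushes features of distant engagement levels apart while keeping adjacent levels relatively close, mirroring the ordinal structure of the labels.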
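The bootstrap aggregation step can likewise be sketched generically. This is a minimal illustration of the classical recipe (Efron-style bagging), not the paper's training code; `fit` is a hypothetical stand-in for training one regressor:

```python
import random

def bagging_predict(train_set, fit, x, n_models=5, frac=0.8, seed=0):
    """Train n_models regressors on bootstrap subsamples and average."""
    rng = random.Random(seed)
    k = int(frac * len(train_set))
    preds = []
    for _ in range(n_models):
        # Draw a bootstrap subsample (with replacement) of the training set.
        subset = [rng.choice(train_set) for _ in range(k)]
        model = fit(subset)           # train one ensemble member
        preds.append(model(x))        # predict with that member
    # Average the member predictions for the final ensemble output.
    return sum(preds) / len(preds)
```

Averaging over resampled models reduces prediction variance, which is the motivation for applying it to the small EmotiW training set.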




Published In

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019
601 pages
ISBN:9781450368605
DOI:10.1145/3340555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Engagement intensity prediction
  2. multiple instance learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '19

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 1
Reflects downloads up to 26 Sep 2024

Cited By
  • (2024) Analysis of Learner’s Emotional Engagement in Online Learning Using Machine Learning Adam Robust Optimization Algorithm. Scientific Programming 2024:1. DOI: 10.1155/2024/8886197. Online: 5 Jun 2024.
  • (2024) CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality Labels. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 4636–4645. DOI: 10.1109/CVPRW63382.2024.00466. Online: 17 Jun 2024.
  • (2023) MultiPar-T. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), 3893–3901. DOI: 10.24963/ijcai.2023/433. Online: 19 Aug 2023.
  • (2023) Accompany Children's Learning for You: An Intelligent Companion Learning System. Computer Graphics Forum 42:6. DOI: 10.1111/cgf.14862. Online: 3 Jul 2023.
  • (2023) Multilayer self-attention residual network for code search. Concurrency and Computation: Practice and Experience 35:9. DOI: 10.1002/cpe.7650. Online: 13 Feb 2023.
  • (2022) Classifying Emotions and Engagement in Online Learning Based on a Single Facial Expression Recognition Neural Network. IEEE Transactions on Affective Computing 13:4, 2132–2143. DOI: 10.1109/TAFFC.2022.3188390. Online: 1 Oct 2022.
  • (2022) Assessing student engagement from facial behavior in on-line learning. Multimedia Tools and Applications 82:9, 12859–12877. DOI: 10.1007/s11042-022-14048-8. Online: 24 Oct 2022.
  • (2022) Engagement Detection with Multi-Task Training in E-Learning Environments. Image Analysis and Processing – ICIAP 2022, 411–422. DOI: 10.1007/978-3-031-06433-3_35. Online: 15 May 2022.
  • (2020) Multiple Transfer Learning and Multi-label Balanced Training Strategies for Facial AU Detection In the Wild. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1657–1661. DOI: 10.1109/CVPRW50498.2020.00215. Online: Jun 2020.
  • (2020) Suppressing Uncertainties for Large-Scale Facial Expression Recognition. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6896–6905. DOI: 10.1109/CVPR42600.2020.00693. Online: Jun 2020.
