Abstract
Student engagement is positively related to comprehension in the teaching–learning process. It has been widely studied in online learning environments; this research instead focuses on recognizing student engagement in classroom environments using visual cues. To incorporate learning-centered affective states, we curated a dataset covering six learning-centered affective states from four public datasets. A graph convolution network (GCN)-based deep learning model with attention was designed and implemented to extract the features that contribute most to student engagement recognition from input video. The proposed architecture was evaluated on the curated dataset as well as the four public datasets. An ablation study was conducted on the curated dataset; the best-performing model, trained with minority oversampling and focal cross-entropy loss, achieved 65.35% accuracy. We also estimated student engagement on authentic classroom data and found a positive correlation between students’ engagement levels and post-lesson test scores, with a Pearson’s coefficient of 0.64. The proposed method outperformed existing state-of-the-art methods on two of the public datasets, with accuracy scores of 99.20% and 56.17%, and it achieved accuracy scores of 64.92% and 56.17% on the other two public datasets, which are better than many baseline results on them.
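The abstract names focal cross-entropy loss but does not give its formulation. As a rough illustration only, the sketch below assumes the commonly used form FL = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the true class. The function names and the default values of `gamma` and `alpha` are hypothetical choices for this example, not the settings used in the study.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_cross_entropy(logits: Sequence[float], target: int,
                        gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative defaults (assumptions), not values
    reported in the abstract. With gamma = 0 and alpha = 1 this reduces
    to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    p_t = max(p_t, 1e-12)  # guard against log(0) for extreme logits
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(batch_logits: Sequence[Sequence[float]],
                    targets: Sequence[int],
                    gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Average focal loss over a batch of (logits, target) pairs."""
    losses = [focal_cross_entropy(l, t, gamma, alpha)
              for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)
```

Under this formulation, samples the model already classifies confidently are down-weighted by the factor (1 - p_t)^gamma, so hard and minority-class samples contribute relatively more to the average loss. That is the usual motivation for pairing focal loss with oversampling on imbalanced data.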
Data availability
The datasets analyzed during the current study are available from the corresponding authors of the respective public datasets on reasonable request.
Author information
Contributions
Conceptualization: SM, KS; Methodology: SM, KS; Formal analysis and investigation: SM, KS, RM; Writing—original draft preparation: SM; Writing—review and editing: KS, RM; Supervision: KS, RM.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Mandia, S., Singh, K. & Mitharwal, R. Recognition of student engagement in classroom from affective states. Int J Multimed Info Retr 12, 18 (2023). https://doi.org/10.1007/s13735-023-00284-7
DOI: https://doi.org/10.1007/s13735-023-00284-7