Abstract
Student expression recognition has become an essential tool for assessing learning experiences and emotional states. This paper introduces xLSTM-FER, a novel architecture derived from the Extended Long Short-Term Memory (xLSTM) and designed to improve the accuracy and efficiency of student facial expression recognition through advanced sequence modeling. xLSTM-FER segments an input image into a sequence of patches and processes them with a stack of xLSTM blocks. By learning spatial-temporal relationships within the patch sequence, the model captures subtle changes in real-world students' facial expressions and improves recognition accuracy. Experiments on CK+, RAF-DB, and FERplus demonstrate the potential of xLSTM-FER for expression recognition tasks, with better performance than state-of-the-art methods on these standard datasets. The linear computational and memory complexity of xLSTM-FER makes it particularly suitable for high-resolution images, and its design allows non-sequential inputs such as images to be processed efficiently without additional computation.
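The abstract is the only technical description available here, so the following is a minimal sketch of the pipeline it outlines: split the image into patches, treat the patches as a sequence fed through recurrent blocks, and classify from the pooled sequence. All names, dimensions, and hyperparameters are assumptions for illustration, and a plain `nn.LSTM` stands in for the paper's xLSTM blocks, which are not specified here.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to an embedding."""
    def __init__(self, in_chans=3, patch_size=16, dim=192):
        super().__init__()
        # A strided convolution implements patchify + linear projection in one step.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, C, H, W)
        x = self.proj(x)                        # (B, dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)     # (B, num_patches, dim)

class XLSTMFERSketch(nn.Module):
    """Hypothetical xLSTM-FER-style classifier (NOT the authors' implementation):
    patch sequence -> stacked recurrent blocks -> pooled features -> expression logits."""
    def __init__(self, num_classes=7, dim=192, depth=2):
        super().__init__()
        self.embed = PatchEmbedding(dim=dim)
        # Stand-in for the stack of xLSTM blocks described in the abstract.
        self.blocks = nn.LSTM(dim, dim, num_layers=depth, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        seq = self.embed(x)                # (B, N, dim): one token per patch
        out, _ = self.blocks(seq)          # recurrent pass, linear in N
        feat = self.norm(out.mean(dim=1))  # average-pool over the patch sequence
        return self.head(feat)

model = XLSTMFERSketch(num_classes=7)
logits = model(torch.randn(1, 3, 224, 224))    # -> shape (1, 7) expression logits
```

The design point this illustrates is the abstract's complexity claim: a recurrent pass over N patch tokens costs O(N) time and memory, whereas standard self-attention costs O(N²), which is why a recurrent backbone becomes attractive as resolution (and hence N) grows.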
References
Alkin, B., Beck, M., Pöppel, K., Hochreiter, S., Brandstetter, J.: Vision-LSTM: xLSTM as generic vision backbone. arXiv preprint arXiv:2406.04303 (2024)
Barsoum, E., Zhang, C., Ferrer, C.C., Zhang, Z.: Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 279–283 (2016)
Beck, M., et al.: xLSTM: extended long short-term memory. arXiv preprint arXiv:2405.04517 (2024)
Chen, Y., Wang, J., Chen, S., Shi, Z., Cai, J.: Facial motion prior networks for facial expression recognition. In: 2019 IEEE Visual Communications and Image Processing (VCIP), pp. 1–4. IEEE (2019)
Dosovitskiy, A., et al.: An image is worth \(16 \times 16\) words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Fan, Y., Lam, J.C., Li, V.O.: Video-based emotion recognition using deeply-supervised neural networks. In: Proceedings of the 20th ACM International Conference on Multimodal Interaction, pp. 584–588 (2018)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Huang, Q., Chen, J.: Enhancing academic performance prediction with temporal graph networks for massive open online courses. J. Big Data 11(1), 52 (2024)
Huang, Q., Huang, C., Huang, J., Fujita, H.: Adaptive resource prefetching with spatial-temporal and topic information for educational cloud storage systems. Knowl.-Based Syst. 181, 104791 (2019)
Huang, Q., Huang, C., Wang, X., Jiang, F.: Facial expression recognition with grid-wise attention and visual transformer. Inf. Sci. 580, 35–54 (2021)
Huang, Q., Zeng, Y.: Improving academic performance predictions with dual graph neural networks. Complex Intell. Syst. 1–19 (2024)
Jagadeesh, M., Baranidharan, B.: Facial expression recognition of online learners from real-time videos using a novel deep learning model. Multimedia Syst. 28(6), 2285–2305 (2022)
Jiang, F., et al.: Face2nodes: learning facial expression representations with relation-aware dynamic graph convolution networks. Inf. Sci. 649, 119640 (2023)
Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are RNNs: fast autoregressive transformers with linear attention. In: International Conference on Machine Learning, pp. 5156–5165. PMLR (2020)
Lasri, I., Solh, A.R., El Belkacemi, M.: Facial emotion recognition of students using convolutional neural network. In: 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), pp. 1–6. IEEE (2019)
Li, S., Deng, W., Du, J.: Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2852–2861 (2017)
Ling, X., Liang, J., Wang, D., Yang, J.: A facial expression recognition system for smart learning based on YOLO and vision transformer. In: Proceedings of the 2021 7th International Conference on Computing and Artificial Intelligence, pp. 178–182 (2021)
Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, pp. 94–101. IEEE (2010)
Mehta, S., Rastegari, M.: MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer. arXiv preprint arXiv:2110.02178 (2021)
Meng, D., Peng, X., Wang, K., Qiao, Y.: Frame attention networks for facial expression recognition in videos. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 3866–3870. IEEE (2019)
Mohamad Nezami, O., Dras, M., Hamey, L., Richards, D., Wan, S., Paris, C.: Automatic recognition of student engagement using deep learning and facial expression. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 273–289. Springer (2020)
Ozdemir, M.A., Elagoz, B., Alaybeyoglu, A., Sadighzadeh, R., Akan, A.: Real time emotion recognition from facial expressions using CNN architecture. In: 2019 Medical Technologies Congress (TIPTEKNO), pp. 1–4. IEEE (2019)
Schlag, I., Irie, K., Schmidhuber, J.: Linear transformers are secretly fast weight programmers. In: International Conference on Machine Learning, pp. 9355–9366. PMLR (2021)
Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
Tonguç, G., Ozkara, B.O.: Automatic recognition of student emotions from facial expressions during a lecture. Comput. Educ. 148, 103797 (2020)
Wang, J., Zhang, Z.: Facial expression recognition in online course using light-weight vision transformer via knowledge distillation. In: Pacific Rim International Conference on Artificial Intelligence, pp. 247–253. Springer (2023)
Wang, K., Peng, X., Yang, J., Lu, S., Qiao, Y.: Suppressing uncertainties for large-scale facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6897–6906 (2020)
Wang, K., Cheng, M.: Teaching feedback system based on VIT expression recognition in distance education. In: 2024 13th International Conference on Educational and Information Technology (ICEIT), pp. 93–97. IEEE (2024)
Wu, X., et al.: FER-CHC: facial expression recognition with cross-hierarchy contrast. Appl. Soft Comput. 145, 110530 (2023)
Wu, Y., He, K.: Group normalization. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Zhang, Y., Wang, C., Ling, X., Deng, W.: Learn from all: erasing attention consistency for noisy label facial expression recognition. In: European Conference on Computer Vision, pp. 418–434. Springer (2022)
Zhao, Z., Liu, Q., Wang, S.: Learning deep global multi-scale and local attention features for facial expression recognition in the wild. IEEE Trans. Image Process. 30, 6544–6556 (2021)
Zhou, S., Wu, X., Jiang, F., Huang, Q., Huang, C.: Emotion recognition from large-scale video clips with cross-attention and hybrid feature weighting neural networks. Int. J. Environ. Res. Public Health 20(2), 1400 (2023)
Zhu, L., Liao, B., Zhang, Q., Wang, X., Liu, W., Wang, X.: Vision mamba: efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417 (2024)
Acknowledgement
This research is supported by the National Natural Science Foundation of China (No. 62207028), and partially by the Zhejiang Provincial Natural Science Foundation (No. LY23F020009), the Key R&D Program of Zhejiang Province (No. 2022C03106), and the Scientific Research Fund of the Zhejiang Provincial Education Department (No. 2023SCG367).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, Q., Chen, J. (2025). xLSTM-FER: Enhancing Student Expression Recognition with Extended Vision Long Short-Term Memory Network. In: Zhang, W., Tung, A., Zheng, Z., Yang, Z., Wang, X., Guo, H. (eds) Web and Big Data. APWeb-WAIM 2024 International Workshops. APWeb-WAIM 2024. Communications in Computer and Information Science, vol 2246. Springer, Singapore. https://doi.org/10.1007/978-981-96-0055-7_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0054-0
Online ISBN: 978-981-96-0055-7
eBook Packages: Computer Science, Computer Science (R0)