xLSTM-FER: Enhancing Student Expression Recognition with Extended Vision Long Short-Term Memory Network

  • Conference paper
Web and Big Data. APWeb-WAIM 2024 International Workshops (APWeb-WAIM 2024)

Abstract

Student expression recognition has become an essential tool for assessing learning experiences and emotional states. This paper introduces xLSTM-FER, a novel architecture derived from the Extended Long Short-Term Memory (xLSTM) and designed to improve the accuracy and efficiency of student facial expression recognition through advanced sequence processing. xLSTM-FER segments each input image into a series of patches and processes the resulting patch sequence with a stack of xLSTM blocks. By learning spatial-temporal relationships within this sequence, xLSTM-FER can capture subtle changes in real-world students’ facial expressions and improve recognition accuracy. Experiments on CK+, RAF-DB, and FERPlus demonstrate the potential of xLSTM-FER in expression recognition tasks, showing better performance than state-of-the-art methods on these standard datasets. The linear computational and memory complexity of xLSTM-FER makes it particularly suitable for handling high-resolution images. Moreover, the design of xLSTM-FER allows for efficient processing of non-sequential inputs such as images without additional computation.
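The patch-sequence idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`patchify`, `toy_recurrent_block`, `toy_stack`), the patch size, the depth, and the decay constant are all hypothetical, and the recurrent block is a simple exponential moving average standing in for a real xLSTM block. It shows only how an image becomes a patch sequence and why one pass over that sequence costs time linear in the number of patches.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into a row-major sequence of flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p, c) -> (rows, cols, p, p, c) -> (rows * cols, p * p * c)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def toy_recurrent_block(tokens, decay=0.9):
    """Stand-in for an xLSTM block (assumption): a running average over the sequence.

    Each token is visited once and the state has fixed size, so time grows
    linearly with the number of patches.
    """
    tokens = np.asarray(tokens, dtype=np.float64)
    state = np.zeros(tokens.shape[1])
    out = np.empty_like(tokens)
    for t, x in enumerate(tokens):
        state = decay * state + (1.0 - decay) * x
        out[t] = state
    return out


def toy_stack(image, patch_size=16, depth=4):
    """Patchify an image, run it through a stack of toy blocks, and pool the result."""
    tokens = patchify(image, patch_size)
    for _ in range(depth):
        tokens = toy_recurrent_block(tokens)
    return tokens.mean(axis=0)  # one pooled feature vector per image


if __name__ == "__main__":
    image = np.random.default_rng(0).random((64, 64, 3))
    print(patchify(image, 16).shape)  # (16, 768)
    print(toy_stack(image).shape)     # (768,)
```

In this sketch, doubling the image side length quadruples the number of patches and therefore the work per block, whereas full self-attention over the same sequence would scale with the square of the patch count.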



Acknowledgement

This research is supported by the National Natural Science Foundation of China (No. 62207028), and in part by the Zhejiang Provincial Natural Science Foundation (No. LY23F020009), the Key R&D Program of Zhejiang Province (No. 2022C03106), and the Scientific Research Fund of Zhejiang Provincial Education Department (No. 2023SCG367).

Author information

Corresponding author

Correspondence to Qionghao Huang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Huang, Q., Chen, J. (2025). xLSTM-FER: Enhancing Student Expression Recognition with Extended Vision Long Short-Term Memory Network. In: Zhang, W., Tung, A., Zheng, Z., Yang, Z., Wang, X., Guo, H. (eds) Web and Big Data. APWeb-WAIM 2024 International Workshops. APWeb-WAIM 2024. Communications in Computer and Information Science, vol 2246. Springer, Singapore. https://doi.org/10.1007/978-981-96-0055-7_21

  • DOI: https://doi.org/10.1007/978-981-96-0055-7_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0054-0

  • Online ISBN: 978-981-96-0055-7

  • eBook Packages: Computer Science, Computer Science (R0)
