Application of Multiple Deep Learning Architectures for Emotion Classification Based on Facial Expressions †
- Figure 1: Standard samples of the FER2013 database.
- Figure 2: Non-standard samples of the FER2013 database.
- Figure 3: Confusion matrix of VGG16.
- Figure 4: Confusion matrix of VGG19.
- Figure 5: Confusion matrix of ResNet50.
- Figure 6: Confusion matrix of ResNet101.
- Figure 7: Confusion matrix of DenseNet.
- Figure 8: Confusion matrix of GoogLeNet V1.
- Figure 9: Confusion matrix of MobileNet V1.
- Figure 10: Confusion matrix of EfficientNet V2.
- Figure 11: Confusion matrix of ShuffleNet V2.
- Figure 12: Confusion matrix of RepVGG.
- Figure 13: Accuracy per epoch for each model.
Abstract
1. Introduction
2. Related Works
3. Proposed Methodology
3.1. Dataset Description
3.2. Experiment Setup and Training Parameters
3.2.1. Learning Rate
3.2.2. Batch Size
3.2.3. Training Epochs
3.2.4. Data Augmentation
3.3. IoT Constraints and the Choice of Models
3.4. Deep Learning Algorithms
3.4.1. VGG16 Network Structure
3.4.2. VGG19 Network Structure
3.4.3. Resnet50 Network Structure
3.4.4. Resnet101 Network Structure
3.4.5. DenseNet Network Structure
3.4.6. GoogLeNet V1 Network Structure
3.4.7. MobileNet V1 Network Structure
3.4.8. EfficientNet V2 Network Structure
3.4.9. ShuffleNet V2 Network Structure
3.4.10. RepVGG Network Structure
4. Results
4.1. Model Evaluation Analysis
4.1.1. Test Accuracy Comparison
4.1.2. Training Time Comparison
4.1.3. Weight File Size Impact
4.2. Expression Recognition Performance Analysis
4.2.1. Uniform Trends in Emotion Recognition Across Models
4.2.2. Model-Specific Strengths and Weaknesses in Emotional Classification
4.2.3. Influential Factors in Misclassification and Performance Disparities
4.2.4. Analysis of Validation Accuracy Trends
Early-Stage Learning and Convergence Speed
Mid-Stage Stability and Generalization
Late-Stage Performance and Convergence
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Facial Expression | Number of Images | Training Set Instances | Test Set Instances |
|---|---|---|---|
| Angry | 4953 | 3995 | 958 |
| Happy | 8989 | 7215 | 1774 |
| Fear | 5121 | 4097 | 1024 |
| Disgust | 547 | 436 | 111 |
| Sad | 6077 | 4830 | 1247 |
| Surprise | 4002 | 3171 | 831 |
| Neutral | 6198 | 4965 | 1233 |
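The class distribution above is strongly imbalanced (Disgust has far fewer samples than Happy), which matters when interpreting the per-class confusion matrices later. A minimal sketch, using only the train/test counts from the table, that recomputes the totals and the imbalance ratio:

```python
# Per-class (training, test) instance counts from the dataset table above.
counts = {
    "Angry": (3995, 958),
    "Happy": (7215, 1774),
    "Fear": (4097, 1024),
    "Disgust": (436, 111),
    "Sad": (4830, 1247),
    "Surprise": (3171, 831),
    "Neutral": (4965, 1233),
}

totals = {k: tr + te for k, (tr, te) in counts.items()}
total_images = sum(totals.values())

# Ratio between the most and least frequent classes.
imbalance = max(totals.values()) / min(totals.values())

for k, (tr, te) in counts.items():
    print(f"{k:8s} total={tr + te:5d} test fraction={te / (tr + te):.3f}")
print(f"dataset size: {total_images}, imbalance ratio: {imbalance:.1f}x")
```

The recomputed grand total is 35,887 images, matching the published size of FER2013, and the Happy/Disgust ratio exceeds 16:1.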
| Learning Rate | Batch Size | Epochs | Optimizer |
|---|---|---|---|
| 0.001 | 64 | 100 | Adam |
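These hyperparameters fix the optimizer step budget for every model. As a quick sanity check — assuming a standard mini-batch loader that keeps the final partial batch, and the 28,709 training images from the dataset table — the per-epoch and total step counts work out as:

```python
import math

# Hyperparameters from the table above.
LEARNING_RATE = 0.001
BATCH_SIZE = 64
EPOCHS = 100

# Sum of the training-set column in the dataset table.
TRAIN_IMAGES = 28_709

# Optimizer steps per epoch (last partial batch kept) and over the full run.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)  # 449 44900
```

So each model sees roughly 44,900 Adam updates over the 100 epochs, which is the budget behind the per-epoch training times reported in the results.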
| Model | Number of Conv Layers | Number of FC Layers | Activation Function | BN 1 | Key Feature |
|---|---|---|---|---|---|
| VGG16 | 13 | 3 | ReLU | Yes | Deep, uniform architecture built from small 3×3 filters; good generalization. |
| VGG19 | 16 | 3 | ReLU | Yes | Deeper variant of VGG16; the additional convolutional layers capture more complex features. |
| ResNet50 | 49 | 1 | ReLU | Yes | Residual learning framework enables stable training of very deep networks. |
| ResNet101 | 100 | 1 | ReLU | Yes | Residual connections mitigate vanishing and exploding gradients. |
| DenseNet | 118 | 1 | ReLU | Yes | Each layer receives the feature maps of all preceding layers through dense connectivity. |
| GoogLeNet V1 | 22 | 1 | ReLU | Yes | Inception modules efficiently capture features at multiple spatial scales. |
| MobileNet V1 | 27 | 1 | ReLU | Yes | Depthwise separable convolutions efficiently replace standard convolutions. |
| EfficientNet V2 | 53 | 1 | ReLU | Yes | Fused-MBConv blocks and a progressive learning strategy. |
| ShuffleNet V2 | 24 | 1 | ReLU | Yes | Channel shuffling enables efficient computation. |
| RepVGG | 22 (5 stages) | 1 | ReLU | Yes | Multi-branch topology during training, single branch during inference. |

1 BN: batch normalization.
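MobileNet V1's key feature, the depthwise separable convolution, factors a standard convolution into a per-channel spatial filter followed by a 1×1 pointwise mix. A minimal sketch of the parameter-count comparison (the 128→256 layer shapes are an illustrative example, not taken from the paper):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then 1 x 1 pointwise mixing."""
    return k * k * c_in + c_in * c_out

# Example layer: 3 x 3 convolution mapping 128 channels to 256.
std = standard_conv_params(3, 128, 256)
sep = depthwise_separable_params(3, 128, 256)
print(f"standard: {std}, separable: {sep}, saving: {std / sep:.1f}x")
```

For this example the factorized form needs 33,920 weights instead of 294,912, a roughly 8.7× reduction, which is why MobileNet V1 produces one of the smallest weight files in the comparison below.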
| Model | Test Accuracy (%) | Training Time (s/epoch) | Weight File Size (MB) |
|---|---|---|---|
| VGG16 | 67.7 | 79.8 | 512 |
| VGG19 | 67.1 | 80.6 | 512 |
| ResNet50 | 68.1 | 44.4 | 81.3 |
| ResNet101 | 68.1 | 44.7 | 158 |
| DenseNet | 67.6 | 54.3 | 27.1 |
| GoogLeNet V1 | 64.4 | 27.8 | 39.4 |
| MobileNet V1 | 67.1 | 30.8 | 12.3 |
| EfficientNet V2 | 68.7 | 46.2 | 77.8 |
| ShuffleNet V2 | 62.3 | 20.6 | 1.47 |
| RepVGG | 66.1 | 21.6 | 10.7 |
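For IoT deployment, the interesting reading of the results table is the trade-off between accuracy and model size, not accuracy alone. A small sketch that ranks the models on the reported numbers — accuracy-per-megabyte here is a crude illustrative score, not a metric defined in the paper:

```python
# (test accuracy %, training time s/epoch, weight file size MB)
# from the results table above.
results = {
    "VGG16": (67.7, 79.8, 512.0),
    "VGG19": (67.1, 80.6, 512.0),
    "ResNet50": (68.1, 44.4, 81.3),
    "ResNet101": (68.1, 44.7, 158.0),
    "DenseNet": (67.6, 54.3, 27.1),
    "GoogLeNet V1": (64.4, 27.8, 39.4),
    "MobileNet V1": (67.1, 30.8, 12.3),
    "EfficientNet V2": (68.7, 46.2, 77.8),
    "ShuffleNet V2": (62.3, 20.6, 1.47),
    "RepVGG": (66.1, 21.6, 10.7),
}

best_accuracy = max(results, key=lambda m: results[m][0])
smallest = min(results, key=lambda m: results[m][2])
# Accuracy per MB of weights as a rough deployability score.
best_tradeoff = max(results, key=lambda m: results[m][0] / results[m][2])
print(best_accuracy, smallest, best_tradeoff)
```

EfficientNet V2 leads on raw accuracy, while ShuffleNet V2, at 1.47 MB, dominates the size-normalized score despite the lowest accuracy — the tension the IoT discussion in Section 3.3 turns on.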
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Qian, C.; Lobo Marques, J.A.; de Alexandria, A.R.; Fong, S.J. Application of Multiple Deep Learning Architectures for Emotion Classification Based on Facial Expressions. Sensors 2025, 25, 1478. https://doi.org/10.3390/s25051478