Change and Detection of Emotions Expressed on People’s Faces in Photos
Figure 1. Wheel of emotion [3].
Figure 2. Circumplex theory of affect [3].
Figure 3. EmoDNN emotion change preview.
Figure 4. Confusion matrices of the trained classifiers (left: PyTorch-based; right: TensorFlow-based).
Figure 5. Confusion matrices of the trained classifiers on generated faces with changed emotions (left: PyTorch-based; right: TensorFlow-based).
Figure A1. Preview of sample generated images for individual emotions (rows correspond to the different emotions; columns show pairs of [original image, image with the emotion changed by EmoDNN]).
Abstract
1. Introduction
2. Related Work
2.1. Human Emotion Profile Classification
2.2. Artificial Intelligence Algorithms Used to Analyze Emotions
3. Methodology
3.1. Concept of the Proposed Method
3.2. Learning Process
3.3. Neural Network Architecture
4. Results
4.1. Applied Dataset
4.2. Emotions Change
4.3. Emotions Detection
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Ekman, P.; Friesen, W.V. Constants across cultures in the face and emotion. J. Personal. Soc. Psychol. 1971, 17, 124–129. [Google Scholar] [CrossRef]
- Plutchik, R. A General Psychoevolutionary Theory of Emotion. In Theories of Emotion; Plutchik, R., Kellerman, H., Eds.; Elsevier: Amsterdam, The Netherlands, 1980; pp. 3–33. [Google Scholar]
- Williams, L.; Arribas-Ayllon, M.; Artemiou, A.; Spasić, I. Comparing the Utility of Different Classification Schemes for Emotive Language Analysis. J. Classif. 2019, 36, 619–648. [Google Scholar] [CrossRef]
- Watson, D.; Tellegen, A. Toward a Consensual Structure of Mood. Psychol. Bull. 1985, 98, 219–235. [Google Scholar] [CrossRef] [PubMed]
- Bistroń, M.; Piotrowski, Z. Comparison of Machine Learning Algorithms Used for Skin Cancer Diagnosis. Appl. Sci. 2022, 12, 9960. [Google Scholar] [CrossRef]
- Walczyna, T.; Piotrowski, Z. Overview of Voice Conversion Methods Based on Deep Learning. Appl. Sci. 2023, 13, 3100. [Google Scholar] [CrossRef]
- Kaczyński, M.; Piotrowski, Z. High-Quality Video Watermarking Based on Deep Neural Networks and Adjustable Subsquares Properties Algorithm. Sensors 2022, 22, 5376. [Google Scholar] [CrossRef]
- Kaczyński, M.; Piotrowski, Z.; Pietrow, D. High-Quality Video Watermarking Based on Deep Neural Networks for Video with HEVC Compression. Sensors 2022, 22, 7552. [Google Scholar] [CrossRef]
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [Google Scholar] [CrossRef]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4217–4228. [Google Scholar] [CrossRef]
- Ning, X.; Xu, S.; Li, W.; Nie, S. FEGAN: Flexible and efficient face editing with pre-trained generator. IEEE Access 2020, 8, 65340–65350. [Google Scholar] [CrossRef]
- Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar] [CrossRef]
- Zhang, S.; Zhang, Y.; Zhang, Y.; Wang, Y.; Song, Z. A Dual-Direction Attention Mixed Feature Network for Facial Expression Recognition. Electronics 2023, 12, 3595. [Google Scholar] [CrossRef]
- Ning, M.; Salah, A.A.; Ertugrul, I.O. Representation Learning and Identity Adversarial Training for Facial Behavior Understanding. arXiv 2024, arXiv:2407.11243. [Google Scholar]
- Her, M.B.; Jeong, J.; Song, H.; Han, J.-H. Batch Transformer: Look for Attention in Batch. arXiv 2024, arXiv:2407.04218. [Google Scholar]
- Mao, J.; Xu, R.; Yin, X.; Chang, Y.; Nie, B.; Huang, A. POSTER++: A Simpler and Stronger Facial Expression Recognition Network. arXiv 2023, arXiv:2301.12149. [Google Scholar] [CrossRef]
- Chen, Y.; Li, J.; Shan, S.; Wang, M.; Hong, R. From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos. IEEE Trans. Affect. Comput. 2024. early access. [Google Scholar] [CrossRef]
- Savchenko, A.V.; Savchenko, L.V.; Makarov, I. Classifying Emotions and Engagement in Online Learning Based on a Single Facial Expression Recognition Neural Network. IEEE Trans. Affect. Comput. 2022, 13, 2132–2143. [Google Scholar] [CrossRef]
- Wen, Z.; Lin, W.; Wang, T.; Xu, G. Distract Your Attention: Multi-Head Cross Attention Network for Facial Expression Recognition. Biomimetics 2023, 8, 199. [Google Scholar] [CrossRef]
- Vo, T.-H.; Lee, G.-S.; Yang, H.-J.; Kim, S.-H. Pyramid with Super Resolution for In-the-Wild Facial Expression Recognition. IEEE Access 2020, 8, 131988–132001. [Google Scholar] [CrossRef]
- Zhao, Z.; Liu, Q.; Zhou, F. Robust Lightweight Facial Expression Recognition Network with Label Distribution Training. Proc. AAAI Conf. Artif. Intell. 2021, 35, 3510–3519. [Google Scholar] [CrossRef]
- Wang, K.; Peng, X.; Yang, J.; Meng, D.; Qiao, Y. Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition. arXiv 2019, arXiv:1905.04075. [Google Scholar] [CrossRef]
- Mollahosseini, A.; Hasani, B.; Mahoor, M.H. AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild. IEEE Trans. Affect. Comput. 2019, 10, 18–31. [Google Scholar] [CrossRef]
- Li, J.; Nie, J.; Guo, D.; Hong, R.; Wang, M. Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers. arXiv 2023, arXiv:2207.11081. [Google Scholar] [CrossRef]
- Zhou, H.; Meng, D.; Zhang, Y.; Peng, X.; Du, J.; Wang, K.; Qiao, Y. Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition. In Proceedings of the ICMI’19: 2019 International Conference on Multimodal Interaction, Suzhou, China, 14–18 October 2019; pp. 562–566. [Google Scholar] [CrossRef]
- Walczyna, T.; Piotrowski, Z. Fast Fake: Easy-to-Train Face Swap Model. Appl. Sci. 2024, 14, 2149. [Google Scholar] [CrossRef]
- Perov, I.; Gao, D.; Chervoniy, N.; Liu, K.; Marangonda, S.; Umé, C.; Dpfks, M.; Facenheim, C.S.; RP, L.; Jiang, J.; et al. DeepFaceLab: Integrated, flexible and extensible face-swapping framework. arXiv 2021, arXiv:2005.05535. [Google Scholar]
- Chen, R.; Chen, X.; Ni, B.; Ge, Y. SimSwap: An Efficient Framework for High Fidelity Face Swapping. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2003–2011. [Google Scholar] [CrossRef]
- Li, L.; Bao, J.; Yang, H.; Chen, D.; Wen, F. FaceShifter: Towards High Fidelity and Occlusion Aware Face Swapping. arXiv 2020, arXiv:1912.13457. [Google Scholar]
- Groshev, A.; Maltseva, A.; Chesakov, D.; Kuznetsov, A.; Dimitrov, D. GHOST—A New Face Swap Approach for Image and Video Domains. IEEE Access 2022, 10, 83452–83462. [Google Scholar] [CrossRef]
- Kim, K.; Kim, Y.; Cho, S.; Seo, J.; Nam, J.; Lee, K.; Kim, S.; Lee, K. DiffFace: Diffusion-based Face Swapping with Facial Guidance. arXiv 2022, arXiv:2212.13344. [Google Scholar]
- Wang, Y.; Chen, X.; Zhu, J.; Chu, W.; Tai, Y.; Wang, C.; Li, J.; Wu, Y.; Huang, F.; Ji, R. HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping. arXiv 2021, arXiv:2106.09965. [Google Scholar]
- Tarchi, P.; Lanini, M.C.; Frassineti, L.; Lanatà, A. Real and Deepfake Face Recognition: An EEG Study on Cognitive and Emotive Implications. Brain Sci. 2023, 13, 1233. [Google Scholar] [CrossRef]
- Gupta, G.; Raja, K.; Gupta, M.; Jan, T.; Whiteside, S.T.; Prasad, M. A Comprehensive Review of DeepFake Detection Using Advanced Machine Learning and Fusion Methods. Electronics 2024, 13, 95. [Google Scholar] [CrossRef]
- Alhaji, H.S.; Celik, Y.; Goel, S. An Approach to Deepfake Video Detection Based on ACO-PSO Features and Deep Learning. Electronics 2024, 13, 2398. [Google Scholar] [CrossRef]
- Javed, M.; Zhang, Z.; Dahri, F.H.; Laghari, A.A. Real-Time Deepfake Video Detection Using Eye Movement Analysis with a Hybrid Deep Learning Approach. Electronics 2024, 13, 2947. [Google Scholar] [CrossRef]
- Lim, J.H.; Ye, J.C. Geometric GAN. arXiv 2017, arXiv:1705.02894. [Google Scholar]
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; Precup, D., Teh, Y.W., Eds.; PMLR: New York, NY, USA, 2017; Volume 70, pp. 214–223. [Google Scholar]
- Zhang, Z. Improved Adam optimizer for deep neural networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C.J., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar] [CrossRef]
- Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral Normalization for Generative Adversarial Networks. arXiv 2018, arXiv:1802.05957. [Google Scholar]
- Dumoulin, V.; Shlens, J.; Kudlur, M. A Learned Representation For Artistic Style. arXiv 2017, arXiv:1610.07629. [Google Scholar]
- Clevert, D.-A.; Unterthiner, T.; Hochreiter, S. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv 2016, arXiv:1511.07289. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; Volume 30. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. 6.2.2.3 Softmax Units for Multinoulli Output Distributions. In Deep Learning; MIT Press: Cambridge, MA, USA, 2016; pp. 180–184. ISBN 978-0-262-03561-3. [Google Scholar]
- Mahoor, M. AffectNet; Mohammad Mahoor: Denver, CO, USA, 2024; Available online: http://mohammadmahoor.com/affectnet/ (accessed on 6 November 2024).
- Choi, Y.; Choi, M.; Kim, M.; Ha, J.-W.; Kim, S.; Choo, J. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. arXiv 2018, arXiv:1711.09020. [Google Scholar]
Generator architecture (a hedged PyTorch sketch of its conditional building blocks follows the table):

Component | Description |
---|---|
Generator | The main model for generating images. It uses a U-Net architecture conditioned on emotion vectors. |
Classification MLP | A multi-layer perceptron that processes the one-hot emotion vector into feature vectors. |
- Linear Layer 1 | Transforms the input emotion vector to a higher-dimensional feature space. |
- Activation (GELU) | Applies GELU activation function for non-linearity. |
- Linear Layer 2 | Further transforms the features to match the required input for the U-Net. |
- Activation (GELU) | Applies GELU activation function for non-linearity. |
U-Net | The backbone of the generator, consisting of an encoder and a decoder for image generation. |
Encoder | Encodes the input image into a lower-dimensional latent space. |
- Initial Convolution | A convolutional layer to process the input image. |
- ResDown Blocks | Residual blocks with downsampling, including conditional normalization layers. |
- ResBlock | A residual block that processes features before passing to the decoder. |
Decoder | Decodes the latent representation back into an image. |
- ResBlock | A residual block that processes features before upsampling. |
- ResUp Blocks | Residual blocks with upsampling, including conditional normalization layers. |
- Final Convolution | A convolutional layer to produce the final output image. |
ConditionalNorm2d | Applies conditional normalization based on the emotion vector. |
ResDown | Residual downsampling block with conditional normalization and spectral normalization. |
ResUp | Residual upsampling block with conditional normalization and spectral normalization. |
ResBlock | Standard residual block with conditional normalization and spectral normalization. |
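To make the table above more concrete, the snippet below sketches in PyTorch how the ConditionalNorm2d layer, a ResDown block, and the classification MLP could be wired together. It is a minimal sketch, not the authors' implementation: the normalization statistic (instance normalization here), the ELU activation, and all layer widths are assumptions about details the table does not specify.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class ConditionalNorm2d(nn.Module):
    """Normalizes a feature map, then scales/shifts it with parameters
    predicted from the emotion embedding (conditional normalization)."""

    def __init__(self, num_features: int, emb_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)  # statistic choice is an assumption
        self.to_gamma_beta = nn.Linear(emb_dim, 2 * num_features)

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(emb).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return self.norm(x) * (1.0 + gamma) + beta


class ResDown(nn.Module):
    """Residual downsampling block: two spectrally normalized convolutions
    with conditional normalization, plus a strided skip connection."""

    def __init__(self, in_ch: int, out_ch: int, emb_dim: int):
        super().__init__()
        self.conv1 = spectral_norm(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1))
        self.conv2 = spectral_norm(nn.Conv2d(out_ch, out_ch, 3, padding=1))
        self.norm1 = ConditionalNorm2d(in_ch, emb_dim)
        self.norm2 = ConditionalNorm2d(out_ch, emb_dim)
        self.skip = spectral_norm(nn.Conv2d(in_ch, out_ch, 1, stride=2))
        self.act = nn.ELU()  # activation choice is an assumption

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        h = self.conv1(self.act(self.norm1(x, emb)))
        h = self.conv2(self.act(self.norm2(h, emb)))
        return h + self.skip(x)


class EmotionMLP(nn.Module):
    """Classification MLP from the table: two linear layers with GELU,
    mapping the one-hot emotion vector to a conditioning embedding."""

    def __init__(self, num_emotions: int = 8, emb_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_emotions, emb_dim), nn.GELU(),
            nn.Linear(emb_dim, emb_dim), nn.GELU(),
        )

    def forward(self, one_hot: torch.Tensor) -> torch.Tensor:
        return self.net(one_hot)
```

A ResUp block would mirror ResDown, replacing the strided convolutions with upsampling (e.g., nn.Upsample) followed by convolution, again with conditional and spectral normalization.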
Discriminator architecture (a hedged PyTorch sketch follows the table):

Component | Description |
---|---|
Discriminator | The main model for distinguishing real images from generated ones. It uses an encoder to process images. |
Encoder | Encodes the input image into a lower-dimensional feature representation. |
- Initial Convolution | A convolutional layer that processes the input image. |
- ResDown Blocks | Residual blocks with downsampling. These blocks consist of convolutional layers and normalization. |
- ResBlock | A residual block that processes the features before passing them to the final layers. |
Output Layer | A convolutional layer that reduces the feature map to a single-channel output for real/fake classification. |
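The sketch below shows one way the discriminator described above could be assembled: an initial convolution, a stack of residual downsampling blocks, and a final convolution that collapses the features to a single-channel real/fake map. The channel widths, the number of blocks, the LeakyReLU activation, and the use of spectral normalization are assumptions about details the table leaves open.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class DiscDown(nn.Module):
    """Unconditional residual downsampling block for the discriminator."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(out_ch, out_ch, 3, padding=1)),
        )
        self.skip = spectral_norm(nn.Conv2d(in_ch, out_ch, 1, stride=2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.skip(x)


class Discriminator(nn.Module):
    """Encoder that maps an input image to a single-channel real/fake map."""

    def __init__(self, widths=(3, 64, 128, 256, 512)):  # widths are an assumption
        super().__init__()
        self.stem = spectral_norm(nn.Conv2d(widths[0], widths[1], 3, padding=1))
        self.blocks = nn.Sequential(
            *[DiscDown(widths[i], widths[i + 1]) for i in range(1, len(widths) - 1)]
        )
        self.head = spectral_norm(nn.Conv2d(widths[-1], 1, 3, padding=1))

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.stem(img)))
```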
Classifier architecture (a sketch of its multi-channel output head follows the table):

Component | Description |
---|---|
Classifier | The main model for classifying the emotion of input images. It uses an encoder to process images. |
Encoder | Encodes the input image into a lower-dimensional feature representation. |
- Initial Convolution | A convolutional layer that processes the input image. |
- ResDown Blocks | Residual blocks with downsampling. These blocks consist of convolutional layers and normalization. |
- ResBlock | A residual block that processes the features before passing them to the final layers. |
Output Layer | A convolutional layer that reduces the feature map to a multi-channel output for emotion classification. |
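The classifier described in this table mirrors the discriminator's encoder, differing mainly in the output layer, which produces a multi-channel map with one channel per emotion class. The sketch below reuses the DiscDown block and imports from the discriminator sketch above; the global average pooling that turns the map into per-class logits is an assumption, while the eight output classes match the emotion set used throughout the paper.

```python
class EmotionClassifier(nn.Module):
    """Same encoder layout as the discriminator, but with a multi-channel
    head whose pooled output gives one logit per emotion class."""

    def __init__(self, num_classes: int = 8, widths=(3, 64, 128, 256, 512)):
        super().__init__()
        self.stem = spectral_norm(nn.Conv2d(widths[0], widths[1], 3, padding=1))
        self.blocks = nn.Sequential(
            *[DiscDown(widths[i], widths[i + 1]) for i in range(1, len(widths) - 1)]
        )
        self.head = spectral_norm(nn.Conv2d(widths[-1], num_classes, 3, padding=1))

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        logits_map = self.head(self.blocks(self.stem(img)))
        return logits_map.mean(dim=(2, 3))  # global average pooling; pooling choice is an assumption
```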
TensorFlow-based classifier layer graph; the number in parentheses after each layer indicates the layer whose output it consumes (a Keras reconstruction follows the table):

| # | Layer Type | # | Layer Type |
|---|---|---|---|
1 | Conv2D(filters=32, kernel_size=(3, 3))(Input) | 17 | Dense(units=256)(16) |
2 | MaxPooling2D(pool_size=(2, 2))(1) | 18 | Dense(units=256)(17) |
3 | BatchNormalization(2) | 19 | Dense(units=256)(15) |
4 | Conv2D(filters=32, kernel_size=(3, 3))(3) | 20 | Dense(units=256)(19) |
5 | Conv2D(filters=64, kernel_size=(5, 5))(4) | 21 | Concatenate(axis=1)(18, 20) |
6 | MaxPooling2D(pool_size=(2, 2))(5) | 22 | BatchNormalization(21) |
7 | BatchNormalization(6) | 23 | Dropout(rate=0.31, seed=321)(22) |
8 | Conv2D(filters=64, kernel_size=(5, 5))(7) | 24 | BatchNormalization(23) |
9 | Conv2D(filters=128, kernel_size=(7, 7))(8) | 25 | Dense(units=512)(24) |
10 | Conv2D(filters=128, kernel_size=(7, 7))(9) | 26 | BatchNormalization(25) |
11 | MaxPooling2D(pool_size=(2, 2))(10) | 27 | Dense(units=512)(26) |
12 | BatchNormalization(11) | 28 | BatchNormalization(27) |
13 | Flatten(12) | 29 | Dense(units=1024)(28) |
14 | BatchNormalization(13) | 30 | BatchNormalization(29) |
15 | Dense(units=256)(14) | 31 | Dense(units=8)(30) |
16 | Dropout(rate=0.37, seed=274)(15) |
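The layer graph above maps almost one-to-one onto the Keras functional API. The sketch below follows the table's numbering and connectivity exactly; the input resolution (96 × 96 × 3 here) is not given in the table and is an assumption, and activation functions are not listed either, so Keras defaults are kept.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_classifier(input_shape=(96, 96, 3), num_emotions=8):
    # Layer numbers in the comments follow the table.
    inp = keras.Input(shape=input_shape)           # Input
    x1 = layers.Conv2D(32, (3, 3))(inp)            # 1
    x2 = layers.MaxPooling2D((2, 2))(x1)           # 2
    x3 = layers.BatchNormalization()(x2)           # 3
    x4 = layers.Conv2D(32, (3, 3))(x3)             # 4
    x5 = layers.Conv2D(64, (5, 5))(x4)             # 5
    x6 = layers.MaxPooling2D((2, 2))(x5)           # 6
    x7 = layers.BatchNormalization()(x6)           # 7
    x8 = layers.Conv2D(64, (5, 5))(x7)             # 8
    x9 = layers.Conv2D(128, (7, 7))(x8)            # 9
    x10 = layers.Conv2D(128, (7, 7))(x9)           # 10
    x11 = layers.MaxPooling2D((2, 2))(x10)         # 11
    x12 = layers.BatchNormalization()(x11)         # 12
    x13 = layers.Flatten()(x12)                    # 13
    x14 = layers.BatchNormalization()(x13)         # 14
    x15 = layers.Dense(256)(x14)                   # 15
    x16 = layers.Dropout(0.37, seed=274)(x15)      # 16
    x17 = layers.Dense(256)(x16)                   # 17
    x18 = layers.Dense(256)(x17)                   # 18
    x19 = layers.Dense(256)(x15)                   # 19 (branches from layer 15)
    x20 = layers.Dense(256)(x19)                   # 20
    x21 = layers.Concatenate(axis=1)([x18, x20])   # 21
    x22 = layers.BatchNormalization()(x21)         # 22
    x23 = layers.Dropout(0.31, seed=321)(x22)      # 23
    x24 = layers.BatchNormalization()(x23)         # 24
    x25 = layers.Dense(512)(x24)                   # 25
    x26 = layers.BatchNormalization()(x25)         # 26
    x27 = layers.Dense(512)(x26)                   # 27
    x28 = layers.BatchNormalization()(x27)         # 28
    x29 = layers.Dense(1024)(x28)                  # 29
    x30 = layers.BatchNormalization()(x29)         # 30
    out = layers.Dense(num_emotions)(x30)          # 31
    return keras.Model(inp, out)
```

Note the two parallel dense branches (layers 17–18 and 19–20), both fed from layer 15 and concatenated at layer 21 before the final dense stack.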
Training Set (Annotated Automatically) | Training Set (Annotated Manually) | Training Set (Total) | Emotion | Validation Set |
---|---|---|---|---|
143,142 | 74,874 | 218,016 | Neutral | 500 |
246,235 | 134,415 | 380,650 | Happy | 500 |
20,854 | 25,459 | 46,313 | Sad | 500 |
17,462 | 14,090 | 31,552 | Surprise | 500 |
3799 | 6378 | 10,177 | Fear | 500 |
890 | 3803 | 4693 | Disgust | 500 |
28,000 | 24,882 | 52,882 | Anger | 500 |
2 | 3750 | 3752 | Contempt | 500 |
460,384 | 287,651 | 748,035 | Total | 4000 |
Recall (PyTorch-Based) | F1 Score (PyTorch-Based) | Emotion | Recall (TensorFlow-Based) | F1 Score (TensorFlow-Based) |
---|---|---|---|---|
0.57 | 0.48 | Neutral | 0.60 | 0.33 |
0.79 | 0.71 | Happy | 0.69 | 0.68 |
0.59 | 0.59 | Sad | 0.57 | 0.53 |
0.60 | 0.57 | Surprise | 0.53 | 0.51 |
0.54 | 0.59 | Fear | 0.46 | 0.55 |
0.44 | 0.53 | Disgust | 0.43 | 0.51 |
0.56 | 0.52 | Anger | 0.55 | 0.51 |
0.41 | 0.50 | Contempt | 0.33 | 0.42 |
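For reference, the recall and F1 scores reported in these tables are the standard per-class metrics, computed from true positives (TP), false positives (FP), and false negatives (FN):

```latex
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```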
Method | Accuracy (%) |
---|---|
DDAMFN++ [13] | 65.04 |
FMAE [14] | 65.00 |
BTN [15] | 64.29 |
DDAMFN [13] | 64.25 |
POSTER++ [16] | 63.77 |
S2D [17] | 63.06 |
Multi-task EfficientNet-B2 [18] | 63.03 |
DAN [19] | 62.09 |
PSR [20] | 60.68 |
EfficientFace [21] | 59.89 |
RAN [22] | 59.50 |
ViT-tiny [24] | 58.28 |
Weighted-Loss [23] | 58.00 |
ViT-base [24] | 57.99 |
LResNet50E-IR [25] | 53.93 |
PyTorch-based classifier | 56.33 |
TensorFlow-based classifier | 51.93 |
Recall (PyTorch-Based) | F1 Score (PyTorch-Based) | Emotion | Recall (TensorFlow-Based) | F1 Score (TensorFlow-Based) |
---|---|---|---|---|
0.96 | 0.98 | Neutral | 0.76 | 0.57 |
0.98 | 0.98 | Happy | 0.93 | 0.77 |
0.99 | 0.99 | Sad | 0.65 | 0.57 |
1.00 | 1.00 | Surprise | 0.60 | 0.64 |
1.00 | 1.00 | Fear | 0.45 | 0.56 |
1.00 | 1.00 | Disgust | 0.39 | 0.51 |
0.99 | 0.99 | Anger | 0.64 | 0.65 |
1.00 | 0.98 | Contempt | 0.39 | 0.49 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Piotrowski, Z.; Kaczyński, M.; Walczyna, T. Change and Detection of Emotions Expressed on People’s Faces in Photos. Appl. Sci. 2024, 14, 10681. https://doi.org/10.3390/app142210681