Deep Neural Network Augmentation: Generating Faces for Affect Analysis

Published: 01 May 2020

Abstract

This paper presents a novel approach to synthesizing facial affect, either in terms of the six basic expressions (i.e., anger, disgust, fear, joy, sadness and surprise) or in terms of valence (i.e., how positive or negative an emotion is) and arousal (i.e., how strong the emotion's activation is). The proposed approach accepts the following inputs: (i) a neutral 2D image of a person; (ii) a basic facial expression, a pair of valence-arousal (VA) emotional state descriptors, or a path of affect in the 2D VA space to be generated as an image sequence. To enable affect synthesis in terms of VA, 600,000 frames from the 4DFAB database were annotated with valence and arousal values. The affect synthesis itself is implemented by fitting a 3D Morphable Model on the neutral image, deforming the reconstructed face to add the input affect, and blending the new face carrying that affect into the original image. Qualitative experiments illustrate the generation of realistic images when the neutral image is sampled from fifteen well-known lab-controlled or in-the-wild databases, including Aff-Wild, AffectNet and RAF-DB; comparisons with generative adversarial networks (GANs) show the higher quality achieved by the proposed approach. Quantitative experiments are then conducted in which the synthesized images are used for data augmentation when training deep neural networks to perform affect recognition over all databases; in all cases, greatly improved performance is achieved compared both with state-of-the-art methods and with GAN-based data augmentation.
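To make the three-stage pipeline the abstract describes concrete, the sketch below (Python) shows how such a system might be wired together. The functions fit_3dmm, deform_with_blendshapes and render_face are hypothetical placeholders standing in for the paper's 3DMM fitting, expression deformation and rendering stages; only the final step uses a real API, OpenCV's seamlessClone (Poisson blending), which is one plausible way to realize the blending of the new face into the original image.

    import cv2

    def synthesize_affect(neutral_bgr, valence, arousal):
        """Sketch of the synthesis pipeline; fit_3dmm, deform_with_blendshapes
        and render_face are hypothetical placeholders, not a published API."""
        # 1. Fit a 3D Morphable Model to the neutral input image.
        shape, texture, camera = fit_3dmm(neutral_bgr)
        # 2. Deform the reconstructed face towards the target (valence, arousal)
        #    point, e.g. with an expression blendshape model.
        deformed_shape = deform_with_blendshapes(shape, valence, arousal)
        # 3. Render the deformed face, then blend it back into the original
        #    image; cv2.seamlessClone performs Poisson blending along the
        #    boundary of the (uint8) face mask.
        face_bgr, mask = render_face(deformed_shape, texture, camera,
                                     neutral_bgr.shape[:2])
        h, w = neutral_bgr.shape[:2]
        center = (w // 2, h // 2)
        return cv2.seamlessClone(face_bgr, neutral_bgr, mask, center,
                                 cv2.NORMAL_CLONE)

    # A path of affect in the 2D VA space then becomes an image sequence:
    # frames = [synthesize_affect(img, v, a) for (v, a) in va_path]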




Published In

International Journal of Computer Vision  Volume 128, Issue 5
May 2020
503 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 May 2020
Accepted: 05 February 2020
Received: 31 October 2018

Author Tags

  1. Dimensional
  2. Categorical affect
  3. Valence
  4. Arousal
  5. Basic emotions
  6. Facial affect synthesis
  7. 4DFAB
  8. Blendshape models
  9. 3DMM fitting
  10. DNNs
  11. StarGAN
  12. GANimation
  13. Data augmentation
  14. Affect recognition
  15. Facial expression transfer

Qualifiers

  • Research-article

Funding Sources

  • Imperial College London


Cited By

  • (2024) Distribution matching for multi-task learning of classification tasks. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence (pp. 2813-2821). DOI: 10.1609/aaai.v38i3.28061. Online publication date: 20-Feb-2024.
  • (2023) AVCAffe. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (pp. 76-85). DOI: 10.1609/aaai.v37i1.25078. Online publication date: 7-Feb-2023.
  • (2023) Overcoming Occlusion for Robust Facial Expression Recognition using Adaptive Dual-Attention Net. Proceedings of the 6th International Conference on Information Technologies and Electrical Engineering (pp. 368-373). DOI: 10.1145/3640115.3640175. Online publication date: 3-Nov-2023.
  • (2023) Adaptive Fusion Attention Network for Facial Expression Recognition. Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition (pp. 260-264). DOI: 10.1145/3633637.3633678. Online publication date: 27-Oct-2023.
  • (2023) Lightweight Facial Expression Recognition Network with Dynamic Deep Mutual Learning. Proceedings of the 2023 3rd International Conference on Bioinformatics and Intelligent Computing (pp. 222-226). DOI: 10.1145/3592686.3592726. Online publication date: 10-Feb-2023.
  • (2023) Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning. Proceedings of the 31st ACM International Conference on Multimedia (pp. 6161-6170). DOI: 10.1145/3581783.3613784. Online publication date: 26-Oct-2023.
  • (2023) Examining Subject-Dependent and Subject-Independent Human Affect Inference from Limited Video Data. 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG) (pp. 1-6). DOI: 10.1109/FG57933.2023.10042798. Online publication date: 5-Jan-2023.
  • (2023) Relation-aware Network for Facial Expression Recognition. 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG) (pp. 1-7). DOI: 10.1109/FG57933.2023.10042525. Online publication date: 5-Jan-2023.
  • (2023) Affective Prior Topology Graph Guided Facial Expression Recognition. Biometric Recognition (pp. 170-179). DOI: 10.1007/978-981-99-8565-4_17. Online publication date: 1-Dec-2023.
  • (2022) Blindfold Attention: Novel Mask Strategy for Facial Expression Recognition. Proceedings of the 2022 International Conference on Multimedia Retrieval (pp. 624-630). DOI: 10.1145/3512527.3531416. Online publication date: 27-Jun-2022.
