
Fast Video Facial Expression Recognition by a Deeply Tensor-Compressed LSTM Neural Network for Mobile Devices

Published: 15 July 2021

Abstract

Mobile devices usually suffer from limited computation and storage resources, which seriously hinders the deployment of deep neural network applications on them. In this article, we introduce a deeply tensor-compressed long short-term memory (LSTM) neural network for fast video-based facial expression recognition on mobile devices. First, a spatio-temporal facial expression recognition LSTM model is built by extracting time-series feature maps from facial clips. The LSTM-based spatio-temporal model is then deeply compressed by means of quantization and tensorization for mobile device implementation. On the Extended Cohn-Kanade (CK+), MMI, and Acted Facial Expressions in the Wild (AFEW) 7.0 datasets, experimental results show that the proposed method achieves 97.96%, 97.33%, and 55.60% classification accuracy, respectively, while compressing the network model size by up to 221× and reducing training time per epoch by 60%. Our work is further implemented on the RK3399Pro mobile device with a Neural Process Engine. With the leveraged compression methods, the on-board latency of the feature extractor and the LSTM predictor is reduced by 30.20× and 6.62×, respectively. Furthermore, the spatio-temporal model consumes only 57.19 MB of DRAM and 5.67 W of power when running on the board.
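The "tensorization" the abstract refers to builds on the tensor-train (TT) decomposition, which replaces a dense weight matrix with a chain of small cores. The toy sketch below (the layer shape, mode factorization, and TT-ranks are illustrative assumptions, not the paper's actual configuration) shows why this shrinks the parameter count so dramatically and confirms the cores still encode a full-size weight matrix:

```python
import numpy as np

# Hypothetical example: a 4096 x 256 weight matrix whose input dimension is
# factored as 8*8*8*8 and output dimension as 4*4*4*4.
in_modes  = [8, 8, 8, 8]
out_modes = [4, 4, 4, 4]
tt_rank   = [1, 4, 4, 4, 1]   # assumed TT-ranks; boundary ranks are always 1

# Each TT core G_k has shape (r_{k-1}, in_k, out_k, r_k).
rng = np.random.default_rng(0)
cores = [rng.standard_normal((tt_rank[k], in_modes[k], out_modes[k], tt_rank[k + 1]))
         for k in range(4)]

tt_params = sum(c.size for c in cores)                            # 128 + 512 + 512 + 128 = 1280
dense_params = int(np.prod(in_modes)) * int(np.prod(out_modes))   # 4096 * 256 = 1,048,576
print(f"dense: {dense_params}, TT: {tt_params}, "
      f"compression: {dense_params / tt_params:.0f}x")            # ~819x for these ranks

# Reconstruct the dense matrix from the cores: the TT format is a low-rank
# factorization, so the chain of small cores represents the same linear map.
W = cores[0]
for c in cores[1:]:
    # contract the trailing TT-rank of W with the leading rank of the next core
    W = np.tensordot(W, c, axes=([-1], [0]))
W = W.squeeze()                                   # shape (8, 4, 8, 4, 8, 4, 8, 4)
W = W.transpose(0, 2, 4, 6, 1, 3, 5, 7).reshape(4096, 256)
print("reconstructed dense weight shape:", W.shape)
```

In practice the cores would be learned directly (training "in the TT format") rather than obtained by factorizing a pretrained dense matrix, and the achievable compression depends on how aggressively the TT-ranks can be cut without hurting accuracy.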




      Published In

      ACM Transactions on Internet of Things, Volume 2, Issue 4
      November 2021
      190 pages
      EISSN: 2577-6207
      DOI: 10.1145/3476109

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 15 July 2021
      Accepted: 01 May 2021
      Revised: 01 February 2021
      Received: 01 October 2019
      Published in TIOT Volume 2, Issue 4


      Author Tags

      1. Mobile device
      2. deep learning
      3. facial expression recognition
      4. tensor decomposition

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Article Metrics

      • Downloads (last 12 months): 63
      • Downloads (last 6 weeks): 11
      Reflects downloads up to 27 November 2024.


      Cited By

      • (2024) Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–18. DOI: 10.1145/3613904.3642109
      • (2024) Facial Emotion Recognition for Mobile Devices: A Practical Review. IEEE Access 12, 15735–15747. DOI: 10.1109/ACCESS.2024.3358455
      • (2024) Photonic neuromorphic architecture for tens-of-task lifelong learning. Light: Science & Applications 13, 1. DOI: 10.1038/s41377-024-01395-4
      • (2024) Benchmarking deep Facial Expression Recognition. Engineering Applications of Artificial Intelligence 136, PB. DOI: 10.1016/j.engappai.2024.108983
      • (2023) A Highly Compressed Accelerator With Temporal Optical Flow Feature Fusion and Tensorized LSTM for Video Action Recognition on Terminal Device. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 10, 3129–3142. DOI: 10.1109/TCAD.2023.3241113
      • (2023) Double OptconNet architecture based facial expression recognition in video processing. The Imaging Science Journal 70, 1, 46–60. DOI: 10.1080/13682199.2022.2163344
      • (2023) Efficient facial expression recognition framework based on edge computing. The Journal of Supercomputing 80, 2, 1935–1972. DOI: 10.1007/s11227-023-05548-x
      • (2023) Facial expression generation based on variational AutoEncoder network and cloud computing. Internet Technology Letters. DOI: 10.1002/itl2.427
      • (2022) Smart Classroom Monitoring Using Novel Real-Time Facial Expression Recognition System. Applied Sciences 12, 23, 12134. DOI: 10.3390/app122312134
      • (2022) Facial Expression Recognition from a Single Face Image Based on Deep Learning and Broad Learning. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/7094539
