
Fast Video Facial Expression Recognition by a Deeply Tensor-Compressed LSTM Neural Network for Mobile Devices

Published: 15 July 2021

Abstract

Mobile devices usually suffer from limited computation and storage resources, which seriously hinders the deployment of deep neural network applications on them. In this article, we introduce a deeply tensor-compressed long short-term memory (LSTM) neural network for fast video-based facial expression recognition on mobile devices. First, a spatio-temporal facial expression recognition LSTM model is built by extracting time-series feature maps from facial clips. The LSTM-based spatio-temporal model is then deeply compressed by means of quantization and tensorization for mobile device implementation. On the Extended Cohn-Kanade (CK+), MMI, and Acted Facial Expressions in the Wild (AFEW) 7.0 datasets, experimental results show that the proposed method achieves 97.96%, 97.33%, and 55.60% classification accuracy, respectively, while compressing the network model size by up to 221× and reducing training time per epoch by 60%. Our work is further implemented on the RK3399Pro mobile device with a Neural Process Engine. With the leveraged compression methods, the on-board latency of the feature extractor and the LSTM predictor is reduced by 30.20× and 6.62×, respectively. Furthermore, the spatio-temporal model consumes only 57.19 MB of DRAM and 5.67 W of power when running on the board.
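The "tensorization" the abstract refers to builds on the tensor-train (TT) decomposition, which replaces a dense weight matrix with a chain of small cores. The toy sketch below (the layer shape, mode factorization, and TT-ranks are illustrative assumptions, not the paper's actual configuration) shows why this shrinks the parameter count so dramatically and confirms the cores still encode a full-size weight matrix:

```python
import numpy as np

# Hypothetical example: a 4096 x 256 weight matrix whose input dimension is
# factored as 8*8*8*8 and output dimension as 4*4*4*4.
in_modes  = [8, 8, 8, 8]
out_modes = [4, 4, 4, 4]
tt_rank   = [1, 4, 4, 4, 1]   # assumed TT-ranks; boundary ranks are always 1

# Each TT core G_k has shape (r_{k-1}, in_k, out_k, r_k).
rng = np.random.default_rng(0)
cores = [rng.standard_normal((tt_rank[k], in_modes[k], out_modes[k], tt_rank[k + 1]))
         for k in range(4)]

tt_params = sum(c.size for c in cores)                            # 128 + 512 + 512 + 128 = 1280
dense_params = int(np.prod(in_modes)) * int(np.prod(out_modes))   # 4096 * 256 = 1,048,576
print(f"dense: {dense_params}, TT: {tt_params}, "
      f"compression: {dense_params / tt_params:.0f}x")            # ~819x for these ranks

# Reconstruct the dense matrix from the cores: the TT format is a low-rank
# factorization, so the chain of small cores represents the same linear map.
W = cores[0]
for c in cores[1:]:
    # contract the trailing TT-rank of W with the leading rank of the next core
    W = np.tensordot(W, c, axes=([-1], [0]))
W = W.squeeze()                                   # shape (8, 4, 8, 4, 8, 4, 8, 4)
W = W.transpose(0, 2, 4, 6, 1, 3, 5, 7).reshape(4096, 256)
print("reconstructed dense weight shape:", W.shape)
```

In practice the cores would be learned directly (training "in the TT format") rather than obtained by factorizing a pretrained dense matrix, and the achievable compression depends on how aggressively the TT-ranks can be cut without hurting accuracy.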




      Published In

      ACM Transactions on Internet of Things, Volume 2, Issue 4
      November 2021
      190 pages
      EISSN: 2577-6207
      DOI: 10.1145/3476109

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 15 July 2021
      Accepted: 01 May 2021
      Revised: 01 February 2021
      Received: 01 October 2019
      Published in TIOT Volume 2, Issue 4


      Author Tags

      1. Mobile device
      2. deep learning
      3. facial expression recognition
      4. tensor decomposition

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Article Metrics

      • Downloads (last 12 months): 63
      • Downloads (last 6 weeks): 11
      Reflects downloads up to 27 November 2024.


      Cited By

      • (2024) Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–18. DOI: 10.1145/3613904.3642109
      • (2024) Facial Emotion Recognition for Mobile Devices: A Practical Review. IEEE Access 12, 15735–15747. DOI: 10.1109/ACCESS.2024.3358455
      • (2024) Photonic neuromorphic architecture for tens-of-task lifelong learning. Light: Science & Applications 13, 1. DOI: 10.1038/s41377-024-01395-4
      • (2024) Benchmarking deep Facial Expression Recognition. Engineering Applications of Artificial Intelligence 136, PB. DOI: 10.1016/j.engappai.2024.108983
      • (2023) A Highly Compressed Accelerator With Temporal Optical Flow Feature Fusion and Tensorized LSTM for Video Action Recognition on Terminal Device. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 10, 3129–3142. DOI: 10.1109/TCAD.2023.3241113
      • (2023) Double OptconNet architecture based facial expression recognition in video processing. The Imaging Science Journal 70, 1, 46–60. DOI: 10.1080/13682199.2022.2163344
      • (2023) Efficient facial expression recognition framework based on edge computing. The Journal of Supercomputing 80, 2, 1935–1972. DOI: 10.1007/s11227-023-05548-x
      • (2023) Facial expression generation based on variational AutoEncoder network and cloud computing. Internet Technology Letters. DOI: 10.1002/itl2.427
      • (2022) Smart Classroom Monitoring Using Novel Real-Time Facial Expression Recognition System. Applied Sciences 12, 23, 12134. DOI: 10.3390/app122312134
      • (2022) Facial Expression Recognition from a Single Face Image Based on Deep Learning and Broad Learning. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/7094539
