DOI: 10.1145/3617233.3617235

Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras

Published: 30 December 2023

Abstract

Facial Expression Recognition (FER) is an active research domain that has shown great progress recently, notably thanks to the use of large deep learning models. However, such approaches are particularly energy intensive, which makes them difficult to deploy on edge devices. To address this issue, Spiking Neural Networks (SNNs) coupled with event cameras are a promising alternative, capable of processing sparse and asynchronous events with lower energy consumption. In this paper, we establish the first use of event cameras for FER, named "Event-based FER", and propose the first related benchmarks by converting popular video FER datasets to event streams. To deal with this new task, we propose "Spiking-FER", a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN). Experiments show that the proposed approach achieves performance comparable to the ANN architecture while consuming far less energy (up to 65.39x less). In addition, an experimental study of various event-based data augmentation techniques is performed to provide insights into the efficient transformations specific to event-based FER.

Published In

CBMI '23: Proceedings of the 20th International Conference on Content-based Multimedia Indexing
September 2023
274 pages
ISBN: 9798400709128
DOI: 10.1145/3617233
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Event Camera
  2. Event Data Augmentation
  3. Facial Expression Recognition
  4. Spiking Neural Network
  5. Surrogate Gradient Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CBMI 2023
