DOI: 10.1145/3639856.3639874

Binary Convolutional Neural Network for Efficient Gesture Recognition at Edge

Published: 17 May 2024

Abstract

Vision-based hand gesture recognition in human-computer interface design has useful applications in virtual reality, gaming control, sign-language communication, and medical rehabilitation. In many scenarios, such applications are deployed on small handheld or wearable devices, i.e., edge devices. To mitigate the challenges of building a real-time convolutional neural network (CNN) based solution at the edge, researchers explore various methods to reduce the computational overhead during inference. One recent development is the binarization of CNNs, which replaces floating-point multiply-accumulate (MAC) operations in a network with efficient XNOR and bit-count operations, drastically reducing inference latency, memory, and network computations. In this paper, we propose a Binary Convolutional Neural Network (BCNN) based Deep Learning (DL) inference pipeline for gesture recognition in a car infotainment system. Our DL pipeline requires 3.6x less storage space and 2.3x fewer network operations, and gives a 3x inference speed-up compared to the state-of-the-art MediaPipe gesture recognition system by Google. We observe that directly binarizing a CNN results in a large accuracy drop (48%) for our gesture data. To reduce this gap, we propose an optimized BCNN that uses 20x less inference memory and provides a 6.7x inference speed-up with an accuracy drop of 12% compared with a full-precision CNN on an edge device.
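
As a rough illustration of the XNOR and bit-count idea mentioned in the abstract, the sketch below computes the dot product of two vectors whose entries are restricted to +1 and -1. It is not the paper's implementation: the bit encoding (+1 as bit 1, -1 as bit 0) and the helper names `pack_bits`, `xnor_popcount_dot`, and `float_dot` are assumptions made here for illustration. With that encoding, the dot product of two length-n vectors equals 2 × (number of matching bits) − n, so the multiply-accumulate loop can be replaced by one XNOR and one population count.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (assumed encoding: +1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n using XNOR and bit count."""
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask     # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")       # population count
    return 2 * matches - n               # agreements add +1, disagreements add -1


def float_dot(a, b):
    """Reference multiply-accumulate dot product for comparison."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical example vectors, chosen only to exercise the functions.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

print(float_dot(a, b))
print(xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a)))
```

Both calls print the same value, which is the point of the technique: on hardware with native XNOR and popcount instructions, many such sign products can be evaluated per machine word instead of one floating-point multiply per element.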

Published In

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems
October 2023
381 pages
ISBN:9798400716492
DOI:10.1145/3639856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Binary Convolutional Neural Network
  2. Embedded Systems
  3. Gesture Recognition
  4. Bi-level Optimization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2023
