Event-Based Gesture Recognition through a Hierarchy of Time-Surfaces for FPGA
Figure 1. ATIS operation principles. When the luminosity change at a pixel reaches a given threshold (a), the pixel produces a visual event with an (x,y) address and a polarity, which is either ON or OFF (b).
Figure 2. Left: captured histogram of events from neuromorphic vision sensors (ON = black events; OFF = white events). Right: temporal activity diagram of the events from the sensor.
Figure 3. Example of a HOTS layer processing workflow. An input stimulus is processed by the sensor, which sends a stream of events with (x,y) addresses and an ON/OFF polarity (a). The timestamp context of the incoming event is processed by applying a linear decay, creating the time-surface (b). Using the Euclidean distance, the time-surface is compared with the bank of prototypes; the closest one sends out an event with the same (x,y) address but with the ID of that prototype. Finally, the events are sent to another layer or integrated over time to generate a histogram, which is then processed by a classifier (c).
Figure 4. Time-surface generator module workflow.
Figure 5. Euclidean distance estimator module workflow.
Figure 6. Histograms comparator module (HGCM). The blue signal is asserted when the global counter reaches the integration time, given by the τ value. At that moment, the content of the Pattern counter, which is the histogram of the activated features, is compared with the trained histograms (TH).
Figure 7. F-HOTS global architecture. The ARM processor configures the parameters and the prototypes of each EDE module. The time-surface generator module creates a time-surface from the incoming event received from the input AER bus (AER IN). Each partial time-surface result is processed by the eight EDE modules; then, after processing the histograms, the classification result is sent through the output AER bus (AER OUT).
Figure 8. Hand gestures from (a–g): Left, Right, Hello Hand, Up, Down, Select.
Figure 9. Experimental setup. Left: USBAERmini2 that sends events to and receives events from the FPGA. Right: Zynq MMP board with the F-HOTS architecture implemented.
Figure 10. Power consumption of FPGA components for each bit resolution.
Figure 11. Left axis: processing time per event with different radii. Right axis: evolution of mega-events per second for each radius.
Figure 12. Left axis: Mops/s performed with different radii, at a frequency of 100 MHz. Right axis: memory accesses performed.
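The HOTS layer workflow caption describes one processing step per event: build a time-surface around the incoming event with a linear decay, then select the prototype closest in Euclidean distance. A minimal Python sketch of that step follows. The function names, the neighbourhood radius, the value of tau, and the clamped form of the linear decay are illustrative assumptions, not the F-HOTS HDL design.

```python
import math

def time_surface(last_ts, x, y, t, radius, tau):
    """Build a linear-decay time-surface around the event (x, y, t).

    last_ts is a 2D list holding the last event timestamp of each pixel
    (None if the pixel has not fired yet). Assumed decay, for illustration:
    max(0, 1 - (t - t_last) / tau). Pixels outside the sensor count as 0.
    """
    height, width = len(last_ts), len(last_ts[0])
    surface = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            px, py = x + dx, y + dy
            value = 0.0
            if 0 <= px < width and 0 <= py < height:
                t_last = last_ts[py][px]
                if t_last is not None:
                    value = max(0.0, 1.0 - (t - t_last) / tau)
            surface.append(value)
    return surface

def closest_prototype(surface, prototypes):
    """Return the index of the prototype nearest to the surface (Euclidean)."""
    distances = [math.dist(surface, p) for p in prototypes]
    return distances.index(min(distances))

def process_event(last_ts, prototypes, x, y, t, radius=1, tau=1000.0):
    """Update the timestamp memory and emit (x, y, prototype_id)."""
    last_ts[y][x] = t
    surface = time_surface(last_ts, x, y, t, radius, tau)
    return (x, y, closest_prototype(surface, prototypes))
```

Calling `process_event` once per incoming event yields the output event stream that a following layer, or a histogram stage, would consume.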
Abstract
1. Introduction
- HDL description and implementation of HOTS for FPGA, taking advantage of the FPGA memory organization and of hardware square-root algorithms.
- Real-time demonstration on an embedded system, showing low latency and reduced power consumption.
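The first contribution mentions square-root algorithms, which a Euclidean distance needs in hardware. As a hedged illustration, the sketch below shows a generic bit-by-bit integer square root that uses only shifts, additions, and comparisons, the kind of operations that map well onto FPGA logic. The function name and the 32-bit default width are assumptions; this is not claimed to be the variant used in F-HOTS.

```python
def isqrt_bitwise(n, bits=32):
    """Floor of sqrt(n) for 0 <= n < 2**bits, using shifts, adds, compares.

    bits must be even so that the starting test bit is a power of four.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if n < 0 or n >= (1 << bits):
        raise ValueError("n out of range")
    root = 0
    bit = 1 << (bits - 2)      # highest power of four within the word
    while bit > n:             # skip leading test bits larger than n
        bit >>= 2
    while bit:
        if n >= root + bit:    # trial subtraction succeeds: set this result bit
            n -= root + bit
            root = (root >> 1) + bit
        else:                  # trial fails: leave this result bit clear
            root >>= 1
        bit >>= 2
    return root
```

Each loop iteration resolves one bit of the result, so a 32-bit input needs at most 16 iterations, which is why this family of algorithms is easy to pipeline.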
2. Materials and Methods
2.1. Event-Based Vision Sensors
2.2. Time-Surfaces
2.3. System Architecture
2.3.1. Time-Surface Generator
2.3.2. Euclidean Distance Estimator
2.3.3. Histograms Generator and Comparator Module
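The histograms comparator module (HGCM) caption describes the mechanism: feature activations are counted until a global counter reaches the integration time τ, and the resulting histogram is then compared with the trained histograms. A minimal Python sketch of that behaviour follows. The squared Euclidean comparison, the function names, and the dictionary of trained histograms are assumptions made for illustration; the comparison metric is not stated here.

```python
def nearest_label(counts, trained):
    """Return the label whose trained histogram is closest to counts."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(trained, key=lambda label: sq_dist(counts, trained[label]))

def classify_windows(events, tau, num_features, trained):
    """Classify the feature histogram of each integration window.

    events: list of (timestamp, feature_id), sorted by timestamp.
    trained: dict mapping a class label to its trained histogram.
    Returns a list of (window_end_time, label). A trailing window that
    never reaches tau is not classified.
    """
    results = []
    counts = [0] * num_features
    window_end = None
    for t, feature_id in events:
        if window_end is None:
            window_end = t + tau          # start counting at the first event
        while t >= window_end:            # integration time reached
            results.append((window_end, nearest_label(counts, trained)))
            counts = [0] * num_features   # reset the pattern counters
            window_end += tau
        counts[feature_id] += 1
    return results
```

Note that in this sketch a window with no events is still classified, using an all-zero histogram; a hardware design could equally suppress the output in that case.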
2.3.4. Hardware Implementation
3. Experimental Set-Up and Results
3.1. Loss Test
3.2. Performance Test
4. Discussion and Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 2012 Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556v6.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Tallent, N.R.; Gawande, N.A.; Siegel, C.; Vishnu, A.; Hoisie, A. Evaluating On-Node GPU Interconnects for Deep Learning Workloads; Springer: Berlin, Germany, 2018; pp. 3–21.
- Saeed, A.; Al-Hamadi, A.; Niese, R.; Elzobi, M. Frame-Based Facial Expression Recognition Using Geometrical Features. Adv. Hum. Comput. Interact. 2014, 2014.
- Zanchettin, C.; Bezerra, B.L.D.; Azevedo, W.W. A KNN-SVM hybrid model for cursive handwriting recognition. In Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia, 10–15 June 2012; pp. 1–8.
- Farabet, C.; Paz, R.; Pérez-Carrasco, J.; Zamarreño, C.; Linares-Barranco, A.; LeCun, Y.; Culurciello, E.; Serrano-Gotarredona, T.; Linares-Barranco, B. Comparison between frame-constrained fix-pixel-value and frame-free spiking-dynamic-pixel ConvNets for visual processing. Front. Neurosci. 2012, 6, 32.
- Mead, C. Analog VLSI and Neural Systems; Addison-Wesley: Boston, MA, USA, 1989.
- Sterling, P.; Laughlin, S. Principles of Neural Design; MIT Press: Cambridge, MA, USA, 2015; pp. 1–542.
- Yang, M.; Chien, C.; Delbrück, T.; Liu, S. A 0.5 V 55 μW 64 × 2 Channel Binaural Silicon Cochlea for Event-Driven Stereo-Audio Sensing. IEEE J. Solid-State Circuits 2016, 51, 2554–2569.
- Jiménez-Fernández, A.; Cerezuela-Escudero, E.; Miró-Amarante, L.; Domínguez-Morales, M.J.; Gomez-Rodríguez, F.; Linares-Barranco, A.; Jiménez-Moreno, G. A Binaural Neuromorphic Auditory Sensor for FPGA: A Spike Signal Processing Approach. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 804–818.
- Lichtsteiner, P.; Posch, C.; Delbrück, T. A 128 × 128 120 dB 15 us Latency Asynchronous Temporal Contrast Vision Sensor. IEEE J. Solid-State Circuits 2008, 43, 566–576.
- Shoushun, C.; Bermak, A. Arbitrated Time-to-First Spike CMOS Image Sensor With On-Chip Histogram Equalization. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2007, 15, 346–357.
- Posch, C.; Matolin, D.; Wohlgenannt, R. A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS. IEEE J. Solid-State Circuits 2011, 46, 259–275.
- Leñero-Bardallo, J.A.; Serrano-Gotarredona, T.; Linares-Barranco, B. A 3.6 μs Latency Asynchronous Frame-Free Event-Driven Dynamic-Vision-Sensor. IEEE J. Solid-State Circuits 2011, 46, 1443–1455.
- Brandli, C.; Berner, R.; Yang, M.; Liu, S.; Delbruck, T. A 240 × 180 130 dB 3 μs Latency Global Shutter Spatiotemporal Vision Sensor. IEEE J. Solid-State Circuits 2014, 49, 2333–2341.
- Pardo, F.; Boluda, J.A.; Vegara, F. Selective Change Driven Vision Sensor With Continuous-Time Logarithmic Photoreceptor and Winner-Take-All Circuit for Pixel Selection. IEEE J. Solid-State Circuits 2015, 50, 786–798.
- Son, B.; Suh, Y.; Kim, S.; Jung, H.; Kim, J.; Shin, C.; Park, K.; Lee, K.; Park, J.; Woo, J.; et al. 4.1 A 640×480 dynamic vision sensor with a 9 μm pixel and 300 Meps address-event representation. In Proceedings of the 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 5–9 February 2017; pp. 66–67.
- Linares-Barranco, A.; Gómez-Rodríguez, F.; Villanueva, V.; Longinotti, L.; Delbrück, T. A USB3.0 FPGA event-based filtering and tracking framework for dynamic vision sensors. In Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 2417–2420.
- Delbruck, T.; Lang, M. Robotic goalie with 3 ms reaction time at 4% CPU load using event-based dynamic vision sensor. Front. Neurosci. 2013, 7, 223.
- Linares-Barranco, A.; Perez-Peña, F.; Moeys, D.P.; Gomez-Rodriguez, F.; Jimenez-Moreno, G.; Liu, S.; Delbruck, T. Low Latency Event-Based Filtering and Feature Extraction for Dynamic Vision Sensors in Real-Time FPGA Applications. IEEE Access 2019, 7, 134926–134942.
- Linares-Barranco, A.; Liu, H.; Rios-Navarro, A.; Gomez-Rodriguez, F.; Moeys, D.P.; Delbruck, T. Approaching Retinal Ganglion Cell Modeling and FPGA Implementation for Robotics. Entropy 2018, 20, 475.
- Zhao, B.; Ding, R.; Chen, S.; Linares-Barranco, B.; Tang, H. Feedforward Categorization on AER Motion Events Using Cortex-Like Features in a Spiking Neural Network. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1963–1978.
- Tapiador-Morales, R.; Linares-Barranco, A.; Jimenez-Fernandez, A.; Jimenez-Moreno, G. Neuromorphic LIF Row-by-Row Multiconvolution Processor for FPGA. IEEE Trans. Biomed. Circuits Syst. 2019, 13, 159–169.
- Pérez-Carrasco, J.A.; Zhao, B.; Serrano, C.; Acha, B.; Serrano-Gotarredona, T.; Chen, S.; Linares-Barranco, B. Mapping from frame-driven to frame-free event-driven vision systems by low-rate rate coding and coincidence processing—Application to feedforward convnets. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2706–2719.
- Serrano-Gotarredona, T.; Linares-Barranco, B. Poker-DVS and MNIST-DVS. Their History, How They Were Made, and Other Details. Front. Neurosci. 2015, 9, 481.
- Orchard, G.; Jayawant, A.; Cohen, G.K.; Thakor, N. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades. Front. Neurosci. 2015, 9, 437.
- Lagorce, X.; Orchard, G.; Galluppi, F.; Shi, B.E.; Benosman, R.B. Hots: A hierarchy of event-based time-surfaces for pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1346–1359.
- Furber, S.B.; Lester, D.R.; Plana, L.A.; Garside, J.D.; Painkras, E.; Temple, S.; Brown, A.D. Overview of the spinnaker system architecture. IEEE Trans. Comput. 2013, 62, 2454–2467.
- Schmitt, S.; Klähn, J.; Bellec, G.; Grübl, A.; Güttler, M.; Hartel, A.; Hartmann, S.; de Oliveira, D.H.; Husmann, K.; Jeltsch, S.; et al. Neuromorphic hardware in the loop: Training a deep spiking network on the BrainScaleS wafer-scale system. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2227–2234.
- Akopyan, F.; Sawada, J.; Cassidy, A.; Alvarez-Icaza, R.; Arthur, J.; Merolla, P.; Imam, N.; Nakamura, Y.; Datta, P.; Nam, G.; et al. TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2015, 34, 1537–1557.
- Moradi, S.; Qiao, N.; Stefanini, F.; Indiveri, G. A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs). IEEE Trans. Biomed. Circuits Syst. 2018, 12, 106–122.
- Lin, C.; Wild, A.; Chinya, G.N.; Cao, Y.; Davies, M.; Lavery, D.M.; Wang, H. Programming Spiking Neural Networks on Intel’s Loihi. Computer 2018, 51, 52–61.
- Furber, S. Large-scale neuromorphic computing systems. J. Neural Eng. 2016, 13, 051001.
- Delbrück, T. jAER Open Source Project (2007). Available online: https://github.com/SensorsINI/jaer (accessed on 14 June 2020).
- Maro, J.; Benosman, R. Event-based Gesture Recognition with Dynamic Background Suppression using Smartphone Computational Capabilities. Front. Neurosci. 2020, 14, 275.
- Piromsopa, K.; Arporntewan, C.; Chongstitvatana, P. An FPGA Implementation of a Fixed-Point Square Root Operation. In Proceedings of the International Symposium on Communications and Information Technology (ISCIT 2001), Chiang Mai, Thailand, 14–16 November 2001; pp. 14–16.
- Li, Y.; Chu, W. A new non-restoring square root algorithm and its VLSI implementations. In Proceedings of the International Conference on Computer Design, Austin, TX, USA, 7–9 October 1996; pp. 538–544.
- Aimar, A.; Mostafa, H.; Calabrese, E.; Rios-Navarro, A.; Tapiador-Morales, R.; Lungu, I.A.; Milde, M.B.; Corradi, F.; Linares-Barranco, A.; Liu, S.C.; et al. NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 644–656.
- Berner, R.; Delbrück, T.; Civit-Balcells, A.; Linares-Barranco, A. A 5 Meps $100 USB2.0 address-event monitor-sequencer interface. In Proceedings of the 2007 IEEE International Symposium on Circuits and Systems, New Orleans, LA, USA, 27–30 May 2007; pp. 2451–2454.
- Zamarreño-Ramos, C.; Linares-Barranco, A.; Serrano-Gotarredona, T.; Linares-Barranco, B. Multicasting Mesh AER: A Scalable Assembly Approach for Reconfigurable Neuromorphic Structured AER Systems. Application to ConvNets. IEEE Trans. Biomed. Circuits Syst. 2013, 7, 82–102.
- Baby, S.A.; Vinod, B.; Chinni, C.; Mitra, K. Dynamic Vision Sensors for Human Activity Recognition. In Proceedings of the 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), Nanjing, China, 26–29 November 2017; pp. 316–321.
- Amir, A.; Taba, B.; Berg, D.; Melano, T.; McKinstry, J.; Nolfo, C.D.; Nayak, T.; Andreopoulos, A.; Garreau, G.; Mendoza, M.; et al. A Low Power, Fully Event-Based Gesture Recognition System. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7388–7397.
- Camuñas-Mesa, L.A.; Domínguez-Cordero, Y.L.; Linares-Barranco, A.; Serrano-Gotarredona, T.; Linares-Barranco, B. A Configurable Event-Driven Convolutional Node with Rate Saturation Mechanism for Modular ConvNet Systems Implementation. Front. Neurosci. 2018, 12, 63.
Resource | Zedboard (xc7020clg482) | Zynq7000 (xc7z100ffg2) |
---|---|---|
LUT | 8313/53,200 (15.6%) | 8351/277,400 (3%) |
LUTRAM | 2879/17,400 (16.5%) | 2872/108,200 (2.6%) |
FF | 5627/106,400 (5.2%) | 6092/554,800 (1.1%) |
DSP | 46/220 (20%) | 46/2020 (2%) |
BRAM | 18/140 (12.8%) | 18/755 (2%) |
Dataset | Maro et al. [36] | Q8.8 | Q16.16 | Q32.32 |
---|---|---|---|---|
NavGestures-sit | 94.5% | 93.3% | 93.72% | 94.1% |
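The Q8.8, Q16.16, and Q32.32 columns refer to fixed-point resolutions. Assuming the usual Qm.n reading (m integer bits and n fractional bits), the sketch below shows how a real value is quantized and how the rounding error shrinks as fractional bits are added. The helper names, the unsigned saturating behaviour, and the sample value 0.3 are illustrative assumptions; the signedness and rounding mode of the hardware are not stated here.

```python
def to_fixed(value, int_bits, frac_bits):
    """Quantize value to an unsigned Q(int_bits).(frac_bits) raw integer.

    Rounds to the nearest step and saturates at the representable range.
    """
    scale = 1 << frac_bits
    max_raw = (1 << (int_bits + frac_bits)) - 1
    raw = int(round(value * scale))
    return max(0, min(max_raw, raw))

def to_float(raw, frac_bits):
    """Convert a raw fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)

if __name__ == "__main__":
    sample = 0.3  # arbitrary value, e.g. one time-surface entry
    for int_bits, frac_bits in [(8, 8), (16, 16), (32, 32)]:
        raw = to_fixed(sample, int_bits, frac_bits)
        err = abs(to_float(raw, frac_bits) - sample)
        print(f"Q{int_bits}.{frac_bits}: raw={raw}, abs error={err:.3e}")
```

With round-to-nearest, the quantization error of an in-range value is at most half a step, that is 2^-(n+1) for n fractional bits.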
Zynq7000 (xc7z100ffg2)

Resolution | Q8.8 | Q16.16 | Q32.32 |
---|---|---|---|
LUT | 3% | 4.22% | 4.68% |
LUTRAM | 0.31% | 0.36% | 1.67% |
FF | 0.34% | 0.38% | 0.48% |
DSP | 2% | 3.23% | 6.92% |
BRAM | 2.12% | 2.12% | 2.12% |

Zedboard (xc7020clg482)

Resolution | Q8.8 | Q16.16 | Q32.32 |
---|---|---|---|
LUT | 15.6% | 16.7% | 22.01% |
LUTRAM | 7.43% | 8.41% | 10.37% |
FF | 1.62% | 1.88% | 2.41% |
DSP | 20.2% | 34.09% | 64.45% |
BRAM | 11.43% | 11.43% | 11.43% |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Tapiador-Morales, R.; Maro, J.-M.; Jimenez-Fernandez, A.; Jimenez-Moreno, G.; Benosman, R.; Linares-Barranco, A. Event-Based Gesture Recognition through a Hierarchy of Time-Surfaces for FPGA. Sensors 2020, 20, 3404. https://doi.org/10.3390/s20123404