A Reconfigurable Architecture for Real-time Event-based Multi-Object Tracking

Published: 01 September 2023


Although advances in event-based machine vision algorithms have demonstrated unparalleled capabilities in performing some of the most demanding tasks, their implementations under stringent real-time and power constraints in edge systems remain a major challenge. In this work, a reconfigurable hardware-software architecture called REMOT, which performs real-time event-based multi-object tracking on FPGAs, is presented. REMOT performs vision tasks by defining a set of actions over attention units (AUs). These actions allow AUs to track an object candidate autonomously by adjusting its region of attention and allow information gathered by each AU to be used for making algorithmic-level decisions. Taking advantage of this modular structure, algorithm-architecture codesign can be performed by implementing different parts of the algorithm in either hardware or software for different tradeoffs. Results show that REMOT can process 0.43–2.91 million events per second at 1.75–5.45 W. Compared with the software baseline, our implementation achieves up to 44 times higher throughput and 35.4 times higher power efficiency. Migrating the Merge operation to hardware further reduces the worst-case latency to be 95 times shorter than the software baseline. By varying the AU configuration and operation, a reduction of 0.59–0.77 mW per AU on the programmable logic has also been demonstrated.
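The attention-unit idea in the abstract can be illustrated with a small software sketch. It is not the REMOT implementation. The class name `AttentionUnit`, the rectangular region of attention, the smoothing factor `alpha`, the spawn-on-miss policy, and the merge rule are all hypothetical choices made for this example. It only shows the general pattern: each AU adjusts its own region as events arrive, and an algorithm-level step (here a simple merge) uses the information the AUs have gathered.

```python
from dataclasses import dataclass


@dataclass
class AttentionUnit:
    """Hypothetical AU: a rectangular region of attention plus an event count."""
    cx: float             # region centre, x
    cy: float             # region centre, y
    half_w: float = 8.0   # half width of the region (assumed value)
    half_h: float = 8.0   # half height of the region (assumed value)
    events: int = 1       # number of events absorbed so far

    def contains(self, x: float, y: float) -> bool:
        return abs(x - self.cx) <= self.half_w and abs(y - self.cy) <= self.half_h

    def update(self, x: float, y: float, alpha: float = 0.1) -> None:
        """Shift the region centre toward the event (exponential moving average)."""
        self.cx += alpha * (x - self.cx)
        self.cy += alpha * (y - self.cy)
        self.events += 1


def process_event(aus: list, x: float, y: float) -> AttentionUnit:
    """Give the event to the first AU whose region contains it, else spawn a new AU."""
    for au in aus:
        if au.contains(x, y):
            au.update(x, y)
            return au
    au = AttentionUnit(cx=x, cy=y)
    aus.append(au)
    return au


def merge(aus: list) -> list:
    """Fold an AU into an earlier one when its centre lies inside that AU's region."""
    merged = []
    for au in aus:
        for m in merged:
            if m.contains(au.cx, au.cy):
                total = m.events + au.events
                # Event-count-weighted average of the two centres.
                m.cx = (m.cx * m.events + au.cx * au.events) / total
                m.cy = (m.cy * m.events + au.cy * au.events) / total
                m.events = total
                break
        else:
            merged.append(au)
    return merged


if __name__ == "__main__":
    units = []
    for x, y in [(10, 10), (12, 10), (100, 100)]:
        process_event(units, x, y)
    print(len(units), units[0])
```

In this sketch the per-event work (`contains`, `update`) is independent for each AU, which is the kind of operation that maps naturally to parallel hardware, while `merge` compares AUs with one another and is the kind of step that could sit on either side of a hardware-software boundary.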




Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 16, Issue 4
December 2023
343 pages
Editor: Deming Chen


Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2023
Online AM: 21 April 2023
Accepted: 04 April 2023
Revised: 03 February 2023
Received: 14 September 2022
Published in TRETS Volume 16, Issue 4

Author Tags

  1. REMOT
  2. Dynamic Vision Sensors
  3. multi-object tracking
  4. event sensors
  5. event camera
  6. hardware/software co-design
  7. attention unit
  8. FPGA
  9. HOTA


  • Research-article

Funding Sources

  • Research Grants Council (RGC) of Hong Kong
  • AI Chip Center for Emerging Smart Systems (ACCESS)


Article Metrics

  • Downloads (last 12 months): 228
  • Downloads (last 6 weeks): 14
Reflects downloads up to 27 Feb 2025

Cited By

  • (2025) MEVDT: Multi-modal event-based vehicle detection and tracking dataset. Data in Brief 58 (111205). DOI: 10.1016/j.dib.2024.111205. Online publication date: Feb-2025
  • (2024) A Memory-Efficient High-Speed Event-based Object Tracking System. 2024 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS58744.2024.10558212. Online publication date: 19-May-2024
  • (2024) Event-Based Vision on FPGAs - a Survey. 2024 27th Euromicro Conference on Digital System Design (DSD) (541-550). DOI: 10.1109/DSD64264.2024.00078. Online publication date: 28-Aug-2024
  • (2023) Towards Asynchronously Triggered Spiking Neural Network on FPGA for Event-based Vision. 2023 International Conference on Field Programmable Technology (ICFPT) (292-293). DOI: 10.1109/ICFPT59805.2023.00051. Online publication date: 12-Dec-2023
