Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs

Authors:

Sarma Vrudhula,

Jae-sun Seo

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pages 1 - 8

https://doi.org/10.1145/3240765.3240775

Published: 05 November 2018

Abstract

Rapid gains in computing capability have made convolutional neural networks (CNNs) highly successful at image classification in recent years, which has in turn driven the development of object detection algorithms with significantly improved accuracy. During deployment, however, many applications demand low-latency processing of a single image under a strict power budget. These constraints reduce the efficiency of GPUs and other general-purpose platforms and create an opportunity for dedicated acceleration hardware such as FPGAs, whose digital circuits can be customized for the inference algorithm. This work therefore proposes to customize the detection algorithm, here SSD, so that it benefits a hardware implementation with low data precision at the cost of marginal accuracy degradation. The proposed FPGA-based deep learning inference accelerator is demonstrated on two Intel FPGAs running the SSD algorithm, achieving up to 2.18 TOPS throughput and up to 3.3× better energy efficiency than a GPU.
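The trade-off between low data precision and accuracy mentioned in the abstract can be illustrated with a minimal sketch of signed fixed-point quantization. The abstract does not state the bit widths or the quantization scheme used in the paper, so the 8-bit word with 6 fractional bits, the function names, and the random weights below are all assumptions chosen only for illustration.

```python
import random


def quantize_fixed_point(values, total_bits=8, frac_bits=6):
    """Round values to a signed fixed-point grid, saturating on overflow.

    total_bits and frac_bits are illustrative assumptions, not figures
    taken from the paper.
    """
    scale = 1 << frac_bits                 # grid step is 1 / scale
    q_min = -(1 << (total_bits - 1))       # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1    # most positive integer code
    quantized = []
    for v in values:
        code = round(v * scale)                # nearest integer code
        code = max(q_min, min(q_max, code))    # saturate out-of-range values
        quantized.append(code / scale)         # back to a real value
    return quantized


def max_abs_error(reference, approx):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(r - a) for r, a in zip(reference, approx))


# Hypothetical weights in [-1, 1]; a real network would supply its own.
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]
quantized = quantize_fixed_point(weights, total_bits=8, frac_bits=6)
error = max_abs_error(weights, quantized)
print(f"max abs quantization error: {error:.5f}")
```

For values that stay inside the representable range, the rounding error is bounded by half a grid step, which is why a narrow word can cost only a small amount of accuracy while shrinking multipliers and on-chip storage.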

Index Terms

Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs
1. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
  2. Very large scale integration design
    1. Application-specific VLSI designs

Index terms have been assigned to the content through auto-classification.

Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Nov 2018

939 pages

Copyright © 2018.

Publisher

IEEE Press
