An Efficient CNN Accelerator for Low-Cost Edge Systems

Published: 23 August 2022

Abstract

Customized hardware-based convolutional neural network (CNN or ConvNet) accelerators have attracted significant attention for applications in low-cost edge computing systems. However, little research seeks to optimize at both the algorithm and hardware levels simultaneously in resource-constrained FPGA systems. In this paper, we first analyze ConvNet models to find the one most suitable for a low-cost FPGA implementation. Based on this analysis, we select MobileNetV2 as the backbone of our research because of its hardware-friendly structure. We use a quantized implementation with 4-bit precision and optimize further with a smaller input resolution of 192 × 192 to obtain a 68.8% detection accuracy on ImageNet, which represents only a 3.2% accuracy loss compared to a floating-point model that uses the full input size. We then develop a hardware implementation that uses a low-cost FPGA. To accelerate the depth-wise separable ConvNet and to use DRAM resources efficiently with parallel processing, we propose a novel scoreboard architecture that dynamically schedules DRAM data requests to maintain high hardware utilization. Our design uses about six times fewer DSP blocks than prior work, and its internal block RAM utilization is approximately nine times more efficient. The proposed design achieves 3.07 frames per second (FPS) on the low-cost, resource-constrained FPGA system.
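The following Python sketch is an illustration, not the authors' implementation, of three ideas the abstract relies on: 4-bit quantization, the reduced multiply-accumulate (MAC) count of depth-wise separable convolution, and the compute saved by a 192 × 192 input. The symmetric uniform quantization scheme, the layer shape (32 input channels, 64 output channels, 3 × 3 kernel), the scale value, and the 224 × 224 baseline resolution are all assumptions chosen for illustration; the abstract states none of them.

```python
# Illustrative sketch only; scheme, layer shape, and 224x224 baseline are assumptions.


def quantize_4bit(values, scale):
    """Map floats to signed 4-bit integers in [-8, 7] (symmetric uniform scheme, assumed)."""
    return [max(-8, min(7, round(v / scale))) for v in values]


def dequantize(levels, scale):
    """Recover approximate float values from 4-bit levels."""
    return [q * scale for q in levels]


def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depth-wise convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical weights and scale; out-of-range values saturate at -8 or 7.
    weights = [0.30, -0.55, 1.20, 5.00, -5.00]
    scale = 0.25
    levels = quantize_4bit(weights, scale)
    print("4-bit levels:", levels)
    print("dequantized: ", dequantize(levels, scale))

    # Hypothetical layer: 192 x 192 output map, 32 -> 64 channels, 3 x 3 kernel.
    std = standard_conv_macs(192, 192, 32, 64, 3)
    sep = depthwise_separable_macs(192, 192, 32, 64, 3)
    print("standard MACs: ", std)
    print("separable MACs:", sep)
    print("separable / standard:", sep / std)  # equals 1/c_out + 1/k^2

    # Spatial cost scales with input area; 224 x 224 is an assumed baseline.
    print("192^2 / 224^2:", (192 * 192) / (224 * 224))
```

For any layer, the separable-to-standard MAC ratio is 1/c_out + 1/k², so the saving comes almost entirely from the kernel size once the output channel count is large.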

    Information & Contributors

    Information

    Published In

    ACM Transactions on Embedded Computing Systems  Volume 21, Issue 4
    July 2022
    330 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/3551651
    Editor: Tulika Mitra

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 August 2022
    Online AM: 26 May 2022
    Accepted: 01 May 2022
    Revised: 01 May 2022
    Received: 01 August 2021
    Published in TECS Volume 21, Issue 4

    Author Tags

    1. Convolutional neural network (CNN)
    2. EfficientNet
    3. MobileNet
    4. hardware accelerator
    5. embedded system
    6. FPGA

    Qualifiers

    • Research-article
    • Refereed
