DOI: 10.1145/2684746.2689060

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

Published: 22 February 2015

Abstract

Convolutional neural network (CNN) has been widely employed for image recognition because it can achieve high accuracy by emulating the behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further spurred research and implementation efforts. In particular, various accelerators for deep CNN have been proposed based on FPGA platforms because of their advantages of high performance, reconfigurability, fast development cycles, etc. Although current FPGA accelerators have demonstrated better performance than generic processors, the accelerator design space has not been well exploited. One critical problem is that the computation throughput may not match the memory bandwidth provided by an FPGA platform. Consequently, existing approaches cannot achieve the best performance due to under-utilization of either logic resources or memory bandwidth. At the same time, the increasing complexity and scalability of deep learning applications aggravate this problem. To overcome this problem, we propose an analytical design scheme using the roofline model. For any solution of a CNN design, we quantitatively analyze its computing throughput and required memory bandwidth using various optimization techniques, such as loop tiling and transformation. Then, with the help of the roofline model, we can identify the solution with the best performance and lowest FPGA resource requirement. As a case study, we implement a CNN accelerator on a VC707 FPGA board and compare it to previous approaches. Our implementation achieves a peak performance of 61.62 GFLOPS under a 100 MHz working frequency, which outperforms previous approaches significantly.
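
To make the roofline-based selection step concrete, the following is a minimal Python sketch: for each candidate loop-tiling solution it estimates the computational throughput and the computation-to-communication (CTC) ratio, and bounds the attainable performance by min(computational roof, CTC x memory bandwidth). The layer shape, the platform constants, the tile factors Tm and Tn, and the simplified cycle and traffic models are illustrative assumptions, not the paper's exact equations.

# Roofline-style design space exploration sketch (illustrative assumptions throughout).
import math

# Hypothetical convolution layer: M output maps, N input maps,
# R x C output pixels, K x K kernels.
M, N, R, C, K = 256, 192, 27, 27, 5

COMP_ROOF = 100.0   # GFLOP/s the FPGA logic could sustain (assumed)
BANDWIDTH = 4.5     # GB/s of off-chip memory bandwidth (assumed)
FREQ_GHZ = 0.1      # 100 MHz working frequency
WORD = 4            # bytes per 32-bit floating-point word

def attainable(Tm, Tn):
    """Attainable GFLOP/s for one <Tm, Tn> tiling of the output/input feature maps."""
    ops = 2.0 * M * N * R * C * K * K                 # total FLOPs (each MAC counts as 2)
    # Computational throughput: Tm * Tn parallel MAC units, 2 FLOPs per cycle each.
    compute = min(2.0 * Tm * Tn * FREQ_GHZ, COMP_ROOF)
    # Simplified off-chip traffic: weights loaded once, input maps re-read
    # once per output-map tile, output maps written once.
    trips = math.ceil(M / Tm)
    bytes_moved = WORD * (M * N * K * K + trips * N * R * C + M * R * C)
    ctc = ops / bytes_moved                           # computation-to-communication ratio (FLOP/byte)
    return min(compute, ctc * BANDWIDTH)

# Enumerate candidate tilings and keep the one with the best attainable performance.
best = max(((attainable(Tm, Tn), Tm, Tn)
            for Tm in range(1, 65) for Tn in range(1, 65)),
           key=lambda t: t[0])
print("best attainable: %.2f GFLOP/s at Tm=%d, Tn=%d" % best)

In the full design flow described in the paper, additional loop transformations and tile factors over the feature-map rows and columns enlarge the search space; the sketch only conveys how the roofline model picks the best-performing, bandwidth-feasible solution.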





      Published In

      FPGA '15: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
      February 2015
      292 pages
      ISBN:9781450333153
      DOI:10.1145/2684746
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 February 2015


      Author Tags

      1. acceleration
      2. convolutional neural network
      3. fpga
      4. roofline model

      Qualifiers

      • Research-article

      Funding Sources

      • C-FAR
      • NSF China
      • National High Technology Research and Development Program of China
      • RFDP

      Conference

      FPGA '15

      Acceptance Rates

FPGA '15 Paper Acceptance Rate: 20 of 102 submissions, 20%
Overall Acceptance Rate: 125 of 627 submissions, 20%



      Article Metrics

• Downloads (last 12 months): 2,162
• Downloads (last 6 weeks): 307
      Reflects downloads up to 18 Nov 2024


Cited By

• (2024) GRMD: A Two-Stage Design Space Exploration Strategy for Customized RNN Accelerators. Symmetry, 16(11):1546. DOI: 10.3390/sym16111546. Online publication date: 19-Nov-2024.
• (2024) Flare: An FPGA-Based Full Precision Low Power CNN Accelerator with Reconfigurable Structure. Sensors, 24(7):2239. DOI: 10.3390/s24072239. Online publication date: 31-Mar-2024.
• (2024) Image Processing Hardware Acceleration—A Review of Operations Involved and Current Hardware Approaches. Journal of Imaging, 10(12):298. DOI: 10.3390/jimaging10120298. Online publication date: 21-Nov-2024.
• (2024) ECHO: Energy-Efficient Computation Harnessing Online Arithmetic—An MSDF-Based Accelerator for DNN Inference. Electronics, 13(10):1893. DOI: 10.3390/electronics13101893. Online publication date: 11-May-2024.
• (2024) Leveraging Bit-Serial Architectures for Hardware-Oriented Deep Learning Accelerators with Column-Buffering Dataflow. Electronics, 13(7):1217. DOI: 10.3390/electronics13071217. Online publication date: 26-Mar-2024.
• (2024) Dielectric Elastomer-Based Actuators: A Modeling and Control Review for Non-Experts. Actuators, 13(4):151. DOI: 10.3390/act13040151. Online publication date: 17-Apr-2024.
• (2024) A Low-Power Reconfigurable DNN Accelerator for Instruction-Extended RISC-V. IPSJ Transactions on System and LSI Design Methodology, 17:55-66. DOI: 10.2197/ipsjtsldm.17.55. Online publication date: 2024.
• (2024) Automated Feature Map Padding and Transfer Circuit for CNN Inference. IEICE Electronics Express. DOI: 10.1587/elex.21.20240559. Online publication date: 2024.
• (2024) Large-scale photonic inverse design: computational challenges and breakthroughs. Nanophotonics. DOI: 10.1515/nanoph-2024-0127. Online publication date: 7-Jun-2024.
• (2024) Design and Implementation of IP Operator Library as Backend of Neural Network on ZYNQ FPGA. Proceedings of the 1st International Workshop on Efficient Multimedia Computing under Limited, pages 3-7. DOI: 10.1145/3688863.3689573. Online publication date: 28-Oct-2024.
