
DOI: 10.1145/3020078.3021698

Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Published: 22 February 2017

Abstract

OpenCL FPGA has recently gained great popularity with emerging needs for workload acceleration such as the Convolutional Neural Network (CNN), which is the most popular deep learning architecture in the domain of computer vision. While OpenCL enhances the code portability and programmability of FPGAs, it comes at the expense of performance. The key challenge is to optimize the OpenCL kernels to efficiently utilize the flexible hardware resources of the FPGA. Simply optimizing the OpenCL kernel code through various compiler options turns out to be insufficient to achieve the desired performance for workloads that are both compute-intensive and data-intensive, such as convolutional neural networks.
In this paper, we first propose an analytical performance model and apply it to perform an in-depth analysis of the resource requirements of CNN classifier kernels and the resources available on modern FPGAs. We identify that the key performance bottleneck is the on-chip memory bandwidth. We propose a new kernel design that effectively addresses this bandwidth limitation and provides an optimal balance between computation, on-chip memory access, and off-chip memory access. As a case study, we further apply these techniques to design a CNN accelerator based on the VGG model. Finally, we evaluate the performance of our CNN accelerator on an Altera Arria 10 GX1150 board. We achieve 866 Gop/s of floating-point performance at a 370 MHz working frequency and 1.79 Top/s of 16-bit fixed-point performance at 385 MHz. To the best of our knowledge, our implementation achieves the best power efficiency and performance density among existing work.
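To put the reported numbers in perspective, the Python sketch below is a back-of-the-envelope illustration, not the paper's analytical model; the 16-bit operand width comes from the abstract, while the assumption that every multiply-accumulate (MAC) unit fetches both operands from on-chip RAM each cycle and the 20 GB/s off-chip bandwidth figure are illustrative assumptions. It derives the degree of MAC parallelism implied by the reported fixed-point throughput and the raw on-chip read bandwidth such a design would need without any data reuse.

    # Back-of-the-envelope illustration (assumptions noted above, not the paper's model).
    FREQ_HZ = 385e6            # reported kernel frequency of the fixed-point design
    THROUGHPUT_OPS = 1.79e12   # reported 1.79 Top/s; one MAC counts as 2 ops
    BYTES_PER_OPERAND = 2      # 16-bit fixed point, as in the abstract

    ops_per_cycle = THROUGHPUT_OPS / FREQ_HZ
    macs_per_cycle = ops_per_cycle / 2
    print(f"parallel MACs implied by the reported throughput: ~{macs_per_cycle:.0f}")

    # If every MAC naively fetched both operands (an activation and a weight)
    # from on-chip RAM every cycle, the required on-chip read bandwidth would be:
    naive_onchip_bw = macs_per_cycle * 2 * BYTES_PER_OPERAND * FREQ_HZ
    print(f"naive on-chip read bandwidth: ~{naive_onchip_bw / 1e12:.1f} TB/s")

    # Compared with an assumed (illustrative) 20 GB/s of off-chip DDR bandwidth,
    # this is two to three orders of magnitude larger, which is why the kernel
    # design must rely on on-chip data reuse rather than raw memory bandwidth.
    ASSUMED_DDR_BW = 20e9      # bytes/s, illustrative assumption
    print(f"ratio to the assumed DDR bandwidth: ~{naive_onchip_bw / ASSUMED_DDR_BW:.0f}x")

The exact figures matter less than the gap they expose: feeding thousands of parallel MACs is chiefly a question of on-chip buffer organization and data reuse, which is the balance the proposed kernel design targets.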

      Published In

      FPGA '17: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
      February 2017
      312 pages
      ISBN: 9781450343541
      DOI: 10.1145/3020078

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. convolutional neural networks
      2. fpga
      3. hardware accelerator
      4. opencl

      Qualifiers

      • Research-article

      Conference

      FPGA '17

      Acceptance Rates

      FPGA '17 Paper Acceptance Rate: 25 of 101 submissions, 25%
      Overall Acceptance Rate: 125 of 627 submissions, 20%
