FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

Published: 15 December 2018

Abstract

Convolutional Neural Networks have rapidly become the most successful machine-learning algorithm, enabling ubiquitous machine vision and intelligent decisions even on embedded computing systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the most promising opportunities is leveraging reduced-precision representations for inputs, activations, and model parameters. The resulting scalability in performance, power efficiency, and storage footprint provides interesting design compromises in exchange for a small reduction in accuracy. FPGAs are ideal for exploiting low-precision inference engines, as custom precisions can be leveraged to achieve the required numerical accuracy for a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool that enables design-space exploration and automates the creation of fully customized inference engines on FPGAs. Given a neural network description, the tool optimizes for a given platform, design target, and precision. We introduce formalizations of resource cost functions and performance predictions and elaborate on the optimization algorithms. Finally, we evaluate a selection of reduced-precision neural networks ranging from CIFAR-10 classifiers to YOLO-based object detection on a range of platforms including PYNQ and AWS F1, demonstrating unprecedented measured throughput of 50 TOp/s on AWS F1 and 5 TOp/s on embedded devices.
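The scalability the abstract describes is easiest to see at the binary extreme. As a minimal sketch (not code from the article; pack_bits and binary_dot are hypothetical names of our own), the Python below shows the XNOR-popcount formulation of a dot product over {-1, +1} vectors, the trick that lets binarized layers replace multiply-accumulates with bitwise logic on FPGA fabric:

```python
import numpy as np

def pack_bits(x):
    """Pack a {-1, +1} vector into an integer bit mask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(x):
        if v > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors from their bit masks.

    XNOR marks the positions where the operands agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

# Sanity check: agrees with the ordinary dot product on {-1, +1} vectors.
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=64)
w = rng.choice([-1, 1], size=64)
assert binary_dot(pack_bits(a), pack_bits(w), 64) == int(a @ w)
```

On hardware, the XNOR and population count map to LUTs and a compact adder tree, which is why throughput scales so steeply as precision drops. The article also formalizes resource cost functions and performance predictions; as a back-of-envelope assumption on our part (a plausible form, not necessarily the paper's exact model), the peak throughput of one fully pipelined matrix-vector unit scales with its parallelism and clock rate:

```python
def peak_ops_per_sec(pe, simd, f_clk_hz):
    """Rough peak throughput of one fully pipelined matrix-vector unit:
    each of the PE * SIMD lanes retires one multiply-accumulate
    (counted as 2 ops) per clock cycle."""
    return 2 * pe * simd * f_clk_hz

# e.g. 32 PEs x 32 SIMD lanes at 250 MHz ~ 0.5 TOp/s for a single unit
print(peak_ops_per_sec(32, 32, 250e6) / 1e12, "TOp/s")
```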



    Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 3
    Special Issue on Deep learning on FPGAs
    September 2018
    187 pages
    ISSN:1936-7406
    EISSN:1936-7414
    DOI:10.1145/3299999
Editor: Steve Wilton

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 December 2018
    Accepted: 01 July 2018
    Revised: 01 April 2018
    Received: 01 December 2017
    Published in TRETS Volume 11, Issue 3


    Author Tags

    1. FINN
    2. FPGA
    3. Neural network
    4. artificial intelligence
    5. convolutional neural networks
6. hardware accelerator
    7. inference
    8. quantized neural networks

    Qualifiers

    • Research-article
    • Research
    • Refereed


