High-Efficiency Convolutional Ternary Neural Networks with Custom Adder Trees and Weight Compression

Published: 12 December 2018

Abstract

Although performing inference with artificial neural networks (ANNs) was until quite recently considered an essentially compute-intensive task, the emergence of deep neural networks, coupled with the evolution of integration technology, has turned inference into a memory-bound problem. Building on this observation, many recent works have focused on minimizing memory accesses, either by enforcing and exploiting sparsity in the weights or by representing activations and weights with few bits, so that ANN inference becomes usable in embedded devices. In this work, we detail an architecture dedicated to inference using ternary {−1, 0, 1} weights and activations. The architecture is configurable at design time, offering a range of throughput vs. power trade-offs to choose from. It is also generic, in the sense that it uses information drawn from the target technology (memory geometries and costs, number of available cuts, etc.) to best adapt to the FPGA resources. This allows us to achieve up to 5.2k frames per second per Watt for classification on a VC709 board using approximately half of the FPGA's resources.
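To illustrate the core arithmetic idea the abstract refers to: with weights and activations restricted to {−1, 0, 1}, a dot product needs no multipliers at all, only additions and subtractions, which is what makes custom adder trees attractive in hardware. The sketch below is illustrative only and not the paper's implementation; the function name `ternary_dot` is our own.

```python
# Illustrative sketch (not the paper's architecture): a ternary dot
# product with weights in {-1, 0, 1} reduces to add/subtract/skip --
# the kind of operation a hardware adder tree can accumulate directly.

def ternary_dot(activations, weights):
    """Accumulate a ternary dot product without any multiplication."""
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a      # weight +1: add the activation
        elif w == -1:
            acc -= a      # weight -1: subtract the activation
        # weight 0: skipped entirely; zero weights contribute nothing,
        # which is the sparsity that weight compression exploits
    return acc

acts    = [1, -1, 0, 1, 1, -1]
weights = [1,  0, -1, -1, 1,  1]
print(ternary_dot(acts, weights))  # -> 0
```

Note that the zero-weight branch does no work at all: in a compressed-weight scheme, those positions need not even be fetched from memory, which is precisely the memory-traffic reduction the paper targets.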




Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 3
Special Issue on Deep learning on FPGAs
September 2018
187 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/3299999
Editor: Steve Wilton
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 December 2018
Accepted: 01 August 2018
Revised: 01 July 2018
Received: 01 November 2017
Published in TRETS Volume 11, Issue 3


Author Tags

  1. FPGA
  2. Ternary CNN
  3. hardware acceleration
  4. low power inference

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Grenoble Alpes Métropole through the Nano2017 Esprit project


Cited By

  • (2024) High-efficiency Compressor Trees for Latest AMD FPGAs. ACM Transactions on Reconfigurable Technology and Systems 17(2), 1–32. DOI: 10.1145/3645097
  • (2024) Accurate and energy efficient ad-hoc neural network for wafer map classification. Journal of Intelligent Manufacturing. DOI: 10.1007/s10845-024-02390-7
  • (2023) Quantization Modes for Neural Network Inference: ASIC Implementation Trade-offs. 2023 International Joint Conference on Neural Networks (IJCNN), 1–8. DOI: 10.1109/IJCNN54540.2023.10191784
  • (2023) Arithmetic for Deep Learning. Application-Specific Arithmetic, 707–759. DOI: 10.1007/978-3-031-42808-1_24
  • (2022) Efficient Design of Low Bitwidth Convolutional Neural Networks on FPGA with Optimized Dot Product Units. ACM Transactions on Reconfigurable Technology and Systems 16(1), 1–36. DOI: 10.1145/3546182
  • (2021) unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 165–175. DOI: 10.1109/FCCM51124.2021.00027
  • (2021) TileNET: Hardware accelerator for ternary Convolutional Neural Networks. Microprocessors and Microsystems 83, 104039. DOI: 10.1016/j.micpro.2021.104039
  • (2021) Smart Cameras and MPSoCs. Multi-Processor System-on-Chip 2, 189–202. DOI: 10.1002/9781119818410.ch9
  • (2020) A Partially Binarized Hybrid Neural Network System for Low-Power and Resource Constrained Human Activity Recognition. IEEE Transactions on Circuits and Systems I: Regular Papers 67(11), 3893–3904. DOI: 10.1109/TCSI.2020.3011984
  • (2020) High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression. 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 1–9. DOI: 10.1109/FCCM48280.2020.00010
