FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

Published: 15 December 2018

Abstract

Convolutional Neural Networks have rapidly become the most successful machine-learning algorithm, enabling ubiquitous machine vision and intelligent decisions even on embedded computing systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the most promising opportunities is leveraging reduced-precision representations for inputs, activations, and model parameters. The resulting scalability in performance, power efficiency, and storage footprint provides interesting design compromises in exchange for a small reduction in accuracy. FPGAs are ideal for exploiting low-precision inference engines, as custom precisions can be leveraged to achieve the required numerical accuracy for a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool that enables design-space exploration and automates the creation of fully customized inference engines on FPGAs. Given a neural network description, the tool optimizes for a given platform, design target, and precision. We introduce formalizations of resource cost functions and performance predictions and elaborate on the optimization algorithms. Finally, we evaluate a selection of reduced-precision neural networks ranging from CIFAR-10 classifiers to YOLO-based object detection on a range of platforms including PYNQ and AWS F1, demonstrating unprecedented measured throughput of 50 TOp/s on AWS F1 and 5 TOp/s on embedded devices.
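The scalability the abstract describes is easiest to see at the binary extreme. As a minimal sketch (not code from the article; pack_bits and binary_dot are hypothetical names of our own), the Python below shows the XNOR-popcount formulation of a dot product over {-1, +1} vectors, the trick that lets binarized layers replace multiply-accumulates with bitwise logic on FPGA fabric:

```python
import numpy as np

def pack_bits(x):
    """Pack a {-1, +1} vector into an integer bit mask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(x):
        if v > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors from their bit masks.

    XNOR marks the positions where the operands agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

# Sanity check: agrees with the ordinary dot product on {-1, +1} vectors.
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=64)
w = rng.choice([-1, 1], size=64)
assert binary_dot(pack_bits(a), pack_bits(w), 64) == int(a @ w)
```

On hardware, the XNOR and population count map to LUTs and a compact adder tree, which is why throughput scales so steeply as precision drops. The article also formalizes resource cost functions and performance predictions; as a back-of-envelope assumption on our part (a plausible form, not necessarily the paper's exact model), the peak throughput of one fully pipelined matrix-vector unit scales with its parallelism and clock rate:

```python
def peak_ops_per_sec(pe, simd, f_clk_hz):
    """Rough peak throughput of one fully pipelined matrix-vector unit:
    each of the PE * SIMD lanes retires one multiply-accumulate
    (counted as 2 ops) per clock cycle."""
    return 2 * pe * simd * f_clk_hz

# e.g. 32 PEs x 32 SIMD lanes at 250 MHz ~ 0.5 TOp/s for a single unit
print(peak_ops_per_sec(32, 32, 250e6) / 1e12, "TOp/s")
```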



    Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 3
    Special Issue on Deep learning on FPGAs
    September 2018
    187 pages
    ISSN:1936-7406
    EISSN:1936-7414
    DOI:10.1145/3299999
Editor: Steve Wilton

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 December 2018
    Accepted: 01 July 2018
    Revised: 01 April 2018
    Received: 01 December 2017
    Published in TRETS Volume 11, Issue 3


    Author Tags

    1. FINN
    2. FPGA
    3. Neural network
    4. artificial intelligence
    5. convolutional neural networks
6. hardware accelerator
    7. inference
    8. quantized neural networks

    Qualifiers

    • Research-article
    • Research
    • Refereed


