On the RTL Implementation of FINN Matrix Vector Unit

Published: 09 November 2023

Abstract

Field-programmable gate array (FPGA)-based accelerators are becoming increasingly popular for deep neural network (DNN) inference due to their ability to scale performance with increasing degrees of specialization, through dataflow architectures or custom data-type precision. To lower the barrier for software engineers and data scientists to adopt FPGAs, C++- and OpenCL-based design entries with high-level synthesis (HLS) have been introduced. They provide a higher level of abstraction than register-transfer level (RTL)-based design. HLS offers faster development time, better maintainability, and more flexibility in code exploration when evaluating several options for multi-dimensional tensors, convolutional layers, or different degrees of parallelism. For these reasons, HLS has been adopted by DNN accelerator generation frameworks such as FINN and hls4ml.
In this article, we present an alternative backend library for FINN, leveraging RTL. We investigate and evaluate, across a spectrum of design dimensions, the pros and cons of an RTL-based implementation versus the original HLS variant. We show that for smaller design parameters, RTL produces significantly smaller circuits than HLS. For larger circuits, however, the look-up table (LUT) count of the RTL-based design is slightly higher, by up to around 15%. On the other hand, HLS consistently requires more flip-flops (FFs; with an orders-of-magnitude difference for smaller designs) and block RAMs (BRAMs; 2× more). This also impacts the critical path delay, with RTL producing significantly faster circuits, by up to around 80%. RTL also benefits from at least a 10× reduction in synthesis time. Finally, the results were validated in practice on two real-world use cases: a multi-layer perceptron (MLP) used in network intrusion detection and a convolutional network, ResNet, used in image recognition. Overall, since HLS frameworks code-generate the hardware design, the ease of design entry matters less; the gains in synthesis time, together with the design-dependent resource savings, make the RTL abstraction an attractive alternative.
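For context on the computation under discussion: FINN's matrix-vector unit (MVU) computes the weight-matrix-by-activation-vector products at the core of each quantized layer, folded over a configurable number of processing elements (PE) and input lanes (SIMD). The plain C++ sketch below mirrors that folded loop structure for illustration only; it is a minimal functional model under assumed dimensions (matrix height divisible by PE, width divisible by SIMD), not the actual FINN HLS or RTL library code.

    // Illustrative functional model of a PE/SIMD-folded matrix-vector product,
    // in the style of a FINN MVU. Hypothetical sketch, not FINN library code.
    #include <cstdint>
    #include <vector>

    // Computes y = W * x for an (MH x MW) weight matrix. Each pass over the
    // synapse fold accumulates SIMD inputs for PE output rows at a time.
    // Assumes MH % PE == 0 and MW % SIMD == 0.
    template <int PE, int SIMD>
    std::vector<int32_t> mvu(const std::vector<std::vector<int8_t>>& W,
                             const std::vector<int8_t>& x) {
        const int MH = static_cast<int>(W.size());  // matrix height (outputs)
        const int MW = static_cast<int>(x.size());  // matrix width (inputs)
        std::vector<int32_t> y(MH, 0);
        for (int nf = 0; nf < MH / PE; ++nf) {          // neuron fold
            for (int sf = 0; sf < MW / SIMD; ++sf) {    // synapse fold
                for (int pe = 0; pe < PE; ++pe) {       // parallel PEs in hardware
                    int32_t acc = 0;
                    for (int s = 0; s < SIMD; ++s) {    // parallel SIMD lanes
                        acc += W[nf * PE + pe][sf * SIMD + s] * x[sf * SIMD + s];
                    }
                    y[nf * PE + pe] += acc;
                }
            }
        }
        return y;
    }

In an HLS flow, loops of this shape are annotated with unrolling and pipelining pragmas, whereas an RTL backend instantiates the PE and SIMD parallelism structurally; that difference in how the same folding is realized is where the resource and timing gaps reported above originate.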

Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 6
November 2023
428 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3632298
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 November 2023
Online AM: 14 July 2022
Accepted: 02 July 2022
Revised: 04 May 2022
Received: 29 December 2021
Published in TECS Volume 22, Issue 6

Author Tags

  1. FINN
  2. convolutional neural network
  3. HLS
  4. RTL
  5. FPGA

Qualifiers

  • Research-article

Funding Sources

  • Science Foundation Ireland
  • European Union’s Horizon 2020 research and innovation programme

Article Metrics

  • Downloads (last 12 months): 270
  • Downloads (last 6 weeks): 21
Reflects downloads up to 20 Nov 2024

Cited By

  • (2024) "Beam Orbital Parameter Prediction Based on the Deployment of Cascaded Neural Networks at Edge Intelligence Acceleration Nodes". Electronics 13, 21, 4189. DOI: 10.3390/electronics13214189. Online publication date: 25-Oct-2024.
  • (2024) "Quantized Neural Network Architecture for Hardware Efficient Real-Time 4K Image Super-Resolution". 2024 28th International Symposium on VLSI Design and Test (VDAT), 1-5. DOI: 10.1109/VDAT63601.2024.10705704. Online publication date: 1-Sep-2024.
  • (2023) "Towards Deploying Highly Quantized Neural Networks on FPGA Using Chisel". 2023 26th Euromicro Conference on Digital System Design (DSD), 161-167. DOI: 10.1109/DSD60849.2023.00032. Online publication date: 6-Sep-2023.
  • (2023) "A Configurable Mixed-Precision Convolution Processing Unit Generator in Chisel". 2023 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), 128-131. DOI: 10.1109/DDECS57882.2023.10139758. Online publication date: 3-May-2023.
  • (2023) "A critical review on the state-of-the-art and future prospects of machine learning for Earth observation operations". Advances in Space Research 71, 12, 4959-4986. DOI: 10.1016/j.asr.2023.02.025. Online publication date: Jun-2023.
  • (2023) "Development an efficient AXI-interconnect unit between set of customized peripheral devices and an implemented dual-core RISC-V processor". The Journal of Supercomputing 79, 15, 17000-17019. DOI: 10.1007/s11227-023-05304-1. Online publication date: 5-May-2023.
