Efficient Design of Pruned Convolutional Neural Networks on FPGA

Published: 01 May 2021

Abstract

Convolutional Neural Networks (CNNs) outperform other machine learning algorithms in several computer vision applications, such as object detection and classification. Running these models on edge computing devices close to the data sources is attracting the community's attention, since it avoids high-latency transmission of private data for cloud processing and permits real-time decisions, turning such systems into smart embedded devices. However, running these models is computationally demanding and requires large amounts of memory, both of which are scarce on edge devices compared to a cloud data center. In this paper, we propose an architecture for the inference of pruned convolutional neural networks on FPGAs of any density. A configurable block pruning method is proposed, together with an architecture that supports the efficient execution of pruned networks. Pruning and batching are also studied jointly to determine how they influence each other. With the proposed architecture, we run CNN inference with an average performance of 322 GOPS for 8-bit data on a XC7Z020 FPGA. Running AlexNet, the proposed architecture processes 240 images/s on a ZYNQ7020 and 775 images/s on a ZYNQ7045, with only 1.2% accuracy degradation.
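Block pruning, as named in the abstract, removes weights in whole groups rather than individually, so the hardware can skip entire blocks of work instead of chasing scattered zeros. The sketch below is only a rough illustration of that general idea under stated assumptions, not the authors' exact method: the function block_prune, its block_size and sparsity parameters, and the L1-norm ranking of blocks are all illustrative choices of ours.

```python
import numpy as np

def block_prune(weights: np.ndarray, block_size: int, sparsity: float) -> np.ndarray:
    # Illustrative sketch only (not the paper's exact method):
    # rank fixed-size blocks of weights by L1 norm and zero out
    # the weakest `sparsity` fraction of blocks.
    flat = weights.flatten()

    # Pad so the flattened weights divide evenly into blocks.
    pad = (-len(flat)) % block_size
    blocks = np.concatenate([flat, np.zeros(pad)]).reshape(-1, block_size)

    # Zero the blocks with the smallest L1 norms.
    norms = np.abs(blocks).sum(axis=1)
    n_prune = int(sparsity * len(blocks))
    blocks[np.argsort(norms)[:n_prune]] = 0.0

    # Drop the padding and restore the original tensor shape.
    return blocks.reshape(-1)[: len(flat)].reshape(weights.shape)

# Toy example: prune 50% of 4-weight blocks in a 3x3x8x8 conv kernel.
rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 8, 8)).astype(np.float32)
pruned = block_prune(kernel, block_size=4, sparsity=0.5)
print(f"fraction of zero weights: {np.mean(pruned == 0.0):.2f}")
```

Coarser blocks make it easier for an accelerator to skip computation in bulk, but overly large blocks discard useful weights along with weak ones, which is why a configurable block size matters for trading throughput against accuracy.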


Cited By

  • (2022) Sports Video Motion Direction Detection and Target Tracking Algorithm Based on Convolutional Neural Network. Wireless Communications & Mobile Computing, vol. 2022. https://doi.org/10.1155/2022/5760758. Online publication date: 1-Jan-2022.
  • (2022) Efficient Design of Low Bitwidth Convolutional Neural Networks on FPGA with Optimized Dot Product Units. ACM Transactions on Reconfigurable Technology and Systems, 16(1), 1–36. https://doi.org/10.1145/3546182. Online publication date: 22-Dec-2022.


Published In

Journal of Signal Processing Systems  Volume 93, Issue 5
May 2021
139 pages
ISSN:1939-8018
EISSN:1939-8115

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 May 2021
Accepted: 08 October 2020
Revision received: 21 April 2020
Received: 21 April 2020

Author Tags

  1. Deep learning
  2. Convolutional neural network
  3. FPGA
  4. Block pruning
  5. Edge computing

Qualifiers

  • Research-article

