DOI: 10.1145/3547276.3548521

A Software/Hardware Co-design Local Irregular Sparsity Method for Accelerating CNNs on FPGA

Published: 13 January 2023

Abstract

Convolutional neural networks (CNNs) have been widely used in many areas. The success of CNNs comes with a huge number of parameters and computations, and CNNs nowadays keep moving toward larger structures. Although larger structures often bring better inference accuracy, the increasing size also slows down inference. Recently, various parameter sparsity methods have been proposed to accelerate CNNs by reducing the number of parameters and computations. Existing sparsity methods can be classified into two categories: unstructured and structured. Unstructured sparsity methods easily cause irregularity and thus achieve suboptimal speedup. Structured sparsity methods, on the other hand, keep regularity by pruning parameters following a certain pattern, but result in low sparsity. In this paper, we propose a software/hardware co-design approach that brings local irregular sparsity into CNNs. Benefiting from the local irregularity, we design a row-wise computing engine, the RConv Engine, to achieve workload balance and remarkable speedup. Experimental results show that our software/hardware co-design method achieves a 10.9× speedup over state-of-the-art methods with negligible accuracy loss.
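The abstract does not spell out how "local irregular sparsity" is defined, so the Python/NumPy sketch below is only one plausible illustration, not the paper's actual algorithm: the local granularity (here a single row of the flattened weight matrix), the keep ratio, and the helper name row_balanced_prune are all assumptions for the example. The idea it illustrates is that pruning is irregular inside each local region while every region keeps the same number of nonzero weights, which is the kind of balanced per-row workload a row-wise engine such as the RConv Engine could exploit.

```python
import numpy as np

def row_balanced_prune(weights, keep_ratio=0.25):
    """Illustrative only: keep the same number of largest-magnitude weights
    in every row of a 2-D (out_channels, in_channels*k*k) weight matrix.
    Inside a row the surviving positions are irregular; across rows the
    nonzero count (and thus the per-row workload) is identical."""
    out_ch, in_elems = weights.shape
    keep = max(1, int(round(in_elems * keep_ratio)))   # assumed keep ratio
    mask = np.zeros_like(weights, dtype=bool)
    for r in range(out_ch):
        top = np.argsort(np.abs(weights[r]))[-keep:]   # largest |w| in this row
        mask[r, top] = True
    return weights * mask, mask

# Example: weights of a conv layer flattened row-wise (one row per output channel)
w = np.random.randn(64, 32 * 3 * 3).astype(np.float32)
w_sparse, m = row_balanced_prune(w, keep_ratio=0.25)
assert np.all(m.sum(axis=1) == m.sum(axis=1)[0])       # equal nonzeros per row
```

With equal nonzero counts per row, each row-processing unit would finish in the same number of cycles, avoiding the load imbalance that unstructured sparsity normally causes; whether the RConv Engine uses exactly this granularity is an assumption of this sketch.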



Index Terms

  1. A Software/Hardware Co-design Local Irregular Sparsity Method for Accelerating CNNs on FPGA

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    ICPP Workshops '22: Workshop Proceedings of the 51st International Conference on Parallel Processing
    August 2022
    233 pages
    ISBN: 9781450394451
    DOI: 10.1145/3547276
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 January 2023


    Author Tags

    1. convolutional neural network
    2. field-programmable gate arrays (FPGAs)
    3. hardware accelerator
    4. software/hardware co-design
    5. sparsity method

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


    Conference

    ICPP '22: 51st International Conference on Parallel Processing
    August 29 - September 1, 2022
    Bordeaux, France

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 26
    • Downloads (last 6 weeks): 7
    Reflects downloads up to 14 Dec 2024

    Cited By

    • (2024) An algorithm/hardware co-optimized method to accelerate CNNs with compressed convolutional weights on FPGA. Concurrency and Computation: Practice and Experience, 36(11). DOI: 10.1002/cpe.8011. Online publication date: 6 January 2024.
