DOI: 10.1007/978-981-97-5495-3_25

ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device

Published: 16 August 2024

Abstract

This study addresses the deployment of vision Transformer models on edge computing devices by proposing an FPGA-based hardware-software co-acceleration scheme, aimed at closing the significant gap between the computational demands of deep neural network models and the hardware resources available on edge devices. In this paper, we use a structured hybrid pruning method: a channel pruning strategy first automatically identifies how the parameters in each layer of the model affect the results and prunes them accordingly, and a Top-k pruning is then applied to the weight matrices retained after the first stage. The method is hardware-friendly and effectively reduces model complexity. To accommodate sparse matrix computation, a dedicated matrix multiplication optimization module is designed in this study. Experimental results show that the scheme achieves 11.52-fold and 1.56-fold improvements in throughput over conventional CPU and GPU platforms, respectively, while maintaining model accuracy. Future research will focus on further trade-offs between model complexity and performance, the introduction of adaptive mechanisms, and applications in lower-power hardware environments.
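
To make the pruning pipeline concrete, the sketch below shows how such a two-stage scheme could look in PyTorch-style Python: channel pruning removes the least influential output channels of a weight matrix, and Top-k pruning is then applied to the rows that survive. The function names, the L2-norm importance score, and the pruning ratios are assumptions for illustration only, not the authors' implementation.

```python
# A minimal, illustrative sketch of the two-stage structured hybrid pruning
# summarized in the abstract, written against PyTorch. The function names,
# the L2-norm importance score, and the pruning ratios are assumptions made
# for illustration, not the authors' implementation.
import torch


def channel_prune(weight: torch.Tensor, keep_ratio: float = 0.75) -> torch.Tensor:
    """Stage 1: channel pruning.

    Scores each output channel (row of the weight matrix) by its L2 norm as a
    proxy for its impact on the layer's output, and zeroes the least important
    rows so that whole channels can be skipped on the hardware side.
    """
    importance = weight.norm(p=2, dim=1)                 # one score per output channel
    n_keep = max(1, int(keep_ratio * weight.shape[0]))
    kept = torch.topk(importance, n_keep).indices
    mask = torch.zeros(weight.shape[0], dtype=torch.bool)
    mask[kept] = True
    return weight * mask.unsqueeze(1)                    # prune entire rows


def topk_prune(weight: torch.Tensor, k_ratio: float = 0.5) -> torch.Tensor:
    """Stage 2: Top-k pruning of the weights retained by stage 1.

    Within each row, only the k largest-magnitude entries are kept, which
    yields a regular sparsity pattern that a sparse matrix-multiplication
    unit can exploit.
    """
    k = max(1, int(k_ratio * weight.shape[1]))
    top = torch.topk(weight.abs(), k, dim=1).indices
    mask = torch.zeros_like(weight, dtype=torch.bool)
    mask.scatter_(1, top, True)
    return weight * mask


# Example: prune one hypothetical ViT projection matrix and report its density.
w = torch.randn(384, 384)
w_pruned = topk_prune(channel_prune(w), k_ratio=0.5)
print(f"nonzero fraction: {w_pruned.count_nonzero().item() / w_pruned.numel():.2f}")
```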

Published In

Knowledge Science, Engineering and Management: 17th International Conference, KSEM 2024, Birmingham, UK, August 16–18, 2024, Proceedings, Part II
Aug 2024
476 pages
ISBN:978-981-97-5494-6
DOI:10.1007/978-981-97-5495-3
Editors: Cungeng Cao, Huajun Chen, Liang Zhao, Junaid Arshad, Taufiq Asyhari, Yonghao Wang

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 16 August 2024

Author Tags

  1. Vision Transformer
  2. FPGA
  3. Structured Hybrid Pruning
  4. Hardware acceleration
  5. Sparse matrix optimization
  6. Edge device
