DOI: 10.1007/978-981-97-5495-3_25

ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device

Published: 16 August 2024

Abstract

This study addresses the deployment of vision Transformer models on edge computing devices by proposing an FPGA-based hardware-software co-acceleration scheme, aimed at closing the significant gap between the computational demands of deep neural network models and the hardware resources available on edge devices. In this paper, we use a structured hybrid pruning method: a channel pruning strategy first automatically identifies how the parameters in each layer of the model affect the results and prunes them accordingly, and a Top-k pruning is then applied to the weight matrices retained after the first stage. The method is hardware-friendly and effectively reduces model complexity. To accommodate sparse matrix computation, a dedicated matrix multiplication optimization module is designed in this study. Experimental results show that the scheme achieves 11.52-fold and 1.56-fold improvements in throughput over conventional CPU and GPU platforms, respectively, while maintaining model accuracy. Future research will focus on further trade-offs between model complexity and performance, the introduction of adaptive mechanisms, and applications in lower-power hardware environments.
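
To make the pruning pipeline concrete, the sketch below shows how such a two-stage scheme could look in PyTorch-style Python: channel pruning removes the least influential output channels of a weight matrix, and Top-k pruning is then applied to the rows that survive. The function names, the L2-norm importance score, and the pruning ratios are assumptions for illustration only, not the authors' implementation.

```python
# A minimal, illustrative sketch of the two-stage structured hybrid pruning
# summarized in the abstract, written against PyTorch. The function names,
# the L2-norm importance score, and the pruning ratios are assumptions made
# for illustration, not the authors' implementation.
import torch


def channel_prune(weight: torch.Tensor, keep_ratio: float = 0.75) -> torch.Tensor:
    """Stage 1: channel pruning.

    Scores each output channel (row of the weight matrix) by its L2 norm as a
    proxy for its impact on the layer's output, and zeroes the least important
    rows so that whole channels can be skipped on the hardware side.
    """
    importance = weight.norm(p=2, dim=1)                 # one score per output channel
    n_keep = max(1, int(keep_ratio * weight.shape[0]))
    kept = torch.topk(importance, n_keep).indices
    mask = torch.zeros(weight.shape[0], dtype=torch.bool)
    mask[kept] = True
    return weight * mask.unsqueeze(1)                    # prune entire rows


def topk_prune(weight: torch.Tensor, k_ratio: float = 0.5) -> torch.Tensor:
    """Stage 2: Top-k pruning of the weights retained by stage 1.

    Within each row, only the k largest-magnitude entries are kept, which
    yields a regular sparsity pattern that a sparse matrix-multiplication
    unit can exploit.
    """
    k = max(1, int(k_ratio * weight.shape[1]))
    top = torch.topk(weight.abs(), k, dim=1).indices
    mask = torch.zeros_like(weight, dtype=torch.bool)
    mask.scatter_(1, top, True)
    return weight * mask


# Example: prune one hypothetical ViT projection matrix and report its density.
w = torch.randn(384, 384)
w_pruned = topk_prune(channel_prune(w), k_ratio=0.5)
print(f"nonzero fraction: {w_pruned.count_nonzero().item() / w_pruned.numel():.2f}")
```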

Published In

Knowledge Science, Engineering and Management: 17th International Conference, KSEM 2024, Birmingham, UK, August 16–18, 2024, Proceedings, Part II
Aug 2024
476 pages
ISBN:978-981-97-5494-6
DOI:10.1007/978-981-97-5495-3
Editors: Cungeng Cao, Huajun Chen, Liang Zhao, Junaid Arshad, Taufiq Asyhari, Yonghao Wang

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 16 August 2024

Author Tags

  1. Vision Transformer
  2. FPGA
  3. Structured Hybrid Pruning
  4. Hardware acceleration
  5. Sparse matrix optimization
  6. Edge device
