Nothing Special   »   [go: up one dir, main page]

Skip to main content

Speeding Up Convolution on Multi-cluster DSP in Deep Learning Scenarios

  • Conference paper
  • First Online:
Parallel Architecture, Algorithm and Programming (PAAP 2017)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 729))

  • 1389 Accesses

Abstract

Recently, deep learning has achieved great success in artificial intelligent, whose superiority also brought new opportunity for the related research in embedded system. This paper focused on optimizing and speeding the convolution computing, the core operation within convolution neural network based on a multi-cluster digital signal processor, BWDSP. By taking advantage of the BWDSP’s architecture and characteristics of convolution computation, a suitable parallel algorithm was designed. Based on features of convolution neural network model structure, an automatic optimization tool for convolution computing with specific arguments was presented as well. The experimental result showed that the parallel algorithm given in this paper is 9.5x faster than GEMM-based algorithm commonly used in GPU and 5.7x faster than the traditional vectorization optimization algorithm. Meanwhile, a comparison was made between the parallel algorithm and tiled-base algorithm widely adopted in system with cache hierarchies, showing that the parallel one could achieve a better performance density of 1.55 times than that of later one, meaning that the work in this paper can make full use of computing resources to make them more efficient.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

    Google Scholar 

  2. Gu, J., Wang, Z., Kuen, J., et al.: Recent advances in convolutional neural networks. arXiv preprint arXiv:1512.07108 (2015)

  3. Chen, Y., Luo, T., Liu, S., et al.: Dadiannao: a machine-learning supercomputer. In: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609–622. IEEE Computer Society (2014)

    Google Scholar 

  4. Cetc38.com.cn: BWDSP Product Presentation. http://www.cetc38.com.cn/38/335804/335809/377610/index.html (2017). Accessed 22 Mar 2017

  5. Cong, J., Xiao, B.: Minimizing computation in convolutional neural networks. In: Wermter, S., Weber, C., Duch, W., Honkela, T., Koprinkova-Hristova, P., Magg, S., Palm, G., Villa, A.E.P. (eds.) ICANN 2014. LNCS, vol. 8681, pp. 281–290. Springer, Cham (2014). doi:10.1007/978-3-319-11179-7_36

    Google Scholar 

  6. Chetlur, S., Woolley, C., Vandermersch, P., et al.: cuDNN: efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)

  7. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)

    Article  Google Scholar 

  8. LeCun, Y., Bottou, L., Bengio, Y., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

    Article  Google Scholar 

  9. Image-net.org: ImageNet Large Scale Visual Recognition Competition (ILSVRC). http://www.image-net.org/challenges/LSVRC/ (2017). Accessed 22 Mar 2017

  10. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)

    Google Scholar 

  11. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  12. Lane, N.D., Bhattacharya, S., Georgiev, P., et al.: Deepx: a software accelerator for low-power deep learning inference on mobile device. In: 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 1–12. IEEE (2016)

    Google Scholar 

  13. Cavigelli, L., Magno, M., Benini, L.: Accelerating real-time embedded scene labeling with convolutional networks. In: 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. IEEE (2015)

    Google Scholar 

  14. Hegde, G., Ramasamy, N., Kapre, N.: CaffePresso: an optimized library for deep learning on embedded accelerator-based platforms. In: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, p. 14. ACM (2016)

    Google Scholar 

  15. Zhang, C., Li, P., Sun, G., et al.: Optimizing FPGA-based accelerator design for deep convolutional neural networks. In: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 161–170. ACM (2015)

    Google Scholar 

Download references

Acknowledgment

This work was supported in part by a grant from China Core Electronic Devices, High-end Generic Chips and Basic Software Major Projects, No. 2012ZX01034-001-001.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zheng Qilong .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Wenqi, D., Zhenhao, Y., Maohui, L., Gai, W., JiangPing, Y., Qilong, Z. (2017). Speeding Up Convolution on Multi-cluster DSP in Deep Learning Scenarios. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_47

Download citation

  • DOI: https://doi.org/10.1007/978-981-10-6442-5_47

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics