Abstract
Recently, deep learning has achieved great success in artificial intelligence, and its superiority has also created new opportunities for related research in embedded systems. This paper focuses on optimizing and accelerating convolution computation, the core operation of convolutional neural networks, on BWDSP, a multi-cluster digital signal processor. By exploiting the BWDSP architecture and the characteristics of convolution computation, a suitable parallel algorithm was designed. Based on the structural features of convolutional neural network models, an automatic optimization tool for convolution computation with specific arguments is presented as well. Experimental results show that the parallel algorithm given in this paper is 9.5x faster than the GEMM-based algorithm commonly used on GPUs and 5.7x faster than the traditional vectorization optimization algorithm. A comparison was also made between the parallel algorithm and the tile-based algorithm widely adopted in systems with cache hierarchies, showing that the parallel algorithm achieves 1.55x the performance density of the latter, meaning that the work in this paper makes full use of the available computing resources.
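For context (not part of the original paper), the GEMM-based baseline that the abstract compares against typically lowers convolution to a matrix multiplication via the im2col transform: each input patch becomes a row, and the flattened kernel becomes a column. A minimal sketch, with hypothetical helper names and a single-channel valid convolution assumed:

```python
def im2col(image, k):
    """Flatten each k x k patch of a 2-D image (list of lists) into a row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

def conv2d_gemm(image, kernel):
    """Valid 2-D convolution via im2col followed by a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [kernel[di][dj] for di in range(k) for dj in range(k)]
    patches = im2col(image, k)
    # The GEMM step: each output element is a dot product of one patch
    # row with the flattened kernel.
    flat_out = [sum(p * q for p, q in zip(row, flat_kernel)) for row in patches]
    out_w = len(image[0]) - k + 1
    return [flat_out[r * out_w:(r + 1) * out_w]
            for r in range(len(flat_out) // out_w)]
```

The trade-off this illustrates is the one the paper's comparison rests on: im2col duplicates input data (each pixel appears in up to k*k patch rows), buying a regular GEMM memory-access pattern at the cost of extra bandwidth, which is exactly the kind of overhead a DSP-specific parallel algorithm can avoid.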
Acknowledgment
This work was supported in part by a grant from China Core Electronic Devices, High-end Generic Chips and Basic Software Major Projects, No. 2012ZX01034-001-001.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd
Cite this paper
Wenqi, D., Zhenhao, Y., Maohui, L., Gai, W., JiangPing, Y., Qilong, Z. (2017). Speeding Up Convolution on Multi-cluster DSP in Deep Learning Scenarios. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_47
DOI: https://doi.org/10.1007/978-981-10-6442-5_47
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6441-8
Online ISBN: 978-981-10-6442-5