
Optimization and Analysis of Parallel Back Propagation Neural Network on GPU Using CUDA

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9491)


Abstract

Graphics Processing Units (GPUs) can achieve remarkable performance on dataset-oriented applications such as Back Propagation Networks (BPNs), given a reasonable task decomposition and memory optimization. However, the advantages of the GPU's memory architecture have not yet been fully exploited in parallel BPN implementations. In this paper, we develop and analyze a parallel implementation of a back propagation neural network using CUDA. The work focuses on kernel optimization through the use of shared memory and suitable block dimensions. The implementation was tested on seven well-known benchmark data sets, and the results show that speedups of 33.8x to 64.3x can be achieved over a sequential implementation on a CPU.
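The full paper is behind the access wall, but the abstract's two optimization levers (staging data through shared memory and tuning block dimensions) can be illustrated with a minimal CUDA sketch of a fully connected layer's forward pass. All names, the `TILE` size, and the kernel structure below are illustrative assumptions, not the authors' actual code.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define TILE 128  // block dimension; in the paper's approach this is tuned per kernel

// Forward pass of one fully connected layer:
//   out[j] = sigmoid( sum_i W[j*n_in + i] * in[i] + b[j] )
// One thread computes one output neuron. The input vector is staged through
// shared memory in TILE-sized chunks, so every thread in the block reads it
// from fast on-chip storage instead of repeatedly from global memory.
// Launch with blockDim.x == TILE.
__global__ void forward_layer(const float *W, const float *in, const float *b,
                              float *out, int n_in, int n_out) {
    __shared__ float tile[TILE];
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // output neuron index
    float acc = 0.0f;

    for (int base = 0; base < n_in; base += TILE) {
        int i = base + threadIdx.x;
        tile[threadIdx.x] = (i < n_in) ? in[i] : 0.0f;  // cooperative load
        __syncthreads();                                 // tile fully loaded

        int limit = min(TILE, n_in - base);
        if (j < n_out)
            for (int k = 0; k < limit; ++k)
                acc += W[j * n_in + base + k] * tile[k];
        __syncthreads();  // done reading before the next tile overwrites
    }
    if (j < n_out)
        out[j] = 1.0f / (1.0f + expf(-(acc + b[j])));  // sigmoid activation
}
```

Because every thread in the block reuses the same input tile, each input value is fetched from global memory once per block rather than once per thread; the choice of `TILE` (and hence the block dimension) trades occupancy against shared-memory usage, which is the kind of tuning the abstract refers to.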



Acknowledgement

This work is supported financially by the National Natural Science Foundation of China under grant 61202044, the National Basic Research Program of China under contract 2011CB302501, the National Hi-tech Research and Development Program of China under contracts 2012AA010902 and 2012AA010303, and the Research Fund of Southwest University of Science and Technology under grant 10zx7119.


Corresponding author

Correspondence to Yaobin Wang.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, Y., Tang, P., An, H., Liu, Z., Wang, K., Zhou, Y. (2015). Optimization and Analysis of Parallel Back Propagation Neural Network on GPU Using CUDA. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9491. Springer, Cham. https://doi.org/10.1007/978-3-319-26555-1_18


  • DOI: https://doi.org/10.1007/978-3-319-26555-1_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26554-4

  • Online ISBN: 978-3-319-26555-1

  • eBook Packages: Computer Science, Computer Science (R0)
