Abstract
Graphics Processing Units (GPUs) can achieve remarkable performance for data-intensive applications such as the Back Propagation Network (BPN), given a reasonable task decomposition and careful memory optimization. However, the advantages of the GPU's memory architecture have not yet been fully exploited for parallel BPN. In this paper, we develop and analyze a parallel implementation of a back-propagation neural network using CUDA, focusing on kernel optimization through the use of shared memory and suitable block dimensions. The implementation was tested on seven well-known benchmark data sets, and the results show that speedups of 33.8x to 64.3x over a sequential CPU implementation can be achieved.
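To make the shared-memory optimization concrete, the sketch below shows a minimal CUDA forward-pass kernel for one BPN layer, in which each block caches a tile of the previous layer's activations in shared memory and each thread accumulates one output neuron. The kernel name, the TILE size, and the row-major weight layout are illustrative assumptions, not the authors' actual kernels.

// Hypothetical sketch of one forward-pass layer, assuming row-major weights.
// Each block stages a TILE-wide slice of the input vector in shared memory
// so that all threads of the block reuse it instead of re-reading global memory.
#include <cuda_runtime.h>
#include <math.h>

#define TILE 256  // threads per block; also the shared-memory tile width

__global__ void forward_layer(const float *W,    // [n_out x n_in] weights, row-major
                              const float *in,   // [n_in] previous-layer activations
                              const float *bias, // [n_out] biases
                              float *out,        // [n_out] this layer's activations
                              int n_in, int n_out)
{
    __shared__ float in_s[TILE];                    // cached slice of the input vector
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // output neuron handled by this thread
    float sum = 0.0f;

    // Walk over the input in TILE-sized chunks; all threads help load each chunk.
    for (int base = 0; base < n_in; base += TILE) {
        int i = base + threadIdx.x;
        in_s[threadIdx.x] = (i < n_in) ? in[i] : 0.0f;
        __syncthreads();

        if (j < n_out) {
            int lim = min(TILE, n_in - base);
            for (int k = 0; k < lim; ++k)
                sum += W[j * n_in + base + k] * in_s[k];
        }
        __syncthreads();
    }

    if (j < n_out)
        out[j] = 1.0f / (1.0f + expf(-(sum + bias[j])));  // sigmoid activation
}

A launch such as forward_layer<<<(n_out + TILE - 1) / TILE, TILE>>>(W, in, bias, out, n_in, n_out) maps one thread per output neuron; TILE is the tunable block dimension of the kind the abstract refers to.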
Acknowledgement
This work is supported financially by the National Natural Science Foundation of China under grant 61202044, the National Basic Research Program of China under contract 2011CB302501, the National Hi-tech Research and Development Program of China under contracts 2012AA010902 and 2012AA010303, and the Research Fund of Southwest University of Science and Technology under grant 10zx7119.
Cite this paper
Wang, Y., Tang, P., An, H., Liu, Z., Wang, K., Zhou, Y. (2015). Optimization and Analysis of Parallel Back Propagation Neural Network on GPU Using CUDA. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds.) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol. 9491. Springer, Cham. https://doi.org/10.1007/978-3-319-26555-1_18