
Optimization and Analysis of Parallel Back Propagation Neural Network on GPU Using CUDA

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9491)


Abstract

Graphics Processing Units (GPUs) can achieve remarkable performance on dataset-oriented applications such as Back Propagation Networks (BPNs), given a reasonable task decomposition and memory optimization. However, the advantages of the GPU's memory architecture have not yet been fully exploited in parallel BPN implementations. In this paper, we develop and analyze a parallel implementation of a back propagation neural network using CUDA. The work focuses on kernel optimization through the use of shared memory and suitable block dimensions. The implementation was tested on seven well-known benchmark data sets, and the results show that speedups of 33.8x to 64.3x can be achieved over a sequential implementation on a CPU.
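The full paper is behind the access wall, but the abstract's two optimization levers (staging data through shared memory and tuning block dimensions) can be illustrated with a minimal CUDA sketch of a fully connected layer's forward pass. All names, the `TILE` size, and the kernel structure below are illustrative assumptions, not the authors' actual code.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define TILE 128  // block dimension; in the paper's approach this is tuned per kernel

// Forward pass of one fully connected layer:
//   out[j] = sigmoid( sum_i W[j*n_in + i] * in[i] + b[j] )
// One thread computes one output neuron. The input vector is staged through
// shared memory in TILE-sized chunks, so every thread in the block reads it
// from fast on-chip storage instead of repeatedly from global memory.
// Launch with blockDim.x == TILE.
__global__ void forward_layer(const float *W, const float *in, const float *b,
                              float *out, int n_in, int n_out) {
    __shared__ float tile[TILE];
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // output neuron index
    float acc = 0.0f;

    for (int base = 0; base < n_in; base += TILE) {
        int i = base + threadIdx.x;
        tile[threadIdx.x] = (i < n_in) ? in[i] : 0.0f;  // cooperative load
        __syncthreads();                                 // tile fully loaded

        int limit = min(TILE, n_in - base);
        if (j < n_out)
            for (int k = 0; k < limit; ++k)
                acc += W[j * n_in + base + k] * tile[k];
        __syncthreads();  // done reading before the next tile overwrites
    }
    if (j < n_out)
        out[j] = 1.0f / (1.0f + expf(-(acc + b[j])));  // sigmoid activation
}
```

Because every thread in the block reuses the same input tile, each input value is fetched from global memory once per block rather than once per thread; the choice of `TILE` (and hence the block dimension) trades occupancy against shared-memory usage, which is the kind of tuning the abstract refers to.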



Acknowledgement

This work is supported financially by the National Natural Science Foundation of China under grant 61202044, the National Basic Research Program of China under contract 2011CB302501, the National Hi-tech Research and Development Program of China under contracts 2012AA010902 and 2012AA010303, and the Research Fund of Southwest University of Science and Technology under grant 10zx7119.


Corresponding author

Correspondence to Yaobin Wang.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, Y., Tang, P., An, H., Liu, Z., Wang, K., Zhou, Y. (2015). Optimization and Analysis of Parallel Back Propagation Neural Network on GPU Using CUDA. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9491. Springer, Cham. https://doi.org/10.1007/978-3-319-26555-1_18


  • DOI: https://doi.org/10.1007/978-3-319-26555-1_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26554-4

  • Online ISBN: 978-3-319-26555-1

  • eBook Packages: Computer Science, Computer Science (R0)
