Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM

  • Conference paper
OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence (OpenSHMEM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10679)

Abstract

NVSHMEM is an implementation of OpenSHMEM for NVIDIA GPUs that allows communication to be issued from inside CUDA kernels. In this work, we present an implementation of Breadth First Search (BFS) for multi-GPU systems using NVSHMEM. We analyze the benefits and bottlenecks of moving fine-grained communication into CUDA kernels. Our BFS implementation achieves up to 75% better performance than a CUDA-aware MPI-based implementation.
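
To make the communication pattern concrete, here is a minimal single-process sketch of level-synchronous BFS over partitioned vertices. It is not the authors' implementation and uses no NVSHMEM or CUDA calls. The function name `distributed_bfs`, the cyclic 1D vertex ownership, and the per-PE `inbox` lists (standing in for remotely writable buffers) are assumptions made purely for illustration.

```python
from collections import defaultdict

def distributed_bfs(edges, num_vertices, num_pes, root):
    """Simulate level-synchronous BFS over `num_pes` partitions in one process."""
    owner = lambda v: v % num_pes          # assumed 1D cyclic vertex ownership
    # Each PE stores adjacency lists only for the vertices it owns.
    adj = [defaultdict(list) for _ in range(num_pes)]
    for u, v in edges:                     # undirected graph
        adj[owner(u)][u].append(v)
        adj[owner(v)][v].append(u)

    level = [-1] * num_vertices            # -1 marks an unvisited vertex
    level[root] = 0
    frontier = [[] for _ in range(num_pes)]
    frontier[owner(root)].append(root)
    depth = 0

    while any(frontier):
        # inbox[p] stands in for a remotely writable buffer on PE p.
        inbox = [[] for _ in range(num_pes)]
        for pe in range(num_pes):
            for u in frontier[pe]:
                for v in adj[pe][u]:
                    # Fine-grained "put": forward each neighbor to its owner
                    # as soon as it is found, instead of packing a bulk message.
                    inbox[owner(v)].append(v)
        # Stand-in for the synchronization that ends a BFS level.
        depth += 1
        frontier = [[] for _ in range(num_pes)]
        for pe in range(num_pes):
            for v in inbox[pe]:
                if level[v] == -1:         # owner filters duplicates locally
                    level[v] = depth
                    frontier[pe].append(v)
    return level
```

For example, `distributed_bfs([(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)], 6, 2, 0)` returns `[0, 1, 1, 2, 3, -1]`, where the final `-1` marks the unreachable vertex 5. In a GPU-initiated version, the append to `inbox` would presumably be replaced by a one-sided write issued by the thread that discovers the neighbor, which is the fine-grained pattern the abstract refers to.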

References

  1. http://graph500.org: Graph 500 benchmark specification 1.2 (2017). http://www.graph500.org/

  2. Merrill, D., Garland, M., Grimshaw, A.: Scalable GPU graph traversal. SIGPLAN Not. 47, 117–128 (2012)

  3. Bisson, M., Bernaschi, M., Mastrostefano, E.: Parallel distributed breadth first search on the Kepler architecture. CoRR abs/1408.1605 (2014)

  4. Potluri, S., Rossetti, D., Becker, D., Poole, D., Gorentla Venkata, M., Hernandez, O., Shamis, P., Lopez, M.G., Baker, M., Poole, W.: Exploring openSHMEM model to program GPU-based extreme-scale systems. In: Gorentla Venkata, M., Shamis, P., Imam, N., Lopez, M.G. (eds.) OpenSHMEM 2014. LNCS, vol. 9397, pp. 18–35. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26428-8_2

  5. NVIDIA: GPUDirect (2015). https://developer.nvidia.com/gpudirect

  6. NVIDIA: GPUDirect RDMA (2015). http://docs.nvidia.com/cuda/gpudirect-rdma

  7. Rossetti, D.: GPUDirect: integrating the GPU with a network interface. In: GPU Technology Conference (2015)

  8. Wang, H., Potluri, S., Luo, M., Singh, A.K., Sur, S., Panda, D.K.: MVAPICH2-GPU: optimized GPU to GPU communication for infiniband clusters. Comput. Sci. 26, 257–266 (2011)

  9. Potluri, S., Hamidouche, K., Venkatesh, A., Bureddy, D., Panda, D.K.: Efficient inter-node MPI communication using GPUDirect RDMA for infiniband clusters with NVIDIA GPUs. In: Proceedings of the 2013 42nd International Conference on Parallel Processing, ICPP 2013, Washington, DC, USA, pp. 80–89. IEEE Computer Society (2013)

  10. MVAPICH: MPI over infiniband, 10GigE/iWARP and RoCE (2015). http://mvapich.cse.ohio-state.edu

  11. Aji, A.M., Dinan, J., Buntinas, D., Balaji, P., Feng, W.C., Bisset, K.R., Thakur, R.: MPI-ACC: an integrated and extensible approach to data movement in accelerator-based systems. In: 14th IEEE International Conference on High Performance Computing and Communications, Liverpool, UK (2012)

  12. Potluri, S., Bureddy, D., Wang, H., Subramoni, H., Panda, D.K.: Extending openSHMEM for GPU computing. In: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, IPDPS 2013, Washington, DC, USA, pp. 1001–1012. IEEE Computer Society (2013)

  13. Cunningham, D., Bordawekar, R., Saraswat, V.: GPU programming in a high level language: compiling X10 to CUDA. In: Proceedings of the 2011 ACM SIGPLAN X10 Workshop, X10 2011, pp. 8:1–8:10. ACM, New York (2011)

  14. Miyoshi, T., Irie, H., Shima, K., Honda, H., Kondo, M., Yoshinaga, T.: Flat: a GPU programming framework to provide embedded MPI. In: Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, GPGPU-5, pp. 20–29. ACM, New York (2012)

  15. Ueno, K., Suzumura, T.: Parallel distributed breadth first search on GPU. In: 20th Annual International Conference on High Performance Computing, HiPC 2013, Bengaluru (Bangalore), Karnataka, India, 18–21 December 2013, pp. 314–323 (2013)

  16. Matsuoka, S.: Making TSUBAME2.0, the world’s greenest production supercomputer, even greener: challenges to the architects. In: Proceedings of the 2011 International Symposium on Low Power Electronics and Design, Fukuoka, Japan, 1–3 August 2011, pp. 367–368 (2011)

  17. Bisson, M., Bernaschi, M., Mastrostefano, E.: Parallel distributed breadth first search on the Kepler architecture. IEEE Trans. Parallel Distrib. Syst. 27, 2091–2102 (2016)

  18. Pan, Y., Wang, Y., Wu, Y., Yang, C., Owens, J.D.: Multi-GPU graph analytics. CoRR abs/1504.04804 (2015)

Acknowledgments

This research is supported in part by Oak Ridge National Lab, subcontract #4000145249. We thank M. Bisson et al., authors of the multi-GPU BFS implementation used as the baseline in this paper [17], for sharing their code and supporting this work.

Author information

Corresponding author

Correspondence to Sreeram Potluri.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Potluri, S., Goswami, A., Venkata, M.G., Imam, N. (2018). Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM. In: Gorentla Venkata, M., Imam, N., Pophale, S. (eds) OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence. OpenSHMEM 2017. Lecture Notes in Computer Science, vol 10679. Springer, Cham. https://doi.org/10.1007/978-3-319-73814-7_6

  • DOI: https://doi.org/10.1007/978-3-319-73814-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73813-0

  • Online ISBN: 978-3-319-73814-7

  • eBook Packages: Computer Science, Computer Science (R0)
