Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2621934.2621936acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
tutorial
Open access

MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs

Published: 22 June 2014 Publication History

Abstract

High performance graph analytics are critical for a long list of application domains. In recent years, the rapid advancement of many-core processors, in particular graphical processing units (GPUs), has sparked a broad interest in developing high performance parallel graph programs on these architectures. However, the SIMT architecture used in GPUs places particular constraints on both the design and implementation of the algorithms and data structures, making the development of such programs difficult and time-consuming.
We present MapGraph, a high performance parallel graph programming framework that delivers up to 3 billion Traversed Edges Per Second (TEPS) on a GPU. MapGraph provides a high-level abstraction that makes it easy to write graph programs and obtain good parallel speedups on GPUs. To deliver high performance, MapGraph dynamically chooses among different scheduling strategies depending on the size of the frontier and the size of the adjacency lists for the vertices in the frontier. In addition, a Structure Of Arrays (SOA) pattern is used to ensure coalesced memory access. Our experiments show that, for many graph analytics algorithms, an implementation, with our abstraction, is up to two orders of magnitude faster than a parallel CPU implementation and is comparable to state-of-the-art, manually optimized GPU implementations. In addition, with our abstraction, new graph analytics can be developed with relatively little effort.

References

[1]
E. S.-N. Abdullah Gharaibeh, Lauro Beltrao Costa and M. Ripeanu. Totem: Accelerating graph processing on hybrid cpu+gpu systems. GPU Technology Conference, 2013.
[2]
S. Baxter. Modern gpu library. 2013. http://www.moderngpu.com/.
[3]
N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation, Dec. 2008.
[4]
G. Chapuis, H. Djidjev, R. Andonov, S. Thulasidasan, and D. Lavenier. Efficient multi-gpu algorithm for all-pairs shortest paths. In IPDPS 2014, May 2014.
[5]
N. T. Duong, Q. A. P. Nguyen, A. T. Nguyen, and H.-D. Nguyen. Parallel pagerank computation using gpus. In Proceedings of the Third Symposium on Information and Communication Technology, SoICT '12, pages 223--230. ACM, 2012.
[6]
E. Elsen and V. Vaidyanathan. Vertexapi2 - a vertex-program api for large graph computations on the gpu. 2014. http://www.royal-caliber.com/vertexapi2.pdf.
[7]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 17--30, Berkeley, CA, USA, 2012. USENIX Association.
[8]
S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-marl: A dsl for easy and efficient graph analysis. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pages 349--362, New York, NY, USA, 2012. ACM.
[9]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, July 2010.
[10]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 135--146, New York, NY, USA, 2010. ACM.
[11]
D. Merrill, M. Garland, and A. Grimshaw. Scalable gpu graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, New York, NY, USA, 2012. ACM.
[12]
NVIDIA. Cuda programming guide. http://www.nvidia.com/object/cuda.html.
[13]
J. Soman, K. Kothapalli, and P. J. Narayanan. Some gpu algorithms for graph connected components and spanning tree. Parallel Processing Letters, 20(04):325--339, 2010.
[14]
G. Wang, W. Xie, A. J. Demers, and J. Gehrke. Asynchronous large-scale graph processing made easy. In CIDR, 2013.
[15]
J. Zhong and B. He. Medusa: Simplified graph processing on gpus. IEEE Transactions on Parallel and Distributed Systems, 99:1, 2013.

Cited By

View all
  • (2024)Hypergraph-based locality-enhancing methods for graph operations in Big Data applicationsInternational Journal of High Performance Computing Applications10.1177/1094342023121453238:3(210-224)Online publication date: 1-May-2024
  • (2024)GraphCube: Interconnection Hierarchy-aware Graph ProcessingProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638498(160-174)Online publication date: 2-Mar-2024
  • (2024) G 2 -AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Finding on GPUs 2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00245(3164-3177)Online publication date: 13-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
GRADES'14: Proceedings of Workshop on GRAph Data management Experiences and Systems
June 2014
79 pages
ISBN:9781450329828
DOI:10.1145/2621934
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2014

Check for updates

Author Tags

  1. GPU
  2. Graph analytics
  3. high-level API

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

SIGMOD/PODS'14
Sponsor:

Acceptance Rates

Overall Acceptance Rate 29 of 61 submissions, 48%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)161
  • Downloads (Last 6 weeks)18
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Hypergraph-based locality-enhancing methods for graph operations in Big Data applicationsInternational Journal of High Performance Computing Applications10.1177/1094342023121453238:3(210-224)Online publication date: 1-May-2024
  • (2024)GraphCube: Interconnection Hierarchy-aware Graph ProcessingProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638498(160-174)Online publication date: 2-Mar-2024
  • (2024) G 2 -AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Finding on GPUs 2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00245(3164-3177)Online publication date: 13-May-2024
  • (2024)StarPlat: A Versatile DSL for Graph AnalyticsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104967(104967)Online publication date: Aug-2024
  • (2024)GPU-based butterfly countingThe VLDB Journal10.1007/s00778-024-00861-033:5(1543-1567)Online publication date: 27-Jun-2024
  • (2023)GGPAProceedings of the 20th ACM International Conference on Computing Frontiers10.1145/3587135.3592198(33-41)Online publication date: 9-May-2023
  • (2023)Accelerating k-Core Decomposition by a GPU2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00142(1818-1831)Online publication date: Apr-2023
  • (2023)Accelerating matrix-centric graph processing on GPUs through bit-level optimizationsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.02.013177(53-67)Online publication date: Jul-2023
  • (2022)ByteGNNProceedings of the VLDB Endowment10.14778/3514061.351406915:6(1228-1242)Online publication date: 1-Feb-2022
  • (2022)Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph ProcessingProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569686(198-210)Online publication date: 8-Oct-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media