MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs

Authors: Michael Personick, Bryan Thompson

GRADES'14: Proceedings of Workshop on GRAph Data management Experiences and Systems

Pages 1 - 6

https://doi.org/10.1145/2621934.2621936

Published: 22 June 2014

Abstract

High-performance graph analytics are critical for a long list of application domains. In recent years, the rapid advancement of many-core processors, in particular graphics processing units (GPUs), has sparked broad interest in developing high-performance parallel graph programs on these architectures. However, the SIMT architecture used in GPUs places particular constraints on both the design and the implementation of algorithms and data structures, making the development of such programs difficult and time-consuming.

We present MapGraph, a high-performance parallel graph programming framework that delivers up to 3 billion Traversed Edges Per Second (TEPS) on a GPU. MapGraph provides a high-level abstraction that makes it easy to write graph programs and obtain good parallel speedups on GPUs. To deliver high performance, MapGraph dynamically chooses among different scheduling strategies depending on the size of the frontier and the size of the adjacency lists of the vertices in the frontier. In addition, it uses a Structure of Arrays (SOA) pattern to ensure coalesced memory access. Our experiments show that, for many graph analytics algorithms, an implementation built on our abstraction is up to two orders of magnitude faster than a parallel CPU implementation and is comparable to state-of-the-art, manually optimized GPU implementations. In addition, with our abstraction, new graph analytics can be developed with relatively little effort.
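The two techniques named in the abstract, a Structure of Arrays layout and frontier-dependent scheduling, can be illustrated with a small host-side C++ sketch. This is not MapGraph's API or its actual selection rule: the type names (`CsrGraph`, `VertexDataSoA`, `Strategy`), the helper functions, and all thresholds below are assumptions chosen for illustration, and the expansion step is a sequential reference rather than a GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Graph in compressed sparse row (CSR) form: the out-neighbours of vertex v
// are col_idx[row_ptr[v]] .. col_idx[row_ptr[v + 1] - 1].
struct CsrGraph {
    std::vector<std::uint32_t> row_ptr;  // size = number of vertices + 1
    std::vector<std::uint32_t> col_idx;  // size = number of edges
};

// Structure of Arrays: one contiguous array per vertex attribute, so that
// neighbouring threads reading depth[v], depth[v + 1], ... touch adjacent
// memory. An array of structs would interleave depth and parent instead.
struct VertexDataSoA {
    std::vector<std::int32_t> depth;    // -1 means "not visited yet"
    std::vector<std::uint32_t> parent;
    explicit VertexDataSoA(std::size_t n) : depth(n, -1), parent(n, 0) {}
};

// Hypothetical strategy names: how many threads cooperate on one adjacency list.
enum class Strategy { PerThread, PerWarp, PerBlock };

// Illustrative thresholds only; real values would be tuned per device.
constexpr std::size_t kSmallFrontier = 1024;
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kBlockSize = 256;

std::size_t degree(const CsrGraph& g, std::uint32_t v) {
    assert(static_cast<std::size_t>(v) + 1 < g.row_ptr.size());
    return g.row_ptr[v + 1] - g.row_ptr[v];
}

// Assumed selection rule: long adjacency lists get more cooperating threads,
// and a small frontier also prefers cooperation to keep more lanes busy.
Strategy choose_strategy(const CsrGraph& g,
                         const std::vector<std::uint32_t>& frontier) {
    if (frontier.empty()) return Strategy::PerThread;
    std::size_t total = 0;
    for (std::uint32_t v : frontier) total += degree(g, v);
    const std::size_t avg = total / frontier.size();
    if (avg >= kBlockSize) return Strategy::PerBlock;
    if (avg >= kWarpSize || frontier.size() < kSmallFrontier) return Strategy::PerWarp;
    return Strategy::PerThread;
}

// Sequential reference for one BFS expansion step over the SoA vertex data.
std::vector<std::uint32_t> expand(const CsrGraph& g,
                                  const std::vector<std::uint32_t>& frontier,
                                  std::int32_t level, VertexDataSoA& data) {
    std::vector<std::uint32_t> next;
    for (std::uint32_t v : frontier) {
        for (std::uint32_t e = g.row_ptr[v]; e < g.row_ptr[v + 1]; ++e) {
            const std::uint32_t u = g.col_idx[e];
            if (data.depth[u] < 0) {  // first visit: record level and parent
                data.depth[u] = level;
                data.parent[u] = v;
                next.push_back(u);
            }
        }
    }
    return next;
}

// Test helper: directed star with edges from vertex 0 to leaves 1..leaves.
CsrGraph make_star(std::uint32_t leaves) {
    CsrGraph g;
    g.row_ptr.assign(static_cast<std::size_t>(leaves) + 2, leaves);
    g.row_ptr[0] = 0;
    for (std::uint32_t i = 1; i <= leaves; ++i) g.col_idx.push_back(i);
    return g;
}
```

In this sketch a frontier holding only the hub of a large star graph maps to `Strategy::PerBlock`, while a frontier of thousands of low-degree vertices maps to `Strategy::PerThread`. On a GPU, each strategy would correspond to a different kernel or work-assignment scheme.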

Information

Proceedings: June 2014, 79 pages

ISBN: 9781450329828

DOI: 10.1145/2621934

Program Chairs: Peter Boncz (CWI), Josep Lluis Larriba Pey (UPC Catalunya)

Copyright © 2014 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsor: SIGMOD: ACM Special Interest Group on Management of Data

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: Tutorial, Research, Refereed limited

Conference

SIGMOD/PODS'14: International Conference on Management of Data

June 22-27, 2014

Snowbird, UT, USA

Sponsor: SIGMOD

Acceptance Rates

Overall Acceptance Rate 29 of 61 submissions, 48%
