Article

Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing

Authors:

Nadathur Satish,

Changkyu Kim,

Jatin Chhugani,

Pradeep DubeyAuthors Info & Claims

SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis

Pages 1 - 11

https://doi.org/10.1109/SC.2012.70

Published: 10 November 2012 Publication History

Abstract

Graph traversal is a widely used algorithm in a variety of fields, including social networks, business analytics, and high-performance computing among others. There has been a push for HPC machines to be rated not just in Petaflops, but also in "GigaTEPS" (billions of traversed edges per second), and the Graph500 benchmark has been established for this purpose. Graph traversal on single nodes has been well studied and optimized on modern CPU architectures. However, current cluster implementations suffer from high latency data communication with large volumes of transfers across nodes, leading to inefficiency in performance and energy consumption. In this work, we show that we can overcome these constraints using a combination of efficient low-overhead data compression techniques to reduce transfer volumes along with latency-hiding techniques. Using an optimized single node graph traversal algorithm [1], our novel cluster optimizations result in over 6.6X performance improvements over state-of-the-art data transfer techniques, and almost an order of magnitude in energy savings. Our resulting implementation of the Graph500 benchmark achieves 115 GigaTEPS on a 320-node/5120 core Intel® Endeavor cluster with Intel® Xeon® processors E5-2670, which matches the second ranked result in the recent November 2011 Graph500 list [2] with about 5.6X fewer nodes12. Our cluster optimizations only have a 1.8X overhead in overall performance from the performance of the optimized single-node implementation, and allows for near-linear scaling with number of nodes. Our algorithm on 1024 nodes on Intel® Xeon® processor X5670-based systems (with lower per-node performance) for a large multi-Terabyte graph attained 195 GigaTEPS in performance, proving the high scalability of our algorithm. Our per-node performance is the highest in the top 10 of the Nov 2011 Graph500 list.

Cited By

View all

Ji YLiu HHu YHuang H(2022)iSpan: Parallel Identification of Strongly Connected Components with Spanning TreesACM Transactions on Parallel Computing10.1145/35435429:3(1-27)Online publication date: 18-Aug-2022
https://dl.acm.org/doi/10.1145/3543542
Imamura SYasui YInoue KOno TSasaki HFujisawa KSuzumura TGarcia-Gasulla DDayarathna MShun J(2016)Power-efficient breadth-first search with DRAM row buffer locality-aware address mappingProceedings of the First International Workshop on High Performance Graph Data Management and Processing10.5555/3018830.3018833(17-24)Online publication date: 13-Nov-2016
https://dl.acm.org/doi/10.5555/3018830.3018833
Carretero JDistefano SPetcu DPop DRauber TRunger GSingh D(2015)Energy-efficient Algorithms for Ultrascale SystemsSupercomputing Frontiers and Innovations: an International Journal10.14529/jsfi1502052:2(77-104)Online publication date: 6-Apr-2015
https://dl.acm.org/doi/10.14529/jsfi150205
Show More Cited By

Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing

Recommendations

Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing
SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Graph traversal is a widely used algorithm in a variety of fields, including social networks, business analytics, and high-performance computing among others. There has been a push for HPC machines to be rated not just in Petaflops, but also in "...
Using the Intel Many Integrated Core to accelerate graph traversal

Data-intensive applications have drawn more and more attention in the last few years. The basic graph traversal algorithm, the breadth-first search (BFS), a typical data-intensive application, is widely used and the Graph 500 benchmark uses it to rank ...
An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs
ICS '12: Proceedings of the 26th ACM international conference on Supercomputing

In heterogeneous systems that include CPUs and GPUs, the data transfers between these components play a critical role in determining the performance of applications. Software pipelining is a common approach to mitigate the overheads of those transfers. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis

November 2012

1139 pages

ISBN:9781467308069

Publisher

IEEE Computer Society

United States

Publication History

Published: 10 November 2012

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

9
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 17 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Ji YLiu HHu YHuang H(2022)iSpan: Parallel Identification of Strongly Connected Components with Spanning TreesACM Transactions on Parallel Computing10.1145/35435429:3(1-27)Online publication date: 18-Aug-2022
https://dl.acm.org/doi/10.1145/3543542
Imamura SYasui YInoue KOno TSasaki HFujisawa KSuzumura TGarcia-Gasulla DDayarathna MShun J(2016)Power-efficient breadth-first search with DRAM row buffer locality-aware address mappingProceedings of the First International Workshop on High Performance Graph Data Management and Processing10.5555/3018830.3018833(17-24)Online publication date: 13-Nov-2016
https://dl.acm.org/doi/10.5555/3018830.3018833
Carretero JDistefano SPetcu DPop DRauber TRunger GSingh D(2015)Energy-efficient Algorithms for Ultrascale SystemsSupercomputing Frontiers and Innovations: an International Journal10.14529/jsfi1502052:2(77-104)Online publication date: 6-Apr-2015
https://dl.acm.org/doi/10.14529/jsfi150205
Bernaschi MCarbone GMastrostefano EBisson MFatica MNapoli CSalapura VFranke HHou R(2015)Enhanced GPU-based distributed breadth first searchProceedings of the 12th ACM International Conference on Computing Frontiers10.1145/2742854.2742887(1-8)Online publication date: 6-May-2015
https://dl.acm.org/doi/10.1145/2742854.2742887
Merrill DGarland MGrimshaw A(2015)High-Performance and Scalable GPU Graph TraversalACM Transactions on Parallel Computing10.1145/27175111:2(1-30)Online publication date: 18-Feb-2015
https://dl.acm.org/doi/10.1145/2717511
Then MKaufmann MChirigati FHoang-Vu TPham KKemper ANeumann TVo H(2014)The more the merrierProceedings of the VLDB Endowment10.14778/2735496.27355078:4(449-460)Online publication date: 1-Dec-2014
https://dl.acm.org/doi/10.14778/2735496.2735507
Satish NSundaram NPatwary MSeo JPark JHassaan MSengupta SYin ZDubey PDyreson CLi FÖzsu M(2014)Navigating the maze of graph analytics frameworks using massive graph datasetsProceedings of the 2014 ACM SIGMOD International Conference on Management of Data10.1145/2588555.2610518(979-990)Online publication date: 18-Jun-2014
https://dl.acm.org/doi/10.1145/2588555.2610518
Cumming BFourestey GFuhrer OGysi TFatica MSchulthess TDamkroger TDongarra J(2014)Application centric energy-efficiency study of distributed multi-core and hybrid CPU-GPU systemsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC.2014.72(819-829)Online publication date: 16-Nov-2014
https://dl.acm.org/doi/10.1109/SC.2014.72
Hiragushi TTakahashi D(2013)Efficient Hybrid Breadth-First Search on GPUsProceedings of the 13th International Conference on Algorithms and Architectures for Parallel Processing - Volume 828610.1007/978-3-319-03889-6_5(40-50)Online publication date: 18-Dec-2013
https://dl.acm.org/doi/10.1007/978-3-319-03889-6_5

View Options

View options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Abstract

Cited By

Recommendations