Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1109/SC.2012.70guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing

Published: 10 November 2012 Publication History

Abstract

Graph traversal is a widely used algorithm in a variety of fields, including social networks, business analytics, and high-performance computing among others. There has been a push for HPC machines to be rated not just in Petaflops, but also in "GigaTEPS" (billions of traversed edges per second), and the Graph500 benchmark has been established for this purpose. Graph traversal on single nodes has been well studied and optimized on modern CPU architectures. However, current cluster implementations suffer from high latency data communication with large volumes of transfers across nodes, leading to inefficiency in performance and energy consumption. In this work, we show that we can overcome these constraints using a combination of efficient low-overhead data compression techniques to reduce transfer volumes along with latency-hiding techniques. Using an optimized single node graph traversal algorithm [1], our novel cluster optimizations result in over 6.6X performance improvements over state-of-the-art data transfer techniques, and almost an order of magnitude in energy savings. Our resulting implementation of the Graph500 benchmark achieves 115 GigaTEPS on a 320-node/5120 core Intel® Endeavor cluster with Intel® Xeon® processors E5-2670, which matches the second ranked result in the recent November 2011 Graph500 list [2] with about 5.6X fewer nodes12. Our cluster optimizations only have a 1.8X overhead in overall performance from the performance of the optimized single-node implementation, and allows for near-linear scaling with number of nodes. Our algorithm on 1024 nodes on Intel® Xeon® processor X5670-based systems (with lower per-node performance) for a large multi-Terabyte graph attained 195 GigaTEPS in performance, proving the high scalability of our algorithm. Our per-node performance is the highest in the top 10 of the Nov 2011 Graph500 list.

Cited By

View all
  • (2022)iSpan: Parallel Identification of Strongly Connected Components with Spanning TreesACM Transactions on Parallel Computing10.1145/35435429:3(1-27)Online publication date: 18-Aug-2022
  • (2016)Power-efficient breadth-first search with DRAM row buffer locality-aware address mappingProceedings of the First International Workshop on High Performance Graph Data Management and Processing10.5555/3018830.3018833(17-24)Online publication date: 13-Nov-2016
  • (2015)Energy-efficient Algorithms for Ultrascale SystemsSupercomputing Frontiers and Innovations: an International Journal10.14529/jsfi1502052:2(77-104)Online publication date: 6-Apr-2015
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2012
1139 pages
ISBN:9781467308069

Publisher

IEEE Computer Society

United States

Publication History

Published: 10 November 2012

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)iSpan: Parallel Identification of Strongly Connected Components with Spanning TreesACM Transactions on Parallel Computing10.1145/35435429:3(1-27)Online publication date: 18-Aug-2022
  • (2016)Power-efficient breadth-first search with DRAM row buffer locality-aware address mappingProceedings of the First International Workshop on High Performance Graph Data Management and Processing10.5555/3018830.3018833(17-24)Online publication date: 13-Nov-2016
  • (2015)Energy-efficient Algorithms for Ultrascale SystemsSupercomputing Frontiers and Innovations: an International Journal10.14529/jsfi1502052:2(77-104)Online publication date: 6-Apr-2015
  • (2015)Enhanced GPU-based distributed breadth first searchProceedings of the 12th ACM International Conference on Computing Frontiers10.1145/2742854.2742887(1-8)Online publication date: 6-May-2015
  • (2015)High-Performance and Scalable GPU Graph TraversalACM Transactions on Parallel Computing10.1145/27175111:2(1-30)Online publication date: 18-Feb-2015
  • (2014)The more the merrierProceedings of the VLDB Endowment10.14778/2735496.27355078:4(449-460)Online publication date: 1-Dec-2014
  • (2014)Navigating the maze of graph analytics frameworks using massive graph datasetsProceedings of the 2014 ACM SIGMOD International Conference on Management of Data10.1145/2588555.2610518(979-990)Online publication date: 18-Jun-2014
  • (2014)Application centric energy-efficiency study of distributed multi-core and hybrid CPU-GPU systemsProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC.2014.72(819-829)Online publication date: 16-Nov-2014
  • (2013)Efficient Hybrid Breadth-First Search on GPUsProceedings of the 13th International Conference on Algorithms and Architectures for Parallel Processing - Volume 828610.1007/978-3-319-03889-6_5(40-50)Online publication date: 18-Dec-2013

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media