research-article

Scalable matrix computations on large scale-free graphs using 2D graph partitioning

Authors:

Karen D. Devine,

Sivasankaran RajamanickamAuthors Info & Claims

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No.: 50, Pages 1 - 12

https://doi.org/10.1145/2503210.2503293

Published: 17 November 2013 Publication History

Abstract

Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two-dimensional layouts (edge partitioning) can have significant advantages over traditional one-dimensional layouts. However, simple 2D block distribution does not use the structure of the graph, and more advanced 2D partitioning methods are too expensive for large graphs. We propose a new two-dimensional partitioning algorithm that combines graph partitioning with 2D block distribution. The computational cost of the algorithm is essentially the same as 1D graph partitioning. We study the performance of sparse matrix-vector multiplication (SpMV) for scale-free graphs from the web and social networks using several different partitioners and both 1D and 2D data layouts. We show that SpMV run time is reduced by exploiting the graph's structure. Contrary to popular belief, we observe that current graph and hypergraph partitioners often yield relatively good partitions on scale-free graphs. We demonstrate that our new 2D partitioning method consistently outperforms the other methods considered, for both SpMV and an eigensolver, on matrices with up to 1.6 billion nonzeros using up to 16,384 cores.

References

[1]

Epetra home page. http://trilinos.sandia.gov/packages/epetra.

[2]

The Graph500 list. http://www.graph500.org/.

[3]

A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 16--575. IEEE Press, 2006.

Digital Library

[4]

C. G. Baker, U. L. Hetmaniuk, R. B. Lehoucq, and H. K. Thornquist. Anasazi software for the numerical solution of large-scale eigenvalue problems. ACM Trans. Math. Softw., 36(3):13:1--13:23, 2009.

Digital Library

[5]

R. H. Bisseling. Parallel Scientific Computing: A structured approach using BSP and MPI. Oxford University Press, 2004.

Digital Library

[6]

L. S. Blackford, J. Choi, A. Cleary, E. D'Azeuedo, J. Demmel, I. Dhillon, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK user's guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.

Digital Library

[7]

E. Boman, Ü. Çatalyürek, C. Chevalier, and K. Devine. The Zoltan and Isorropia parallel toolkits for combinatorial scientific computing: Partitioning, ordering and coloring. Scientific Programming, 20(2):129--150, 2012.

Digital Library

[8]

E. G. Boman. A nested dissection approach to sparse matrix partitioning. Proc. Applied Math. and Mechanics, 7(1):1010803--1010804, 2007. Presented at ICIAM'07, Zürich, Switzerland, July 2007.

[9]

E. G. Boman and M. M. Wolf. A nested-dissection partitioning method for parallel sparse matrix-vector multiplication. In Proc. of the IEEE High-Performance Extreme Computing Conf. (HPEC), 2013. To appear.

[10]

J. Bradley, D. de Jager, W. Knottenbelt, and A. Trifunovic. Hypergraph partitioning for faster parallel pagerank computation. In Lecture Notes in Computer Science, volume 3670, pages 155--171, 2005.

Digital Library

[11]

Ü. Çatalyürek and C. Aykanat. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Trans. Parallel Dist. Systems, 10(7):673--693, 1999.

Digital Library

[12]

Ü. Çatalyürek and C. Aykanat. A fine-grain hypergraph model for 2d decomposition of sparse matrices. In Proc. IPDPS 8th Int'l Workshop on Solving Irregularly Structured Problems in Parallel (Irregular 2001), April 2001.

Digital Library

[13]

Ü. Çatalyürek and C. Aykanat. A hypergraph-partitioning approach for coarse-grain decomposition. In Proc. Supercomputing 2001. ACM, 2001.

Digital Library

[14]

U. V. Çatalyürek, C. Aykanat, and B. Uçar. On two-dimensional sparse matrix partitioning: Models, methods, and a recipe. SIAM J. Sci. Comput., 32(2):656--683, Feb. 2010.

Digital Library

[15]

A. Cevahir, C. Aykanat, A. Turk, and B. Cambazoglu. Site-based partitioning and repartitioning techniques for parallel pagerank computation. IEEE Trans. on Parallel and Distributed Systems, 22(5):786--802, 2011.

Digital Library

[16]

D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SIAM Data Mining, 2004.

[17]

T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Softw., 38(1):1:1--1:25, Dec. 2011.

Digital Library

[18]

B. Hendrickson, R. Leland, and S. Plimpton. An efficient parallel algorithm for matrix-vector multiplication. International Journal of High Speed Computing, 7:73--88, 1995.

[19]

M. Heroux and D. Rouson. Special issue on Trilinos. Scientific Programming, 20(2--3), 2012.

[20]

M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the Trilinos project. ACM Trans. Math. Softw., 31(3):397--423, 2005.

Digital Library

[21]

G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: application in VLSI domain. In Proc. 34th Design Automation Conf., pages 526--529. ACM, 1997.

Digital Library

[22]

G. Karypis and V. Kumar. Parmetis: Parallel graph partitioning and sparse matrix ordering library. Technical Report 97--060, Dept. Computer Science, University of Minnesota, 1997. http://www.cs.umn.edu/~metis.

[23]

S. Kirkland and D. Paul. Bipartite subgraphs and the signless Laplacian matrix. Appl. Anal. Discrete Math., 5:1--13, 2011.

[24]

J. Leskovec. SNAP (Stanford Network Analysis Platform) network data sets. http://snap.stanford.edu/data/index.html.

[25]

D. P. O'Leary and G. W. Stewart. Data-flow algorithms for parallel matrix computation. Commun. ACM, 28(8):840--853, Aug. 1985.

Digital Library

[26]

L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.

[27]

F. Pellegrini and J. Roman. Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In H. Liddell, A. Colbrook, B. Hertzberger, and P. Sloot, editors, High-Performance Computing and Networking, volume 1067 of Lecture Notes in Computer Science, pages 493--498. Springer Berlin Heidelberg, 1996.

Digital Library

[28]

S. Rajamanickam and E. G. Boman. Parallel partitioning with Zoltan: Is hypergraph partitioning worth it? In D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, editors, Graph Partitioning and Graph Clustering, volume 588 of AMS Contemporary Mathematics, pages 37--52. AMS, 2013.

[29]

S. Salihoglu and J. Widom. GPS: A graph processing system. Technical report, Stanford University, 2012.

[30]

K. Schloegel, G. Karypis, and V. Kumar. Parallel static and dynamic multi-constraint partitioning. Concurrency and Computation: Practice and Experience, 14:219--240, 2002.

[31]

C. Seshadhri, T. G. Kolda, and A. Pinar. Community structure and scale-free collections of Erdös-Rényi graphs. Phys. Rev. E, 85:056109, May 2012.

[32]

G. W. Stewart. A Krylov-Schur algorithm for large eigenproblems. SIAM J. Matrix Anal. Appl., 23:601--614, 2000.

Digital Library

[33]

B. Vastenhouw and R. H. Bisseling. A two-dimensional data distribution method for parallel sparse matrix-vector multiplication. SIAM Review, 47(1):67--95, 2005.

Digital Library

[34]

A. Yoo, A. H. Baker, R. Pearce, and V. E. Henson. A scalable eigensolver for large scale-free graphs using 2d graph partitioning. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 63:1--63:11, New York, NY, USA, 2011. ACM.

Digital Library

[35]

A. Yoo and K. Henderson. Parallel massive scale-free graph generators, 2010.

Cited By

Hwang SBaek DPark JHuh J(2024)Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector MultiplicationACM Transactions on Architecture and Code Optimization10.1145/365302021:2(1-24)Online publication date: 21-May-2024
https://dl.acm.org/doi/10.1145/3653020
Shao YLi HGu XYin HLi YMiao XZhang WCui BChen L(2024)Distributed Graph Neural Network Training: A SurveyACM Computing Surveys10.1145/364835856:8(1-39)Online publication date: 10-Apr-2024
https://dl.acm.org/doi/10.1145/3648358
Gianinazzi LZiogas AHuang LLuczynski PAshkboosh SScheidl FCarigiet AGe CAbubaker NBesta MBen-Nun THoefler TLee IChabbi MSteuwer M(2024)Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix MultiplicationProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638496(404-416)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638496
Show More Cited By

Recommendations

Partitioning of a graph into induced subgraphs not containing prescribed cliques
Abstract
Let K p be a complete graph of order p ≥ 2. A K p-free k-coloring of a graph H is a partition of V ( H ) into V 1 , V 2 … , V k such that H [ V i ] does not contain K p for each i ≤ k. In 1977 Borodin and Kostochka conjectured that any graph H ...
Partitioning a Graph into Complementary Subgraphs
WALCOM: Algorithms and Computation
Abstract
In the Partition Into Complementary Subgraphs (Comp-Sub) problem we are given a graph , and an edge set property , and asked whether G can be decomposed into two graphs, H and its complement , for some graph H, in such a way that the edge cut-set (...
On the Optimality of the Median Cut Spectral Bisection Graph Partitioning Method

Recursive spectral bisection (RSB) is a heuristic technique for finding a minimum cut graph bisection. To use this method the second eigenvector of the Laplacian of the graph is computed and from it a bisection is obtained. The most common method is to ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2013

1123 pages

ISBN:9781450323789

DOI:10.1145/2503210

General Chair:
William Gropp
University of Illinois at Urbana-Champaign, Urbana, Illinois
,
Program Chair:
Satoshi Matsuoka
Tokyo Institute of Technology, Tokyo, Japan

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

U.S. Department of Energy

Conference

SC13

Sponsor:

SIGHPC
SIGARCH
IEEE-CS

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 17 - 21, 2013

Colorado, Denver

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

85
Total Citations
View Citations
611
Total Downloads

Downloads (Last 12 months)47
Downloads (Last 6 weeks)8

Reflects downloads up to 18 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Hwang SBaek DPark JHuh J(2024)Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector MultiplicationACM Transactions on Architecture and Code Optimization10.1145/365302021:2(1-24)Online publication date: 21-May-2024
https://dl.acm.org/doi/10.1145/3653020
Shao YLi HGu XYin HLi YMiao XZhang WCui BChen L(2024)Distributed Graph Neural Network Training: A SurveyACM Computing Surveys10.1145/364835856:8(1-39)Online publication date: 10-Apr-2024
https://dl.acm.org/doi/10.1145/3648358
Gianinazzi LZiogas AHuang LLuczynski PAshkboosh SScheidl FCarigiet AGe CAbubaker NBesta MBen-Nun THoefler TLee IChabbi MSteuwer M(2024)Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix MultiplicationProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638496(404-416)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638496
Lee HDathathri RPingali KTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)Kimbap: A Node-Property Map System for Distributed Graph AnalyticsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640421(566-581)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620665.3640421
Chen YYu J(2024)Accelerating SpMV for Scale-Free Graphs with Optimized Bins2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00190(2407-2420)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00190
Li YYang TYang MShen ZLi B(2024)Celeritas: Out-of-Core Based Unsupervised Graph Neural Network via Cross-Layer Computing2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00018(91-107)Online publication date: 2-Mar-2024
https://doi.org/10.1109/HPCA57654.2024.00018
Dapena DLau DArce G(2023)Parallel Graph Signal Processing: Sampling and ReconstructionIEEE Transactions on Signal and Information Processing over Networks10.1109/TSIPN.2023.32615049(190-206)Online publication date: 2023
https://doi.org/10.1109/TSIPN.2023.3261504
Zhang SJiang ZHou XLi MYuan MYou H(2023)DRONE: An Efficient Distributed Subgraph-Centric Framework for Processing Large-Scale Power-law GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.322306834:2(463-474)Online publication date: 1-Feb-2023
https://doi.org/10.1109/TPDS.2022.3223068
Lin HYan MYe XFan DPan SChen WXie Y(2023)A Comprehensive Survey on Distributed Training of Graph Neural NetworksProceedings of the IEEE10.1109/JPROC.2023.3337442111:12(1572-1606)Online publication date: Dec-2023
https://doi.org/10.1109/JPROC.2023.3337442
Chen ZDeng QWang Y(2023)Fast Approximate LUT-based Vector Multiplication in DRAM2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS60453.2023.00125(832-839)Online publication date: 17-Dec-2023
https://doi.org/10.1109/ICPADS60453.2023.00125
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents