Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2503210.2503293acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Scalable matrix computations on large scale-free graphs using 2D graph partitioning

Published: 17 November 2013 Publication History

Abstract

Scalable parallel computing is essential for processing large scale-free (power-law) graphs. The distribution of data across processes becomes important on distributed-memory computers with thousands of cores. It has been shown that two-dimensional layouts (edge partitioning) can have significant advantages over traditional one-dimensional layouts. However, simple 2D block distribution does not use the structure of the graph, and more advanced 2D partitioning methods are too expensive for large graphs. We propose a new two-dimensional partitioning algorithm that combines graph partitioning with 2D block distribution. The computational cost of the algorithm is essentially the same as 1D graph partitioning. We study the performance of sparse matrix-vector multiplication (SpMV) for scale-free graphs from the web and social networks using several different partitioners and both 1D and 2D data layouts. We show that SpMV run time is reduced by exploiting the graph's structure. Contrary to popular belief, we observe that current graph and hypergraph partitioners often yield relatively good partitions on scale-free graphs. We demonstrate that our new 2D partitioning method consistently outperforms the other methods considered, for both SpMV and an eigensolver, on matrices with up to 1.6 billion nonzeros using up to 16,384 cores.

References

[1]
Epetra home page. http://trilinos.sandia.gov/packages/epetra.
[2]
The Graph500 list. http://www.graph500.org/.
[3]
A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 16--575. IEEE Press, 2006.
[4]
C. G. Baker, U. L. Hetmaniuk, R. B. Lehoucq, and H. K. Thornquist. Anasazi software for the numerical solution of large-scale eigenvalue problems. ACM Trans. Math. Softw., 36(3):13:1--13:23, 2009.
[5]
R. H. Bisseling. Parallel Scientific Computing: A structured approach using BSP and MPI. Oxford University Press, 2004.
[6]
L. S. Blackford, J. Choi, A. Cleary, E. D'Azeuedo, J. Demmel, I. Dhillon, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK user's guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.
[7]
E. Boman, Ü. Çatalyürek, C. Chevalier, and K. Devine. The Zoltan and Isorropia parallel toolkits for combinatorial scientific computing: Partitioning, ordering and coloring. Scientific Programming, 20(2):129--150, 2012.
[8]
E. G. Boman. A nested dissection approach to sparse matrix partitioning. Proc. Applied Math. and Mechanics, 7(1):1010803--1010804, 2007. Presented at ICIAM'07, Zürich, Switzerland, July 2007.
[9]
E. G. Boman and M. M. Wolf. A nested-dissection partitioning method for parallel sparse matrix-vector multiplication. In Proc. of the IEEE High-Performance Extreme Computing Conf. (HPEC), 2013. To appear.
[10]
J. Bradley, D. de Jager, W. Knottenbelt, and A. Trifunovic. Hypergraph partitioning for faster parallel pagerank computation. In Lecture Notes in Computer Science, volume 3670, pages 155--171, 2005.
[11]
Ü. Çatalyürek and C. Aykanat. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Trans. Parallel Dist. Systems, 10(7):673--693, 1999.
[12]
Ü. Çatalyürek and C. Aykanat. A fine-grain hypergraph model for 2d decomposition of sparse matrices. In Proc. IPDPS 8th Int'l Workshop on Solving Irregularly Structured Problems in Parallel (Irregular 2001), April 2001.
[13]
Ü. Çatalyürek and C. Aykanat. A hypergraph-partitioning approach for coarse-grain decomposition. In Proc. Supercomputing 2001. ACM, 2001.
[14]
U. V. Çatalyürek, C. Aykanat, and B. Uçar. On two-dimensional sparse matrix partitioning: Models, methods, and a recipe. SIAM J. Sci. Comput., 32(2):656--683, Feb. 2010.
[15]
A. Cevahir, C. Aykanat, A. Turk, and B. Cambazoglu. Site-based partitioning and repartitioning techniques for parallel pagerank computation. IEEE Trans. on Parallel and Distributed Systems, 22(5):786--802, 2011.
[16]
D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SIAM Data Mining, 2004.
[17]
T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Softw., 38(1):1:1--1:25, Dec. 2011.
[18]
B. Hendrickson, R. Leland, and S. Plimpton. An efficient parallel algorithm for matrix-vector multiplication. International Journal of High Speed Computing, 7:73--88, 1995.
[19]
M. Heroux and D. Rouson. Special issue on Trilinos. Scientific Programming, 20(2--3), 2012.
[20]
M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the Trilinos project. ACM Trans. Math. Softw., 31(3):397--423, 2005.
[21]
G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning: application in VLSI domain. In Proc. 34th Design Automation Conf., pages 526--529. ACM, 1997.
[22]
G. Karypis and V. Kumar. Parmetis: Parallel graph partitioning and sparse matrix ordering library. Technical Report 97--060, Dept. Computer Science, University of Minnesota, 1997. http://www.cs.umn.edu/~metis.
[23]
S. Kirkland and D. Paul. Bipartite subgraphs and the signless Laplacian matrix. Appl. Anal. Discrete Math., 5:1--13, 2011.
[24]
J. Leskovec. SNAP (Stanford Network Analysis Platform) network data sets. http://snap.stanford.edu/data/index.html.
[25]
D. P. O'Leary and G. W. Stewart. Data-flow algorithms for parallel matrix computation. Commun. ACM, 28(8):840--853, Aug. 1985.
[26]
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
[27]
F. Pellegrini and J. Roman. Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In H. Liddell, A. Colbrook, B. Hertzberger, and P. Sloot, editors, High-Performance Computing and Networking, volume 1067 of Lecture Notes in Computer Science, pages 493--498. Springer Berlin Heidelberg, 1996.
[28]
S. Rajamanickam and E. G. Boman. Parallel partitioning with Zoltan: Is hypergraph partitioning worth it? In D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, editors, Graph Partitioning and Graph Clustering, volume 588 of AMS Contemporary Mathematics, pages 37--52. AMS, 2013.
[29]
S. Salihoglu and J. Widom. GPS: A graph processing system. Technical report, Stanford University, 2012.
[30]
K. Schloegel, G. Karypis, and V. Kumar. Parallel static and dynamic multi-constraint partitioning. Concurrency and Computation: Practice and Experience, 14:219--240, 2002.
[31]
C. Seshadhri, T. G. Kolda, and A. Pinar. Community structure and scale-free collections of Erdös-Rényi graphs. Phys. Rev. E, 85:056109, May 2012.
[32]
G. W. Stewart. A Krylov-Schur algorithm for large eigenproblems. SIAM J. Matrix Anal. Appl., 23:601--614, 2000.
[33]
B. Vastenhouw and R. H. Bisseling. A two-dimensional data distribution method for parallel sparse matrix-vector multiplication. SIAM Review, 47(1):67--95, 2005.
[34]
A. Yoo, A. H. Baker, R. Pearce, and V. E. Henson. A scalable eigensolver for large scale-free graphs using 2d graph partitioning. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 63:1--63:11, New York, NY, USA, 2011. ACM.
[35]
A. Yoo and K. Henderson. Parallel massive scale-free graph generators, 2010.

Cited By

View all
  • (2024)Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector MultiplicationACM Transactions on Architecture and Code Optimization10.1145/365302021:2(1-24)Online publication date: 21-May-2024
  • (2024)Distributed Graph Neural Network Training: A SurveyACM Computing Surveys10.1145/364835856:8(1-39)Online publication date: 10-Apr-2024
  • (2024)Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix MultiplicationProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638496(404-416)Online publication date: 2-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair:
  • William Gropp,
  • Program Chair:
  • Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. graph partitioning
  2. parallel computing
  3. scale-free graphs
  4. sparse matrix-vector multiplication
  5. two-dimensional distribution

Qualifiers

  • Research-article

Funding Sources

Conference

SC13
Sponsor:

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)47
  • Downloads (Last 6 weeks)8
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector MultiplicationACM Transactions on Architecture and Code Optimization10.1145/365302021:2(1-24)Online publication date: 21-May-2024
  • (2024)Distributed Graph Neural Network Training: A SurveyACM Computing Surveys10.1145/364835856:8(1-39)Online publication date: 10-Apr-2024
  • (2024)Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix MultiplicationProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638496(404-416)Online publication date: 2-Mar-2024
  • (2024)Kimbap: A Node-Property Map System for Distributed Graph AnalyticsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640421(566-581)Online publication date: 27-Apr-2024
  • (2024)Accelerating SpMV for Scale-Free Graphs with Optimized Bins2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00190(2407-2420)Online publication date: 13-May-2024
  • (2024)Celeritas: Out-of-Core Based Unsupervised Graph Neural Network via Cross-Layer Computing2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00018(91-107)Online publication date: 2-Mar-2024
  • (2023)Parallel Graph Signal Processing: Sampling and ReconstructionIEEE Transactions on Signal and Information Processing over Networks10.1109/TSIPN.2023.32615049(190-206)Online publication date: 2023
  • (2023)DRONE: An Efficient Distributed Subgraph-Centric Framework for Processing Large-Scale Power-law GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.322306834:2(463-474)Online publication date: 1-Feb-2023
  • (2023)A Comprehensive Survey on Distributed Training of Graph Neural NetworksProceedings of the IEEE10.1109/JPROC.2023.3337442111:12(1572-1606)Online publication date: Dec-2023
  • (2023)Fast Approximate LUT-based Vector Multiplication in DRAM2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS60453.2023.00125(832-839)Online publication date: 17-Dec-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media