Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3545008.3545048acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article
Open access

Parallel Algorithms for Masked Sparse Matrix-Matrix Products

Published: 13 January 2023 Publication History

Abstract

Computing the product of two sparse matrices (SpGEMM) is a fundamental operation in various combinatorial and graph algorithms as well as various bioinformatics and data analytics applications for computing inner-product similarities. For an important class of algorithms, only a subset of the output entries are needed, and the resulting operation is known as Masked SpGEMM since a subset of the output entries is considered to be “masked out”.
Existing algorithms for Masked SpGEMM usually do not consider mask as part of multiplication and either first compute a regular SpGEMM followed by masking, or perform a sparse inner product only for output elements that are not masked out. In this work, we investigate various novel algorithms and data structures for this rather challenging and important computation, and provide guidelines on how to design a fast Masked-SpGEMM for shared-memory architectures. Our evaluations show that factors such as matrix and mask density, mask structure and cache behavior play a vital role in attaining high performance for Masked SpGEMM. We evaluate our algorithms on a large number of real-world and synthetic matrices using several real-world benchmarks and show that our algorithms in most cases significantly outperform the state of the art for Masked SpGEMM implementations.

References

[1]
Ariful Azad, Mohsen Mahmoudi Aznaveh, Scott Beamer, Mark Blanco, Jinhao Chen, Luke D’Alessandro, Roshan Dathathri, Tim Davis, Kevin Deweese, Jesun Firoz, 2020. Evaluation of graph analytics frameworks using the GAP benchmark suite. In 2020 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 216–227.
[2]
Ariful Azad, Grey Ballard, Aydın Buluç, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, and Samuel Williams. 2016. Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication. SIAM Journal on Scientific Computing 38, 6 (2016), C624–C651.
[3]
Ariful Azad, Aydin Buluç, and John Gilbert. 2015. Parallel triangle counting and enumeration using matrix algebra. In International Parallel and Distributed Processing Symposium Workshops. IEEE, 804–811.
[4]
Mohsen Aznaveh, Jinhao Chen, Timothy A Davis, Bálint Hegyi, Scott P Kolodziej, Timothy G Mattson, and Gábor Szárnyas. 2020. Parallel GraphBLAS with OpenMP. In Proceedings of the SIAM Workshop on Combinatorial Scientific Computing. SIAM, 138–148.
[5]
David A Bader, John Feo, John Gilbert, Jeremy Kepner, David Koester, Eugene Loh, Kamesh Madduri, Bill Mann, and Theresa Meuse. 2006. HPCS scalable synthetic compact applications 2: graph analysis. SSCA 2(2006), v2.
[6]
Scott Beamer, Krste Asanovic, and David Patterson. 2012. Direction-optimizing breadth-first search. In SC’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. IEEE, 1–10.
[7]
Nathan Bell, Steven Dalton, and Luke N. Olson. 2012. Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods. SIAM Journal on Scientific Computing 34, 4 (2012), C123–C152.
[8]
Maciej Besta, Michał Podstawski, Linus Groner, Edgar Solomonik, and Torsten Hoefler. 2017. To push or to pull: On reducing communication and synchronization in graph computations. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing. 93–104.
[9]
Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology 25, 2 (2001), 163–177.
[10]
W. Briggs, V. Henson, and S. McCormick. 2000. A Multigrid Tutorial, Second Edition(second ed.). Society for Industrial and Applied Mathematics.
[11]
Aydin Buluc and John R Gilbert. 2008. On the representation and multiplication of hypersparse matrices. In IEEE International Symposium on Parallel and Distributed Processing. IEEE, 1–11.
[12]
Aydin Buluç, Tim Mattson, Scott McMillan, José Moreira, and Carl Yang. 2017. Design of the GraphBLAS API for C. In International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 643–652.
[13]
Aydin Buluç and John R. Gilbert. 2012. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments. SIAM Journal on Scientific Computing 34, 4 (2012), C170–C191.
[14]
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A Recursive Model for Graph Mining. 442–446.
[15]
Timothy A. Davis. 2018. Graph algorithms via SuiteSparse: GraphBLAS: triangle counting and K-truss. In 2018 IEEE High Performance extreme Computing Conference (HPEC). 1–6.
[16]
Timothy A. Davis. 2019. Algorithm 1000: SuiteSparse:GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra. ACM Trans. Math. Softw. 45, 4, Article 44 (Dec. 2019), 25 pages.
[17]
Timothy A Davis. 2022. Algorithm 10xx: SuiteSparse:GraphBLAS: parallel graph algorithms in the language of sparse linear algebra. (2022). draft manuscript.
[18]
Timothy A Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. (TOMS) 38, 1 (2011), 1–25.
[19]
Mehmet Deveci, Christian Trott, and Sivasankaran Rajamanickam. 2017. Performance-portable sparse matrix-matrix multiplication for many-core architectures. In IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 693–702.
[20]
Mehmet Deveci, Christian Trott, and Sivasankaran Rajamanickam. 2018. Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures. Parallel Comput. 78(2018), 33–46.
[21]
Elizabeth D. Dolan and Jorge J. Moré. 2002. Benchmarking optimization software with performance profiles. Mathematical Programming 91, 2 (01 Jan 2002), 201–213.
[22]
Philip A Etter, Kai Zhong, Hsiang-Fu Yu, Lexing Ying, and Inderjit Dhillon. 2021. Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees. arXiv preprint arXiv:2106.02697(2021).
[23]
Linton C. Freeman. 1977. A Set of Measures of Centrality Based on Betweenness. Sociometry 40, 1 (1977), 35–41.
[24]
John R Gilbert, Cleve Moler, and Robert Schreiber. 1992. Sparse matrices in MATLAB: Design and implementation. SIAM J. Matrix Anal. Appl. 13, 1 (1992), 333–356.
[25]
John R. Gilbert, Steve Reinhardt, and Viral B. Shah. 2008. A Unified Framework for Numerical and Combinatorial Computing. Computing in Science Engineering 10, 2 (2008), 20–25.
[26]
Andreas Griewank and Uwe Naumann. 2003. Accumulating Jacobians as chained sparse matrix products. Mathematical Programming 95, 3 (01 Mar 2003), 555–571.
[27]
Fred G Gustavson. 1978. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS) 4, 3 (1978), 250–269.
[28]
Donald E. Knuth. 1998. The Art of Computer Programming, Volume 3: (2nd Ed.) Sorting and Searching. Addison Wesley Longman Publishing Co., Inc., USA.
[29]
Hochan Lee, David Wong, Loc Hoang, Roshan Dathathri, Gurbinder Gill, Vishwesh Jatala, David Kuck, and Keshav Pingali. 2020. A Study of APIs for Graph Analytics Workloads. In 2020 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 228–239.
[30]
W. Liu and B. Vinter. 2014. An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data. In IEEE 28th International Parallel and Distributed Processing Symposium. 370–381.
[31]
Andrew Lumsdaine, Luke Dalessandro, Kevin Deweese, Jesun Firoz, and Scott McMillan. 2020. Triangle Counting with Cyclic Distributions. In 2020 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1–8.
[32]
Richard C Murphy, Kyle B Wheeler, Brian W Barrett, and James A Ang. 2010. Introducing the graph 500. Cray Users Group (CUG) 19 (2010), 45–74.
[33]
Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Aydın Buluç. 2019. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. Parallel Comput. 90(2019), 102545.
[34]
Yusuke Nagasaka, Akira Nukada, and Satoshi Matsuoka. 2017. High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU. In 46th International Conference on Parallel Processing (ICPP). 101–110.
[35]
Md. Mostofa Ali Patwary 2015. Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms. In High Performance Computing, Julian M. Kunkeland Thomas Ludwig (Eds.). Springer International Publishing, Cham, 48–57.
[36]
Gerald Penn. 2006. Efficient Transitive Closure of Sparse Matrices over Closed Semirings. Theor. Comput. Sci. 354, 1 (March 2006), 72–81.
[37]
Oguz Selvitopi, Md Taufique Hussain, Ariful Azad, and Aydın Buluç. 2020. Optimizing High Performance Markov Clustering for Pre-Exascale Architectures. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 116–126.
[38]
Stijn Van Dongen. 2008. Graph Clustering Via a Discrete Uncoupling Process. SIAM J. Matrix Anal. Appl. 30, 1 (2008), 121–141.
[39]
Michael M. Wolf, Mehmet Deveci, Jonathan W. Berry, Simon D. Hammond, and Sivasankaran Rajamanickam. 2017. Fast linear algebra-based triangle counting with KokkosKernels. In IEEE High Performance Extreme Computing Conference (HPEC). 1–7.
[40]
Carl Yang, Aydın Buluç, and John D Owens. 2018. Implementing push-pull efficiently in GraphBLAS. In Proceedings of the 47th International Conference on Parallel Processing. 1–11.
[41]
Carl Yang, Aydın Buluç, and John D Owens. 2022. GraphBLAST: A high-performance linear algebra-based graph framework on the GPU. ACM Transactions on Mathematical Software (TOMS) 48, 1 (2022), 1–51.

Cited By

View all
  • (2024)To Tile or not to Tile, That is the Question2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00096(449-458)Online publication date: 27-May-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022
976 pages
ISBN:9781450397339
DOI:10.1145/3545008
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 January 2023

Check for updates

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • US Department of Energy

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)503
  • Downloads (Last 6 weeks)91
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)To Tile or not to Tile, That is the Question2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00096(449-458)Online publication date: 27-May-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media