research-article

Open access

Parallel Algorithms for Masked Sparse Matrix-Matrix Products

Authors:

Srđan Milaković,

Oguz Selvitopi,

Zoran Budimlić,

Aydin BulucAuthors Info & Claims

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

Article No.: 10, Pages 1 - 11

https://doi.org/10.1145/3545008.3545048

Published: 13 January 2023 Publication History

All formats PDF

Abstract

Computing the product of two sparse matrices (SpGEMM) is a fundamental operation in various combinatorial and graph algorithms as well as various bioinformatics and data analytics applications for computing inner-product similarities. For an important class of algorithms, only a subset of the output entries are needed, and the resulting operation is known as Masked SpGEMM since a subset of the output entries is considered to be “masked out”.

Existing algorithms for Masked SpGEMM usually do not consider mask as part of multiplication and either first compute a regular SpGEMM followed by masking, or perform a sparse inner product only for output elements that are not masked out. In this work, we investigate various novel algorithms and data structures for this rather challenging and important computation, and provide guidelines on how to design a fast Masked-SpGEMM for shared-memory architectures. Our evaluations show that factors such as matrix and mask density, mask structure and cache behavior play a vital role in attaining high performance for Masked SpGEMM. We evaluate our algorithms on a large number of real-world and synthetic matrices using several real-world benchmarks and show that our algorithms in most cases significantly outperform the state of the art for Masked SpGEMM implementations.

References

[1]

Ariful Azad, Mohsen Mahmoudi Aznaveh, Scott Beamer, Mark Blanco, Jinhao Chen, Luke D’Alessandro, Roshan Dathathri, Tim Davis, Kevin Deweese, Jesun Firoz, 2020. Evaluation of graph analytics frameworks using the GAP benchmark suite. In 2020 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 216–227.

[2]

Ariful Azad, Grey Ballard, Aydın Buluç, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, and Samuel Williams. 2016. Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication. SIAM Journal on Scientific Computing 38, 6 (2016), C624–C651.

[3]

Ariful Azad, Aydin Buluç, and John Gilbert. 2015. Parallel triangle counting and enumeration using matrix algebra. In International Parallel and Distributed Processing Symposium Workshops. IEEE, 804–811.

Digital Library

[4]

Mohsen Aznaveh, Jinhao Chen, Timothy A Davis, Bálint Hegyi, Scott P Kolodziej, Timothy G Mattson, and Gábor Szárnyas. 2020. Parallel GraphBLAS with OpenMP. In Proceedings of the SIAM Workshop on Combinatorial Scientific Computing. SIAM, 138–148.

[5]

David A Bader, John Feo, John Gilbert, Jeremy Kepner, David Koester, Eugene Loh, Kamesh Madduri, Bill Mann, and Theresa Meuse. 2006. HPCS scalable synthetic compact applications 2: graph analysis. SSCA 2(2006), v2.

[6]

Scott Beamer, Krste Asanovic, and David Patterson. 2012. Direction-optimizing breadth-first search. In SC’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. IEEE, 1–10.

Digital Library

[7]

Nathan Bell, Steven Dalton, and Luke N. Olson. 2012. Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods. SIAM Journal on Scientific Computing 34, 4 (2012), C123–C152.

Digital Library

[8]

Maciej Besta, Michał Podstawski, Linus Groner, Edgar Solomonik, and Torsten Hoefler. 2017. To push or to pull: On reducing communication and synchronization in graph computations. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing. 93–104.

Digital Library

[9]

Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology 25, 2 (2001), 163–177.

[10]

W. Briggs, V. Henson, and S. McCormick. 2000. A Multigrid Tutorial, Second Edition(second ed.). Society for Industrial and Applied Mathematics.

Digital Library

[11]

Aydin Buluc and John R Gilbert. 2008. On the representation and multiplication of hypersparse matrices. In IEEE International Symposium on Parallel and Distributed Processing. IEEE, 1–11.

[12]

Aydin Buluç, Tim Mattson, Scott McMillan, José Moreira, and Carl Yang. 2017. Design of the GraphBLAS API for C. In International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 643–652.

[13]

Aydin Buluç and John R. Gilbert. 2012. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments. SIAM Journal on Scientific Computing 34, 4 (2012), C170–C191.

Digital Library

[14]

Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A Recursive Model for Graph Mining. 442–446.

[15]

Timothy A. Davis. 2018. Graph algorithms via SuiteSparse: GraphBLAS: triangle counting and K-truss. In 2018 IEEE High Performance extreme Computing Conference (HPEC). 1–6.

[16]

Timothy A. Davis. 2019. Algorithm 1000: SuiteSparse:GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra. ACM Trans. Math. Softw. 45, 4, Article 44 (Dec. 2019), 25 pages.

Digital Library

[17]

Timothy A Davis. 2022. Algorithm 10xx: SuiteSparse:GraphBLAS: parallel graph algorithms in the language of sparse linear algebra. (2022). draft manuscript.

[18]

Timothy A Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. (TOMS) 38, 1 (2011), 1–25.

Digital Library

[19]

Mehmet Deveci, Christian Trott, and Sivasankaran Rajamanickam. 2017. Performance-portable sparse matrix-matrix multiplication for many-core architectures. In IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 693–702.

[20]

Mehmet Deveci, Christian Trott, and Sivasankaran Rajamanickam. 2018. Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures. Parallel Comput. 78(2018), 33–46.

Digital Library

[21]

Elizabeth D. Dolan and Jorge J. Moré. 2002. Benchmarking optimization software with performance profiles. Mathematical Programming 91, 2 (01 Jan 2002), 201–213.

[22]

Philip A Etter, Kai Zhong, Hsiang-Fu Yu, Lexing Ying, and Inderjit Dhillon. 2021. Accelerating Inference for Sparse Extreme Multi-Label Ranking Trees. arXiv preprint arXiv:2106.02697(2021).

[23]

Linton C. Freeman. 1977. A Set of Measures of Centrality Based on Betweenness. Sociometry 40, 1 (1977), 35–41.

[24]

John R Gilbert, Cleve Moler, and Robert Schreiber. 1992. Sparse matrices in MATLAB: Design and implementation. SIAM J. Matrix Anal. Appl. 13, 1 (1992), 333–356.

Digital Library

[25]

John R. Gilbert, Steve Reinhardt, and Viral B. Shah. 2008. A Unified Framework for Numerical and Combinatorial Computing. Computing in Science Engineering 10, 2 (2008), 20–25.

Digital Library

[26]

Andreas Griewank and Uwe Naumann. 2003. Accumulating Jacobians as chained sparse matrix products. Mathematical Programming 95, 3 (01 Mar 2003), 555–571.

[27]

Fred G Gustavson. 1978. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS) 4, 3 (1978), 250–269.

Digital Library

[28]

Donald E. Knuth. 1998. The Art of Computer Programming, Volume 3: (2nd Ed.) Sorting and Searching. Addison Wesley Longman Publishing Co., Inc., USA.

[29]

Hochan Lee, David Wong, Loc Hoang, Roshan Dathathri, Gurbinder Gill, Vishwesh Jatala, David Kuck, and Keshav Pingali. 2020. A Study of APIs for Graph Analytics Workloads. In 2020 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 228–239.

[30]

W. Liu and B. Vinter. 2014. An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data. In IEEE 28th International Parallel and Distributed Processing Symposium. 370–381.

[31]

Andrew Lumsdaine, Luke Dalessandro, Kevin Deweese, Jesun Firoz, and Scott McMillan. 2020. Triangle Counting with Cyclic Distributions. In 2020 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1–8.

[32]

Richard C Murphy, Kyle B Wheeler, Brian W Barrett, and James A Ang. 2010. Introducing the graph 500. Cray Users Group (CUG) 19 (2010), 45–74.

[33]

Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Aydın Buluç. 2019. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. Parallel Comput. 90(2019), 102545.

Digital Library

[34]

Yusuke Nagasaka, Akira Nukada, and Satoshi Matsuoka. 2017. High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU. In 46th International Conference on Parallel Processing (ICPP). 101–110.

[35]

Md. Mostofa Ali Patwary 2015. Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms. In High Performance Computing, Julian M. Kunkeland Thomas Ludwig (Eds.). Springer International Publishing, Cham, 48–57.

[36]

Gerald Penn. 2006. Efficient Transitive Closure of Sparse Matrices over Closed Semirings. Theor. Comput. Sci. 354, 1 (March 2006), 72–81.

Digital Library

[37]

Oguz Selvitopi, Md Taufique Hussain, Ariful Azad, and Aydın Buluç. 2020. Optimizing High Performance Markov Clustering for Pre-Exascale Architectures. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 116–126.

[38]

Stijn Van Dongen. 2008. Graph Clustering Via a Discrete Uncoupling Process. SIAM J. Matrix Anal. Appl. 30, 1 (2008), 121–141.

[39]

Michael M. Wolf, Mehmet Deveci, Jonathan W. Berry, Simon D. Hammond, and Sivasankaran Rajamanickam. 2017. Fast linear algebra-based triangle counting with KokkosKernels. In IEEE High Performance Extreme Computing Conference (HPEC). 1–7.

[40]

Carl Yang, Aydın Buluç, and John D Owens. 2018. Implementing push-pull efficiently in GraphBLAS. In Proceedings of the 47th International Conference on Parallel Processing. 1–11.

Digital Library

[41]

Carl Yang, Aydın Buluç, and John D Owens. 2022. GraphBLAST: A high-performance linear algebra-based graph framework on the GPU. ACM Transactions on Mathematical Software (TOMS) 48, 1 (2022), 1–51.

Digital Library

Cited By

Haan APopovici DSen KIancu CCheung A(2024)To Tile or not to Tile, That is the Question2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00096(449-458)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPSW63119.2024.00096

Recommendations

Parallel algorithms for masked sparse matrix-matrix products
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Computing the product of two sparse matrices (SpGEMM) is a fundamental operation in various combinatorial and graph algorithms as well as various bioinformatics and data analytics applications for computing inner-product similarities. For an important ...
"Wide or tall" and "sparse matrix dense matrix" multiplications
HPC '11: Proceedings of the 19th High Performance Computing Symposia

Sparse matrix dense matrix (SMDM) multiplications are useful in block Krylov or block Lanczos methods. SMDM computations are AU, and VA, multiplication of a large sparse m x n matrix A by a matrix V of k rows of length m or a matrix U of k columns of ...
A note on computing matrix-vector products with generalized centrosymmetric (centrohermitian) matrices

To reduce the costs of computing matrix-vector product Ax related to a centrosymmetric matrix A as compared to the case of an arbitrary matrix A, two algorithms were proposed recently, one was designed by Melman [A. Melman, Symmetric centrosymmetric ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

August 2022

976 pages

ISBN:9781450397339

DOI:10.1145/3545008

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 January 2023

Check for updates

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

US Department of Energy

Conference

ICPP '22

ICPP '22: 51st International Conference on Parallel Processing

August 29 - September 1, 2022

Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
933
Total Downloads

Downloads (Last 12 months)496
Downloads (Last 6 weeks)66

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Haan APopovici DSen KIancu CCheung A(2024)To Tile or not to Tile, That is the Question2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00096(449-458)Online publication date: 27-May-2024
https://doi.org/10.1109/IPDPSW63119.2024.00096

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten