Abstract
Segmented operations, such as segmented sum, segmented scan and segmented sort, are important building blocks for parallel irregular algorithms. We in this work propose a new parallel primitive called segmented merge. Its function is in parallel merging q sub-segments to p segments, both of nonuniform lengths. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix-matrix multiplication (SpGEMM) operations.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
We in this paper call “sub-array” “segment”, since each segment further includes at least one “sub-segment”. In this way, we can avoid using terms like “sub-array” and “sub-sub-array”.
References
Blelloch, G.E., Heroux, M.A., Zagha, M.: Segmented operations for sparse matrix computation on vector multiprocessors. Technical report, CMU (1993)
Davis, T.A., Hu, Y.: The university of florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1-1:25 (2011)
Deveci, M., Trott, C., Rajamanickam, S.: Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures. Parallel Comput. 78, 33–46 (2018)
Dotsenko, Y., Govindaraju, N.K., Sloan, P.P., Boyd, C., Manferdelli, J.: Fast scan algorithms on graphics processors. In: Proceedings of the 22nd Annual International Conference on Supercomputing, ICS 2008, pp. 205–213 (2008)
Green, O., McColl, R., Bader, D.A.: GPU merge path: A GPU merging algorithm. In: Proceedings of the 26th ACM International Conference on Supercomputing, ICS 2012, pp. 331–340 (2012)
Gremse, F., Küpper, K., Naumann, U.: Memory-efficient sparse matrix-matrix multiplication by row merging on many-core architectures. SIAM J. Sci. Comput. 40(4), C429–C449 (2018)
Gustavson, F.G.: Two fast algorithms for sparse matrices: multiplication and permuted transposition. ACM Trans. Math. Softw. 4(3), 250–269 (1978)
Hou, K., Liu, W., Wang, H., Feng, W.c.: Fast segmented sort on GPUs. In: Proceedings of the International Conference on Supercomputing, ICS 2017 (2017)
Liu, J., He, X., Liu, W., Tan, G.: Register-based implementation of the sparse general matrix-matrix multiplication on GPUs. In: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2018, pp. 407–408 (2018)
Liu, J., He, X., Liu, W., Tan, G.: Register-aware optimizations for parallel sparse matrix-matrix multiplication. Int. J. Parallel Program. 47, 403–417 (2019)
Liu, W., Vinter, B.: A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors. J. Parallel Distrib. Comput. 85, 47–61 (2015)
Liu, W., Vinter, B.: Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors. Parallel Comput. 49, 179–193 (2015)
Liu, W., Vinter, B.: CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ICS 2015, pp. 339–350 (2015)
Nagasaka, Y., Nukada, A., Matsuoka, S.: High-performance and memory-saving sparse general matrix-matrix multiplication for NVIDIA pascal GPU. In: 2017 46th International Conference on Parallel Processing (ICPP), pp. 101–110 (2017)
Wang, H., Liu, W., Hou, K., Feng, W.C.: Parallel transposition of sparse data structures. In: Proceedings of the 2016 International Conference on Supercomputing, ICS 2016, pp. 33:1–33:13 (2016)
Xie, Z., Tan, G., Liu, W., Sun, N.: IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication. In: Proceedings of the ACM International Conference on Supercomputing, ICS 2019, pp. 94–105 (2019)
Acknowledgments
We would like to thank the invaluable comments from all the reviewers. Weifeng Liu is the corresponding author of this paper. This research was supported by the Science Challenge Project under Grant No. TZZT2016002, the National Natural Science Foundation of China under Grant No. 61972415, and the Science Foundation of China University of Petroleum, Beijing under Grant No. 2462019YJRC004, 2462020XKJS03.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Ji, H., Lu, S., Hou, K., Wang, H., Liu, W., Vinter, B. (2021). Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science(), vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_15
Download citation
DOI: https://doi.org/10.1007/978-3-030-79478-1_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79477-4
Online ISBN: 978-3-030-79478-1
eBook Packages: Computer ScienceComputer Science (R0)