Abstract
Sparse matrix-vector multiplication (SpMV) is a very common kernel in linear algebra and is now widely used in machine learning applications. In particular, fully-connected MLP layers dominate many SpMV workloads that play a critical role in diverse services, so a large fraction of data center cycles is spent on them. Even with sparse matrix storage formats such as CSR/CSC, SpMV still suffers from limited memory bandwidth during data transfer because of the architecture of modern computing systems. However, we find that both the integer and the floating-point data used in matrix-vector multiplication are handled plainly, without any pre-processing. We add compression and decompression stages between main memory and the Last Level Cache (LLC), which may dramatically reduce memory bandwidth consumption. Furthermore, we observe that the convergence speed of some typical scientific computation benchmarks is not degraded when compressed floating-point data is used instead of the original double type. Based on these findings, we propose a simple yet effective compression approach that can be implemented in general computing architectures and, preferably, in HPC systems. With this technique, a performance improvement of 1.92x is achieved in the best case.
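To make the bandwidth argument concrete, the following minimal sketch runs a CSR SpMV twice: once with double-precision values and once with values rounded to single precision. It is not the compression scheme proposed in the paper, whose details are not given in the abstract. Narrowing to float32 is only an assumed stand-in for lossy compression, and the helper names (`to_float32`, `spmv_csr`) and the 3x3 matrix are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value.

    Assumption: float32 narrowing stands in for a lossy compressor here.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 matrix:
# [[4.0, 0.0, 0.1],
#  [0.0, 3.0, 0.0],
#  [0.2, 0.0, 5.0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 0.1, 3.0, 0.2, 5.0]
x = [1.0, 2.0, 3.0]

y_exact = spmv_csr(row_ptr, col_idx, values, x)

# Lossy variant: each stored value would take 4 bytes instead of 8.
values_lossy = [to_float32(v) for v in values]
y_lossy = spmv_csr(row_ptr, col_idx, values_lossy, x)

max_rel_err = max(abs(a - b) / abs(a) for a, b in zip(y_exact, y_lossy))
bytes_exact = 8 * len(values)
bytes_lossy = 4 * len(values)
```

In this toy case the value array shrinks by half while the result changes only by a small relative error. Whether such an error is acceptable depends on the application, which is why the abstract ties the approach to benchmarks whose convergence speed is not degraded.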
Acknowledgment
First and foremost, we would like to sincerely thank the anonymous reviewers for their valuable comments. This work was supported, in part, by JST CREST Grant Number JPMJCR18K1, Japan.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, S., Ito, M., Yoshikawa, T., He, Y., Kondo, M. (2023). Memory Bandwidth Conservation for SpMV Kernels Through Adaptive Lossy Data Compression. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_36
DOI: https://doi.org/10.1007/978-3-031-29927-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29926-1
Online ISBN: 978-3-031-29927-8
eBook Packages: Computer Science, Computer Science (R0)