Shi Z, Zou Y, Song X, Li S, Liu F and Xue Q. DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV. IEEE Transactions on Parallel and Distributed Systems. 10.1109/TPDS.2024.3488053. 35:12. (2624-2639).

https://ieeexplore.ieee.org/document/10738209/

Chen Y and Yu J. Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores. Proceedings of the 53rd International Conference on Parallel Processing. (1135-1144).

Hong C, Wang Q, Mao R, Liang Y, Xia R and Liu J. SaSpGEMM: Sorting-Avoiding Sparse General Matrix-Matrix Multiplication on Multi-Core Processors. Proceedings of the 53rd International Conference on Parallel Processing. (1166-1175).

https://doi.org/10.1145/3673038.3673054

Guo J, Xia R, Liu J, Zhu X and Zhang X. CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU. Proceedings of the 53rd International Conference on Parallel Processing. (640-649).

https://doi.org/10.1145/3673038.3673042

Xu L, Jia H, Zhang Y, Wang L and Jiang X. HAM-SpMSpV: an Optimized Parallel Algorithm for Masked Sparse Matrix-Sparse Vector Multiplications on multi-core CPUs. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing. (160-173).

https://doi.org/10.1145/3625549.3658680

Chen Y and Yu J. (2024). Accelerating SpMV for Scale-Free Graphs with Optimized Bins. 2024 IEEE 40th International Conference on Data Engineering (ICDE). 10.1109/ICDE60146.2024.00190. 979-8-3503-1715-2. (2407-2420).

https://ieeexplore.ieee.org/document/10598036/

Xiao G, Yin C, Zhou T, Li X, Chen Y and Li K. (2023). A Survey of Accelerating Parallel Sparse Linear Algebra. ACM Computing Surveys. 56:1. (1-38). Online publication date: 31-Jan-2024.

https://doi.org/10.1145/3604606

Scheffler P, Zaruba F, Schuiki F, Hoefler T and Benini L. (2023). Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra. IEEE Transactions on Parallel and Distributed Systems. 34:12. (3147-3161). Online publication date: 1-Dec-2023.

https://doi.org/10.1109/TPDS.2023.3322029

Lu Y and Liu W. DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. (1-14).

https://doi.org/10.1145/3581784.3607051

Fu G, Xia T, Qu S, Luo Z, Li S, Cheng P, Guo R, Ding Y and Ren P. (2023). PrSpMV: An Efficient Predictable Kernel for SpMV. 2023 IEEE 41st International Conference on Computer Design (ICCD). 10.1109/ICCD58817.2023.00075. 979-8-3503-4291-8. (448-456).

https://ieeexplore.ieee.org/document/10361035/

Bi D, Li S, Zhang Y, Yang X and Dong D. Efficiently Running SpMV on Multi-core DSPs for Banded Matrix. Algorithms and Architectures for Parallel Processing. (201-220).

https://doi.org/10.1007/978-981-97-0808-6_12

Guo J, Liu J, Wang Q and Zhu X. Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s. Algorithms and Architectures for Parallel Processing. (22-39).

https://doi.org/10.1007/978-981-97-0801-7_2

Chen Y and Chung Y. Connectivity-Aware Link Analysis for Skewed Graphs. Proceedings of the 52nd International Conference on Parallel Processing. (482-491).

https://doi.org/10.1145/3605573.3605579

Jiang J, Huang J and Bian H. GTLB: A Load-Balanced SpMV Computation Method on GPU. Proceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications. (101-107).

https://doi.org/10.1145/3606043.3606057

Mpakos P, Galanopoulos D, Anastasiadis P, Papadopoulou N, Koziris N and Goumas G. (2023). Feature-based SpMV Performance Analysis on Contemporary Devices. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 10.1109/IPDPS54959.2023.00072. 979-8-3503-3766-2. (668-679).

https://ieeexplore.ieee.org/document/10177423/

Yesil S, Heidarshenas A, Morrison A and Torrellas J. WISE. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. (329-341).

https://doi.org/10.1145/3572848.3577506

Xia T, Fu G, Li C, Luo Z, Zhang L, Chen R, Zhao W, Zheng N and Ren P. A Comprehensive Performance Model of Sparse Matrix-Vector Multiplication to Guide Kernel Optimization. IEEE Transactions on Parallel and Distributed Systems. 10.1109/TPDS.2022.3225230. 34:2. (519-534).

https://ieeexplore.ieee.org/document/9964419/

Bian H, Huang J, Tang J, Dong R, Wu L and Wang X. (2021). PAS: A new powerful and simple quantum computing simulator. Software: Practice and Experience. 10.1002/spe.3049. 53:1. (142-159). Online publication date: 1-Jan-2023.

https://onlinelibrary.wiley.com/doi/10.1002/spe.3049

Cheshmi K, Cetinic Z and Dehnavi M. (2022). Vectorizing Sparse Matrix Computations with Partially-Strided Codelets. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis. 10.1109/SC41404.2022.00037. 978-1-6654-5444-5. (1-15).

https://ieeexplore.ieee.org/document/10046127/

Chou S and Amarasinghe S. (2022). Compilation of dynamic sparse tensor algebra. Proceedings of the ACM on Programming Languages. 6:OOPSLA2. (1408-1437). Online publication date: 31-Oct-2022.

https://doi.org/10.1145/3563338

Chen S, Fang J, Xu C and Wang Z. (2022). Adaptive Hybrid Storage Format for Sparse Matrix–Vector Multiplication on Multi-Core SIMD CPUs. Applied Sciences. 10.3390/app12199812. 12:19. (9812).

https://www.mdpi.com/2076-3417/12/19/9812

Vandierendonck H. Software-defined floating-point number formats and their application to graph processing. Proceedings of the 36th ACM International Conference on Supercomputing. (1-17).

https://doi.org/10.1145/3524059.3532360

Zhang Y, Yang W, Li K and Cai Q. (2022). Performance Optimization for Parallel SpMV on a NUMA Architecture. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. 10.1007/978-3-030-89698-0_131. (1276-1288).

https://link.springer.com/10.1007/978-3-030-89698-0_131

Li C, Xia T, Zhao W, Zheng N and Ren P. SpV8: Pursuing Optimal Vectorization and Regular Computation Pattern in SpMV. 2021 58th ACM/IEEE Design Automation Conference (DAC). (661-666).

https://doi.org/10.1109/DAC18074.2021.9586251

Bian H, Huang J, Dong R, Guo Y, Liu L, Huang D and Wang X. (2021). A simple and efficient storage format for SIMD-accelerated SpMV. Cluster Computing. 24:4. (3431-3448). Online publication date: 1-Dec-2021.

https://doi.org/10.1007/s10586-021-03340-1

Cui J, Lu K and Liu S. (2021). Sparse Matrix-Vector Multiplication Cache Performance Evaluation and Design Exploration. 2021 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). 10.1109/MASCOTS53633.2021.9614301. 978-1-6654-5838-2. (1-7).

https://ieeexplore.ieee.org/document/9614301/

Fei X and Zhang Y. Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement. Proceedings of the 50th International Conference on Parallel Processing. (1-11).

https://doi.org/10.1145/3472456.3472479

Namashivayam N, Mehta S and Yew P. Variable-sized blocks for locality-aware SpMV. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization. (211-221).

https://doi.org/10.1109/CGO51591.2021.9370327

Scheffler P, Zaruba F, Schuiki F, Hoefler T and Benini L. (2021). Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). 10.23919/DATE51398.2021.9474230. 978-3-9819263-5-4. (1787-1792).

https://ieeexplore.ieee.org/document/9474230/

Xie X, Liang Z, Gu P, Basak A, Deng L, Liang L, Hu X and Xie Y. (2021). SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA51647.2021.00055. 978-1-6654-2235-2. (570-583).

https://ieeexplore.ieee.org/document/9407163/

Yesil S, Heidarshenas A, Morrison A and Torrellas J. Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. (1-15).

/doi/10.5555/3433701.3433815

Yesil S, Heidarshenas A, Morrison A and Torrellas J. (2020). Speeding Up SpMV for Power-Law Graph Analytics by Enhancing Locality & Vectorization. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 10.1109/SC41405.2020.00090. 978-1-7281-9998-6. (1-15).

https://ieeexplore.ieee.org/document/9355205/

Sun Q, Liu Y, Dun M, Yang H, Luan Z, Gan L, Yang G and Qian D. (2020). SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 10.1109/SC41405.2020.00022. 978-1-7281-9998-6. (1-14).

https://ieeexplore.ieee.org/document/9355324/

Vandierendonck H. Graptor. Proceedings of the 34th ACM International Conference on Supercomputing. (1-13).

https://doi.org/10.1145/3392717.3392753

Chou S, Kjolstad F and Amarasinghe S. Automatic generation of efficient sparse tensor format conversion routines. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. (823-838).

https://doi.org/10.1145/3385412.3385963

Bian H, Huang J, Dong R, Liu L and Wang X. (2020). CSR2: A New Format for SIMD-accelerated SpMV. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). 10.1109/CCGrid49817.2020.00-58. 978-1-7281-6095-5. (350-359).

https://ieeexplore.ieee.org/document/9139720/

Lee J, Kang S, Yu Y, Jo Y, Kim S and Park Y. (2020). Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks. 2020 IEEE 36th International Conference on Data Engineering (ICDE). 10.1109/ICDE48307.2020.00085. 978-1-7281-2903-7. (925-936).

https://ieeexplore.ieee.org/document/9101654/

Dong X, Liu L, Zhao P, Li G, Li J, Wang X and Feng X. Acorns: A Framework for Accelerating Deep Neural Networks with Input Sparsity. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. (178-191).

https://doi.org/10.1109/PACT.2019.00022

Page B and Kogge P. (2019). Scalability of Hybrid SpMV on Intel Xeon Phi Knights Landing. 2019 International Conference on High Performance Computing & Simulation (HPCS). 10.1109/HPCS48598.2019.9188154. 978-1-7281-4484-9. (348-357).

https://ieeexplore.ieee.org/document/9188154/

Ma Y, Li J, Wu X, Yan C, Sun J and Vuduc R. (2019). Optimizing sparse tensor times matrix on GPUs. Journal of Parallel and Distributed Computing. 129:C. (99-109). Online publication date: 1-Jul-2019.

https://doi.org/10.1016/j.jpdc.2018.07.018

Hong C, Sukumaran-Rajam A, Nisa I, Singh K and Sadayappan P. Adaptive sparse tiling for sparse matrix multiplication. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. (300-314).

https://doi.org/10.1145/3293883.3295712

Tan G, Liu J and Li J. (2018). Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture. ACM Transactions on Mathematical Software. 44:4. (1-25). Online publication date: 31-Dec-2019.

https://doi.org/10.1145/3218823

Xie B, Jia Z and Bao Y. Benchmarking SpMV Methods on Many-Core Platforms. Benchmarking, Measuring, and Optimizing. (233-247).

https://doi.org/10.1007/978-3-030-32813-9_19

Peng Z, Powell A, Wu B, Bicer T and Ren B. Graphphi. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques. (1-14).

https://doi.org/10.1145/3243176.3243205

Gao W, Zhan J, Wang L, Luo C, Zheng D, Tang F, Xie B, Zheng C, Wen X, He X, Ye H and Ren R. Data motifs. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques. (1-14).

https://doi.org/10.1145/3243176.3243190

Liu C, Xie B, Liu X, Xue W, Yang H and Liu X. Towards Efficient SpMV on Sunway Manycore Architectures. Proceedings of the 2018 International Conference on Supercomputing. (363-373).

https://doi.org/10.1145/3205289.3205313