Research article | Open access

SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring

Published: 08 October 2024

Abstract

Automated code generation and performance optimization for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop restructuring transformations that reduce asymptotic time complexity and memory footprint. Furthermore, we present an auto-scheduler that uses a partially ordered set (poset)-based cost model, built on both time and auxiliary memory complexity, to prune the search space of schedules. In addition, we highlight the use of Satisfiability Modulo Theories (SMT) solvers in sparse auto-schedulers to narrow the Pareto frontier down to the smallest possible set of candidate schedules, subject to user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that schedules selected by the auto-scheduler achieve orders-of-magnitude speedups over the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.
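To make the cost-model idea above concrete, the Python sketch below shows one way a poset-based pruning over (time, auxiliary memory) complexities and an SMT-backed compile-time constraint check could look. It is a minimal illustration under assumed encodings, not SparseAuto's actual cost model or API: the degree-vector representation of complexities, the example schedules, and the names Schedule, dominates, pareto_frontier, and memory_within_budget are all hypothetical; only the z3-solver calls (Real, Solver, And) are real library API.

# Illustrative sketch only (not SparseAuto's implementation): poset-based
# Pareto pruning over (time, auxiliary-memory) complexities, plus an SMT
# check of a user memory budget with Z3.
from dataclasses import dataclass
from z3 import And, Real, Solver, sat  # pip install z3-solver

@dataclass(frozen=True)
class Schedule:
    name: str
    time_deg: tuple  # monomial degrees of the time complexity, per index variable
    mem_deg: tuple   # monomial degrees of the auxiliary memory complexity

def dominates(a, b):
    """True if `a` is at least as good as `b` in every degree and strictly
    better in at least one -- the partial order used for pruning."""
    pairs = list(zip(a.time_deg + a.mem_deg, b.time_deg + b.mem_deg))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)

def pareto_frontier(schedules):
    """Keep only schedules not dominated by any other schedule."""
    return [s for s in schedules if not any(dominates(t, s) for t in schedules)]

def memory_within_budget(s, dim_bounds, budget):
    """SMT check: is it impossible for the auxiliary memory to exceed
    `budget` while every dimension size stays within its compile-time bound?"""
    dims = [Real(f"d{i}") for i in range(len(s.mem_deg))]
    mem = 1
    for d, e in zip(dims, s.mem_deg):
        for _ in range(e):
            mem = mem * d          # build the memory monomial d0^e0 * d1^e1 * ...
    solver = Solver()
    for d, ub in zip(dims, dim_bounds):
        solver.add(And(d >= 1, d <= ub))
    solver.add(mem > budget)       # ask Z3 for a counterexample
    return solver.check() != sat   # unsat => the budget can never be exceeded

# Two hypothetical schedules for a three-index expression: fusion saves
# auxiliary memory, fission saves an asymptotic factor in time -- they are
# incomparable in the poset, so both survive on the Pareto frontier.
fused = Schedule("fused", time_deg=(1, 1, 1), mem_deg=(0, 0, 0))
fissed = Schedule("fissed", time_deg=(1, 1, 0), mem_deg=(0, 1, 0))

for s in pareto_frontier([fused, fissed]):
    print(s.name, memory_within_budget(s, dim_bounds=[10_000] * 3, budget=10**6))

In the paper's setting the poset compares full asymptotic complexity expressions rather than bare degree vectors, and the SMT queries handle arbitrary user-defined constraints; the sketch only illustrates the shape of the pruning and the unsat-means-safe reasoning.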

References

[1]
A. Abdelfattah, M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, Tz. Kolev, I. Masliah, and S. Tomov. 2016. High-performance Tensor Contractions for GPUs. Procedia Computer Science, 80 (2016), 108–118. issn:1877-0509 https://doi.org/10.1016/j.procs.2016.05.302 International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA
[2]
Willow Ahrens, Fredrik Kjolstad, and Saman Amarasinghe. 2022. Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2022). Association for Computing Machinery, New York, NY, USA. 269–285. isbn:9781450392655 https://doi.org/10.1145/3519939.3523442
[3]
A. Allam, J. Ramanujam, G. Baumgartner, and P. Sadayappan. 2006. Memory minimization for tensor contractions using integer linear programming. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS). 8 pp. https://doi.org/10.1109/IPDPS.2006.1639717
[4]
Alexander A. Auer, Gerald Baumgartner, David E. Bernholdt, Alina Bibireata, Daniel Cociorva, Venkatesh Choppella, Xiaoyang Gao, Robert Harrison, Sandhya Krishnan, Sriram Krishnamoorthy, Chi-Chung Lam, Qingda Lu, Marcel Nooijen, Russell Pitzer, J. Ramanujam, P. Sadayappan, and Alexander Sibiryakov. 2006. Automatic code generation for many-body electronic structure methods: the tensor contraction engine. Molecular Physics, 104, 2 (2006), 211–228. https://doi.org/10.1080/00268970500275780
[5]
Aart Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad. 2022. Compiler Support for Sparse Tensor Computations in MLIR. ACM Trans. Archit. Code Optim., 19, 4 (2022), Article 50, sep, 25 pages. issn:1544-3566 https://doi.org/10.1145/3544559
[6]
Aart J. C. Bik and Harry A. G. Wijshoff. 1993. Compilation Techniques for Sparse Matrix Computations. In Proceedings of the 7th International Conference on Supercomputing (ICS ’93). Association for Computing Machinery, New York, NY, USA. 416–424. isbn:089791600X https://doi.org/10.1145/165939.166023
[7]
Jee Choi, Xing Liu, Shaden Smith, and Tyler Simon. 2018. Blocking Optimization Techniques for Sparse Tensor Computation. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 568–577. https://doi.org/10.1109/IPDPS.2018.00066
[8]
Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2018. Format Abstraction for Sparse Tensor Algebra Compilers. Proc. ACM Program. Lang., 2, OOPSLA (2018), Article 123, oct, 30 pages. https://doi.org/10.1145/3276493
[9]
D. Cociorva, Xiaoyang Gao, S. Krishnan, G. Baumgartner, Chi-Chung Lam, P. Sadayappan, and J. Ramanujam. 2003. Memory-Constrained Data Locality Optimization for Tensor Contractions. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS). 8 pp. https://doi.org/10.1109/IPDPS.2003.1213121
[10]
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw., 38, 1 (2011), Article 1, dec, 25 pages. issn:0098-3500 https://doi.org/10.1145/2049662.2049663
[11]
Adhitha Dias, Logan Anderson, Kirshanthan Sundararajah, Artem Pelenitsyn, and Milind Kulkarni. 2024. SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring. https://doi.org/10.5281/zenodo.12764370
[12]
Adhitha Dias, Kirshanthan Sundararajah, Charitha Saumya, and Milind Kulkarni. 2022. SparseLNR: accelerating sparse tensor computations using loop nest restructuring. In Proceedings of the 36th ACM International Conference on Supercomputing (ICS ’22). Association for Computing Machinery, New York, NY, USA. Article 15, 14 pages. isbn:9781450392815 https://doi.org/10.1145/3524059.3532386
[13]
Johnnie Gray and Stefanos Kourtis. 2021. Hyper-optimized tensor network contraction. Quantum, 5 (2021), March, 410. issn:2521-327X https://doi.org/10.22331/q-2021-03-15-410
[14]
Will Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 30 (2017), 1024–1034.
[15]
Albert Hartono, Qingda Lu, Thomas Henretty, Sriram Krishnamoorthy, Huaijian Zhang, Gerald Baumgartner, David E. Bernholdt, Marcel Nooijen, Russell Pitzer, J. Ramanujam, and P. Sadayappan. 2009. Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry. The Journal of Physical Chemistry A, 113, 45 (2009), 12715–12723. https://doi.org/10.1021/jp9051215 19888780
[16]
So Hirata. 2003. Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories. The Journal of Physical Chemistry A, 107, 46 (2003), 9887–9897. https://doi.org/10.1021/jp034596z
[17]
Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, and Yida Wang. 2020. FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20). IEEE Press, Article 71, 13 pages. isbn:9781728199986
[18]
Raghavendra Kanakagiri and Edgar Solomonik. 2023. Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network. arXiv:2307.05740. https://doi.org/10.48550/arXiv.2307.05740
[19]
Jinsung Kim, Aravind Sukumaran-Rajam, Vineeth Thumma, Sriram Krishnamoorthy, Ajay Panyala, Louis-Noël Pouchet, Atanas Rountev, and P. Sadayappan. 2019. A Code Generator for High-Performance Tensor Contractions on GPUs. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 85–95. https://doi.org/10.1109/CGO.2019.8661182
[20]
Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, and Saman Amarasinghe. 2019. Tensor Algebra Compilation with Workspaces. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2019). IEEE Press, Piscataway, NJ, USA. 180–192. isbn:978-1-7281-1436-1 http://dl.acm.org/citation.cfm?id=3314872.3314894
[21]
Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The Tensor Algebra Compiler. Proc. ACM Program. Lang., 1, OOPSLA (2017), Article 77, Oct., 29 pages. issn:2475-1421 https://doi.org/10.1145/3133901
[22]
Jean Kossaifi, Aran Khanna, Zachary Lipton, Tommaso Furlanello, and Anima Anandkumar. 2017. Tensor Contraction Layers for Parsimonious Deep Nets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
[23]
Vladimir Kotlyar, Keshav Pingali, and Paul Stodghill. 1997. A Relational Approach to the Compilation of Sparse Matrix Programs. In Proceedings of the Third International Euro-Par Conference on Parallel Processing (Euro-Par ’97). Springer-Verlag, Berlin, Heidelberg. 318–327. isbn:3540634401
[24]
Chi-Chung Lam, P. Sadayappan, and Rephael Wenger. 1997. On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution. Parallel Process. Lett., 7 (1997), 157–168. https://api.semanticscholar.org/CorpusID:9440379
[25]
Alan LaMielle and Michelle Mills Strout. 2010. Enabling Code Generation within the Sparse Polyhedral Framework.
[26]
Richard Lippmann, David Fried, Isaac Graf, Joshua Haines, Kristopher Kendall, David McClung, Dan Weber, Seth Webster, Daniel Wyschogrod, Robert Cunningham, and Marc Zissman. 2000. Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation. In Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX ’00), Vol. 2. 12–26. isbn:0-7695-0490-6 https://doi.org/10.1109/DISCEX.2000.821506
[27]
Jiawen Liu, Dong Li, Roberto Gioiosa, and Jiajia Li. 2021. Athena: High-Performance Sparse Tensor Contraction Sequence on Heterogeneous Memory. In Proceedings of the ACM International Conference on Supercomputing (ICS ’21). Association for Computing Machinery, New York, NY, USA. 190–202. isbn:9781450383356 https://doi.org/10.1145/3447818.3460355
[28]
Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, and Jiajia Li. 2021. Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory. Association for Computing Machinery, New York, NY, USA. 318–333. isbn:9781450382946 https://doi.org/10.1145/3437801.3441581
[29]
Igor L. Markov and Yaoyun Shi. 2008. Simulating Quantum Computation by Contracting Tensor Networks. SIAM J. Comput., 38, 3 (2008), 963–981. https://doi.org/10.1137/050644756
[30]
Thomas Nelson, Axel Rivera, Prasanna Balaprakash, Mary Hall, Paul D. Hovland, Elizabeth Jessup, and Boyana Norris. 2015. Generating Efficient Tensor Contractions for GPUs. In 2015 44th International Conference on Parallel Processing. 969–978. https://doi.org/10.1109/ICPP.2015.106
[31]
Md. Khaledur Rahman, Majedul Haque Sujon, and Ariful Azad. 2021. FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 256–266. https://doi.org/10.1109/IPDPS49936.2021.00034
[32]
Shi-Ju Ran, Emanuele Tirrito, Cheng Peng, Xi Chen, Gang Su, and Maciej Lewenstein. 2017. Review of tensor network contraction approaches. arXiv preprint arXiv:1708.09213.
[33]
Shi-Ju Ran, Emanuele Tirrito, Cheng Peng, Xi Chen, Luca Tagliacozzo, Gang Su, and Maciej Lewenstein. 2020. Tensor Network Contractions: Methods and Applications to Quantum Many-Body Systems. Springer Nature. https://doi.org/10.1007/978-3-030-34489-4
[34]
Luca Rossi, Nesreen K. Ahmed, Jennifer Neville, and Keith Henderson. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. https://doi.org/10.1145/2740908
[35]
S.K. Sahoo, S. Krishnamoorthy, R. Panuganti, and P. Sadayappan. 2005. Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions. In SC ’05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. 13–13. https://doi.org/10.1109/SC.2005.35
[36]
Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, and Fredrik Kjolstad. 2020. A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra. Proc. ACM Program. Lang., 4, OOPSLA (2020), Article 158, Nov., 30 pages. https://doi.org/10.1145/3428226
[37]
Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. 2017. FROSTT: The Formidable Repository of Open Sparse Tensors and Tools. http://frostt.io/
[38]
Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2018. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code. Proc. IEEE, 106, 11 (2018), 1921–1934. https://doi.org/10.1109/JPROC.2018.2857721
[39]
Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An Approach for Code Generation in the Sparse Polyhedral Framework. Parallel Comput., 53, C (2016), apr, 32–57. issn:0167-8191 https://doi.org/10.1016/j.parco.2016.02.004
[40]
Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, and Gokcen Kestor. 2021. A High Performance Sparse Tensor Algebra Compiler in MLIR. In 2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC). 27–38. https://doi.org/10.1109/LLVMHPC54804.2021.00009
[41]
Ledyard R. Tucker. 1966. Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika, 31, 3 (1966), 279–311. https://doi.org/10.1007/BF02289464
[42]
Anand Venkat, Mary Hall, and Michelle Strout. 2015. Loop and Data Transformations for Sparse Matrix Code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). Association for Computing Machinery, New York, NY, USA. 521–532. isbn:9781450334686 https://doi.org/10.1145/2737924.2738003
[43]
Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, and Luis Ceze. 2023. SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA. 660–678. isbn:9781450399180 https://doi.org/10.1145/3582016.3582047
[44]
Tong Zhou, Ruiqin Tian, Rizwan A. Ashraf, Roberto Gioiosa, Gokcen Kestor, and Vivek Sarkar. 2023. ReACT: Redundancy-Aware Code Generation for Tensor Expressions. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT ’22). Association for Computing Machinery, New York, NY, USA. 1–13. isbn:9781450398688 https://doi.org/10.1145/3559009.3569685

Cited By

  • (2024) Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network. In Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures. 169–181. https://doi.org/10.1145/3626183.3659985. Online publication date: 17-Jun-2024.



      Information

      Published In

      Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA2
      October 2024
      2691 pages
      EISSN: 2475-1421
      DOI: 10.1145/3554319
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 October 2024
      Published in PACMPL Volume 8, Issue OOPSLA2


      Author Tags

      1. Asymptotic Analysis
      2. Automatic Scheduling
      3. Fusion
      4. Loop Transformations
      5. Sparse Tensor Algebra


      Funding Sources

      • National Science Foundation


      Article Metrics

      • Downloads (last 12 months): 107
      • Downloads (last 6 weeks): 107
      Reflects downloads up to 18 Nov 2024.
