Research article | Open access

SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring

Published: 08 October 2024

Abstract

Automated code generation and performance optimization for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop restructuring transformations that reduce asymptotic time complexity and memory footprint. Furthermore, we present an auto-scheduler that uses a partially ordered set (poset)-based cost model, built on both time and auxiliary memory complexity, to prune the search space of schedules. In addition, we highlight the use of Satisfiability Modulo Theories (SMT) solvers in sparse auto-schedulers to narrow the Pareto frontier down to the smallest possible set of candidate schedules, subject to user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that schedules selected by the auto-scheduler achieve orders-of-magnitude speedups over the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.
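To make the cost-model idea above concrete, the Python sketch below shows one way a poset-based pruning over (time, auxiliary memory) complexities and an SMT-backed compile-time constraint check could look. It is a minimal illustration under assumed encodings, not SparseAuto's actual cost model or API: the degree-vector representation of complexities, the example schedules, and the names Schedule, dominates, pareto_frontier, and memory_within_budget are all hypothetical; only the z3-solver calls (Real, Solver, And) are real library API.

# Illustrative sketch only (not SparseAuto's implementation): poset-based
# Pareto pruning over (time, auxiliary-memory) complexities, plus an SMT
# check of a user memory budget with Z3.
from dataclasses import dataclass
from z3 import And, Real, Solver, sat  # pip install z3-solver

@dataclass(frozen=True)
class Schedule:
    name: str
    time_deg: tuple  # monomial degrees of the time complexity, per index variable
    mem_deg: tuple   # monomial degrees of the auxiliary memory complexity

def dominates(a, b):
    """True if `a` is at least as good as `b` in every degree and strictly
    better in at least one -- the partial order used for pruning."""
    pairs = list(zip(a.time_deg + a.mem_deg, b.time_deg + b.mem_deg))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)

def pareto_frontier(schedules):
    """Keep only schedules not dominated by any other schedule."""
    return [s for s in schedules if not any(dominates(t, s) for t in schedules)]

def memory_within_budget(s, dim_bounds, budget):
    """SMT check: is it impossible for the auxiliary memory to exceed
    `budget` while every dimension size stays within its compile-time bound?"""
    dims = [Real(f"d{i}") for i in range(len(s.mem_deg))]
    mem = 1
    for d, e in zip(dims, s.mem_deg):
        for _ in range(e):
            mem = mem * d          # build the memory monomial d0^e0 * d1^e1 * ...
    solver = Solver()
    for d, ub in zip(dims, dim_bounds):
        solver.add(And(d >= 1, d <= ub))
    solver.add(mem > budget)       # ask Z3 for a counterexample
    return solver.check() != sat   # unsat => the budget can never be exceeded

# Two hypothetical schedules for a three-index expression: fusion saves
# auxiliary memory, fission saves an asymptotic factor in time -- they are
# incomparable in the poset, so both survive on the Pareto frontier.
fused = Schedule("fused", time_deg=(1, 1, 1), mem_deg=(0, 0, 0))
fissed = Schedule("fissed", time_deg=(1, 1, 0), mem_deg=(0, 1, 0))

for s in pareto_frontier([fused, fissed]):
    print(s.name, memory_within_budget(s, dim_bounds=[10_000] * 3, budget=10**6))

In the paper's setting the poset compares full asymptotic complexity expressions rather than bare degree vectors, and the SMT queries handle arbitrary user-defined constraints; the sketch only illustrates the shape of the pruning and the unsat-means-safe reasoning.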

References

[1]
A. Abdelfattah, M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, Tz. Kolev, I. Masliah, and S. Tomov. 2016. High-performance Tensor Contractions for GPUs. Procedia Computer Science, 80 (2016), 108–118. issn:1877-0509 https://doi.org/10.1016/j.procs.2016.05.302 International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA
[2]
Willow Ahrens, Fredrik Kjolstad, and Saman Amarasinghe. 2022. Autoscheduling for Sparse Tensor Algebra with an Asymptotic Cost Model. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2022). Association for Computing Machinery, New York, NY, USA. 269–285. isbn:9781450392655 https://doi.org/10.1145/3519939.3523442
[3]
A. Allam, J. Ramanujam, G. Baumgartner, and P. Sadayappan. 2006. Memory minimization for tensor contractions using integer linear programming. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS). 8 pp. https://doi.org/10.1109/IPDPS.2006.1639717
[4]
Alexander A. Auer, Gerald Baumgartner, David E. Bernholdt, Alina Bibireata, Daniel Cociorva, Venkatesh Choppella, Xiaoyang Gao, Robert Harrison, Sandhya Krishnan, Sriram Krishnamoorthy, Chi-Chung Lam, Qingda Lu, Marcel Nooijen, Russell Pitzer, J. Ramanujam, P. Sadayappan, and Alexander Sibiryakov. 2006. Automatic code generation for many-body electronic structure methods: the tensor contraction engine. Molecular Physics, 104, 2 (2006), 211–228. https://doi.org/10.1080/00268970500275780
[5]
Aart Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad. 2022. Compiler Support for Sparse Tensor Computations in MLIR. ACM Trans. Archit. Code Optim., 19, 4 (2022), Article 50, sep, 25 pages. issn:1544-3566 https://doi.org/10.1145/3544559
[6]
Aart J. C. Bik and Harry A. G. Wijshoff. 1993. Compilation Techniques for Sparse Matrix Computations. In Proceedings of the 7th International Conference on Supercomputing (ICS ’93). Association for Computing Machinery, New York, NY, USA. 416–424. isbn:089791600X https://doi.org/10.1145/165939.166023
[7]
Jee Choi, Xing Liu, Shaden Smith, and Tyler Simon. 2018. Blocking Optimization Techniques for Sparse Tensor Computation. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 568–577. https://doi.org/10.1109/IPDPS.2018.00066
[8]
Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2018. Format Abstraction for Sparse Tensor Algebra Compilers. Proc. ACM Program. Lang., 2, OOPSLA (2018), Article 123, oct, 30 pages. https://doi.org/10.1145/3276493
[9]
D. Cociorva, Xiaoyang Gao, S. Krishnan, G. Baumgartner, Chi-Chung Lam, P. Sadayappan, and J. Ramanujam. 2003. Memory-Constrained Data Locality Optimization for Tensor Contractions. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS). 8 pp. https://doi.org/10.1109/IPDPS.2003.1213121
[10]
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw., 38, 1 (2011), Article 1, dec, 25 pages. issn:0098-3500 https://doi.org/10.1145/2049662.2049663
[11]
Adhitha Dias, Logan Anderson, Kirshanthan Sundararajah, Artem Pelenitsyn, and Milind Kulkarni. 2024. SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring. https://doi.org/10.5281/zenodo.12764370
[12]
Adhitha Dias, Kirshanthan Sundararajah, Charitha Saumya, and Milind Kulkarni. 2022. SparseLNR: accelerating sparse tensor computations using loop nest restructuring. In Proceedings of the 36th ACM International Conference on Supercomputing (ICS ’22). Association for Computing Machinery, New York, NY, USA. Article 15, 14 pages. isbn:9781450392815 https://doi.org/10.1145/3524059.3532386
[13]
Johnnie Gray and Stefanos Kourtis. 2021. Hyper-optimized tensor network contraction. Quantum, 5 (2021), March, 410. issn:2521-327X https://doi.org/10.22331/q-2021-03-15-410
[14]
Will Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 30 (2017), 1024–1034.
[15]
Albert Hartono, Qingda Lu, Thomas Henretty, Sriram Krishnamoorthy, Huaijian Zhang, Gerald Baumgartner, David E. Bernholdt, Marcel Nooijen, Russell Pitzer, J. Ramanujam, and P. Sadayappan. 2009. Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry. The Journal of Physical Chemistry A, 113, 45 (2009), 12715–12723. https://doi.org/10.1021/jp9051215 19888780
[16]
So Hirata. 2003. Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories. The Journal of Physical Chemistry A, 107, 46 (2003), 9887–9897. https://doi.org/10.1021/jp034596z
[17]
Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, and Yida Wang. 2020. FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20). IEEE Press, Article 71, 13 pages. isbn:9781728199986
[18]
Raghavendra Kanakagiri and Edgar Solomonik. 2023. Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network. arXiv:2307.05740. https://doi.org/10.48550/arXiv.2307.05740
[19]
Jinsung Kim, Aravind Sukumaran-Rajam, Vineeth Thumma, Sriram Krishnamoorthy, Ajay Panyala, Louis-Noël Pouchet, Atanas Rountev, and P. Sadayappan. 2019. A Code Generator for High-Performance Tensor Contractions on GPUs. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 85–95. https://doi.org/10.1109/CGO.2019.8661182
[20]
Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, and Saman Amarasinghe. 2019. Tensor Algebra Compilation with Workspaces. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2019). IEEE Press, Piscataway, NJ, USA. 180–192. isbn:978-1-7281-1436-1 http://dl.acm.org/citation.cfm?id=3314872.3314894
[21]
Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The Tensor Algebra Compiler. Proc. ACM Program. Lang., 1, OOPSLA (2017), Article 77, Oct., 29 pages. issn:2475-1421 https://doi.org/10.1145/3133901
[22]
Jean Kossaifi, Aran Khanna, Zachary Lipton, Tommaso Furlanello, and Anima Anandkumar. 2017. Tensor Contraction Layers for Parsimonious Deep Nets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
[23]
Vladimir Kotlyar, Keshav Pingali, and Paul Stodghill. 1997. A Relational Approach to the Compilation of Sparse Matrix Programs. In Proceedings of the Third International Euro-Par Conference on Parallel Processing (Euro-Par ’97). Springer-Verlag, Berlin, Heidelberg. 318–327. isbn:3540634401
[24]
Chi-Chung Lam, P. Sadayappan, and Rephael Wenger. 1997. On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution. Parallel Process. Lett., 7 (1997), 157–168. https://api.semanticscholar.org/CorpusID:9440379
[25]
Alan LaMielle and Michelle Mills Strout. 2010. Enabling Code Generation within the Sparse Polyhedral Framework.
[26]
Richard Lippmann, David Fried, Isaac Graf, Joshua Haines, Kristopher Kendall, David McClung, Dan Weber, Seth Webster, Daniel Wyschogrod, Robert Cunningham, and Marc Zissman. 2000. Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation. In Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX ’00), Vol. 2. 12–26. isbn:0-7695-0490-6 https://doi.org/10.1109/DISCEX.2000.821506
[27]
Jiawen Liu, Dong Li, Roberto Gioiosa, and Jiajia Li. 2021. Athena: High-Performance Sparse Tensor Contraction Sequence on Heterogeneous Memory. In Proceedings of the ACM International Conference on Supercomputing (ICS ’21). Association for Computing Machinery, New York, NY, USA. 190–202. isbn:9781450383356 https://doi.org/10.1145/3447818.3460355
[28]
Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, and Jiajia Li. 2021. Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory. Association for Computing Machinery, New York, NY, USA. 318–333. isbn:9781450382946 https://doi.org/10.1145/3437801.3441581
[29]
Igor L. Markov and Yaoyun Shi. 2008. Simulating Quantum Computation by Contracting Tensor Networks. SIAM J. Comput., 38, 3 (2008), 963–981. https://doi.org/10.1137/050644756
[30]
Thomas Nelson, Axel Rivera, Prasanna Balaprakash, Mary Hall, Paul D. Hovland, Elizabeth Jessup, and Boyana Norris. 2015. Generating Efficient Tensor Contractions for GPUs. In 2015 44th International Conference on Parallel Processing. 969–978. https://doi.org/10.1109/ICPP.2015.106
[31]
Md. Khaledur Rahman, Majedul Haque Sujon, and Ariful Azad. 2021. FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 256–266. https://doi.org/10.1109/IPDPS49936.2021.00034
[32]
Shi-Ju Ran, Emanuele Tirrito, Cheng Peng, Xi Chen, Gang Su, and Maciej Lewenstein. 2017. Review of tensor network contraction approaches. arXiv preprint arXiv:1708.09213.
[33]
Shi-Ju Ran, Emanuele Tirrito, Cheng Peng, Xi Chen, Luca Tagliacozzo, Gang Su, and Maciej Lewenstein. 2020. Tensor Network Contractions: Methods and Applications to Quantum Many-Body Systems. Springer Nature. https://doi.org/10.1007/978-3-030-34489-4
[34]
Luca Rossi, Nesreen K. Ahmed, Jennifer Neville, and Keith Henderson. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. https://doi.org/10.1145/2740908
[35]
S.K. Sahoo, S. Krishnamoorthy, R. Panuganti, and P. Sadayappan. 2005. Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions. In SC ’05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. 13–13. https://doi.org/10.1109/SC.2005.35
[36]
Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, and Fredrik Kjolstad. 2020. A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra. Proc. ACM Program. Lang., 4, OOPSLA (2020), Article 158, Nov., 30 pages. https://doi.org/10.1145/3428226
[37]
Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. 2017. FROSTT: The Formidable Repository of Open Sparse Tensors and Tools. http://frostt.io/
[38]
Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2018. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code. Proc. IEEE, 106, 11 (2018), 1921–1934. https://doi.org/10.1109/JPROC.2018.2857721
[39]
Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An Approach for Code Generation in the Sparse Polyhedral Framework. Parallel Comput., 53, C (2016), apr, 32–57. issn:0167-8191 https://doi.org/10.1016/j.parco.2016.02.004
[40]
Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, and Gokcen Kestor. 2021. A High Performance Sparse Tensor Algebra Compiler in MLIR. In 2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC). 27–38. https://doi.org/10.1109/LLVMHPC54804.2021.00009
[41]
Ledyard R. Tucker. 1966. Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika, 31, 3 (1966), 279–311. https://doi.org/10.1007/BF02289464
[42]
Anand Venkat, Mary Hall, and Michelle Strout. 2015. Loop and Data Transformations for Sparse Matrix Code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). Association for Computing Machinery, New York, NY, USA. 521–532. isbn:9781450334686 https://doi.org/10.1145/2737924.2738003
[43]
Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, and Luis Ceze. 2023. SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). Association for Computing Machinery, New York, NY, USA. 660–678. isbn:9781450399180 https://doi.org/10.1145/3582016.3582047
[44]
Tong Zhou, Ruiqin Tian, Rizwan A. Ashraf, Roberto Gioiosa, Gokcen Kestor, and Vivek Sarkar. 2023. ReACT: Redundancy-Aware Code Generation for Tensor Expressions. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT ’22). Association for Computing Machinery, New York, NY, USA. 1–13. isbn:9781450398688 https://doi.org/10.1145/3559009.3569685

Cited By

  • (2024) Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network. In Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures. 169–181. https://doi.org/10.1145/3626183.3659985. Online publication date: 17-Jun-2024.



      Information

      Published In

      Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA2
      October 2024
      2691 pages
      EISSN: 2475-1421
      DOI: 10.1145/3554319
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 October 2024
      Published in PACMPL Volume 8, Issue OOPSLA2


      Author Tags

      1. Asymptotic Analysis
      2. Automatic Scheduling
      3. Fusion
      4. Loop Transformations
      5. Sparse Tensor Algebra


      Funding Sources

      • National Science Foundation


      Article Metrics

      • Downloads (last 12 months): 107
      • Downloads (last 6 weeks): 107
      Reflects downloads up to 18 Nov 2024.
