Compiler Support for Sparse Tensor Convolutions

Published: 08 October 2024

Abstract

This paper extends prior work on sparse tensor algebra compilers to generate asymptotically efficient code for tensor expressions with affine subscript expressions. Our technique enables compiler support for a wide range of sparse computations, including the sparse convolutions and pooling that are widely used in ML and graphics applications. We propose an approach that gradually rewrites compound subscript expressions to simple subscript expressions with loops that exploit the sparsity pattern of the input sparse tensors. As a result, the time complexity of the generated kernels is bounded by the number of stored elements and not by the shape of the tensors. Our approach seamlessly integrates into existing frameworks and is compatible with recent advances in compilers for sparse computations, including the flexibility to efficiently handle arbitrary combinations of different sparse tensor formats. The implementation of our algorithm is open source and upstreamed to the MLIR sparse compiler. Experimental results show that our method achieves a 19.5x speedup over the state-of-the-art compiler-based method at 99.9% sparsity. The generated sparse kernels start to outperform dense convolution implementations at about 80% sparsity.
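To make the complexity claim concrete, the sketch below (a hypothetical Python illustration, not the paper's MLIR implementation) shows the core idea behind a direct sparse convolution: iterate over the stored nonzeros of the input and solve the affine subscript i + j for the output index, so the work grows with the number of stored elements times the filter size rather than with the dense shape of the input.

# Minimal sketch of a direct 1-D sparse convolution O[i] = sum_j I[i + j] * F[j].
# The input I is given in a COO-like form (sorted coordinates plus values), and the
# loops run over its stored nonzeros only, so the cost is O(nnz(I) * len(F))
# instead of O(len(I) * len(F)). Names and interfaces here are illustrative.

def sparse_conv1d(coords, vals, filt, out_len):
    """coords, vals: coordinates and values of the stored nonzeros of I.
    filt: dense filter F.  out_len: length of the dense output O."""
    out = [0.0] * out_len
    for c, v in zip(coords, vals):      # only the stored elements of I
        for j, f in enumerate(filt):    # every filter tap
            i = c - j                   # solve the affine subscript i + j == c
            if 0 <= i < out_len:        # contribution lands inside the output
                out[i] += v * f
    return out

# A length-1000 input with three nonzeros costs only 3 * len(filt) updates.
coords, vals = [2, 500, 998], [1.0, -2.0, 3.0]
result = sparse_conv1d(coords, vals, filt=[0.5, 0.25], out_len=999)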



Published In

Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA2
October 2024
2691 pages
EISSN: 2475-1421
DOI: 10.1145/3554319
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 October 2024
Published in PACMPL Volume 8, Issue OOPSLA2


Author Tags

  1. code generation
  2. convolution
  3. iteration graphs
  4. merge lattices
  5. performance
  6. sparse data structures
  7. sparse tensor algebra
  8. sparse tensors

Qualifiers

  • Research-article

