Compiler Support for Sparse Tensor Convolutions

Published: 08 October 2024

Abstract

This paper extends prior work on sparse tensor algebra compilers to generate asymptotically efficient code for tensor expressions with affine subscript expressions. Our technique enables compiler support for a wide range of sparse computations, including the sparse convolutions and pooling that are widely used in ML and graphics applications. We propose an approach that gradually rewrites compound subscript expressions to simple subscript expressions with loops that exploit the sparsity pattern of the input sparse tensors. As a result, the time complexity of the generated kernels is bounded by the number of stored elements and not by the shape of the tensors. Our approach seamlessly integrates into existing frameworks and is compatible with recent advances in compilers for sparse computations, including the flexibility to efficiently handle arbitrary combinations of different sparse tensor formats. The implementation of our algorithm is open source and upstreamed to the MLIR sparse compiler. Experimental results show that our method achieves a 19.5x speedup over the state-of-the-art compiler-based method at 99.9% sparsity. The generated sparse kernels start to outperform dense convolution implementations at about 80% sparsity.
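To make the complexity claim concrete, the sketch below (a hypothetical Python illustration, not the paper's MLIR implementation) shows the core idea behind a direct sparse convolution: iterate over the stored nonzeros of the input and solve the affine subscript i + j for the output index, so the work grows with the number of stored elements times the filter size rather than with the dense shape of the input.

# Minimal sketch of a direct 1-D sparse convolution O[i] = sum_j I[i + j] * F[j].
# The input I is given in a COO-like form (sorted coordinates plus values), and the
# loops run over its stored nonzeros only, so the cost is O(nnz(I) * len(F))
# instead of O(len(I) * len(F)). Names and interfaces here are illustrative.

def sparse_conv1d(coords, vals, filt, out_len):
    """coords, vals: coordinates and values of the stored nonzeros of I.
    filt: dense filter F.  out_len: length of the dense output O."""
    out = [0.0] * out_len
    for c, v in zip(coords, vals):      # only the stored elements of I
        for j, f in enumerate(filt):    # every filter tap
            i = c - j                   # solve the affine subscript i + j == c
            if 0 <= i < out_len:        # contribution lands inside the output
                out[i] += v * f
    return out

# A length-1000 input with three nonzeros costs only 3 * len(filt) updates.
coords, vals = [2, 500, 998], [1.0, -2.0, 3.0]
result = sparse_conv1d(coords, vals, filt=[0.5, 0.25], out_len=999)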



Published In

Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA2
October 2024
2691 pages
EISSN: 2475-1421
DOI: 10.1145/3554319
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 October 2024
Published in PACMPL Volume 8, Issue OOPSLA2


Author Tags

  1. code generation
  2. convolution
  3. iteration graphs
  4. merge lattices
  5. performance
  6. sparse data structures
  7. sparse tensor algebra
  8. sparse tensors

Qualifiers

  • Research-article

