DOI: 10.1145/3613424.3614303
research-article
Open access

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

Published: 08 December 2023

Abstract

Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to implement but not optimal in performance, while the dataflows with overlapped computation and memory access (e.g., implicit GEMM) are highly performant but have very high engineering costs. In this paper, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that generates performant sparse convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9×, 3.3×, 2.2×, and 1.7× measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse, and SpConv v2, respectively, in inference, and is 1.2-1.3× faster than SpConv v2 in mixed-precision training across seven representative autonomous driving benchmarks. It also seamlessly supports graph convolutions, achieving 2.6-7.6× faster inference speed compared with state-of-the-art graph deep learning libraries. Our code is publicly released at https://github.com/mit-han-lab/torchsparse.
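To make the gather-GEMM-scatter dataflow named in the abstract concrete, the following is a minimal CPU sketch in NumPy. It is not the TorchSparse++ implementation or API: the function names `build_kernel_maps` and `gather_gemm_scatter`, the hash-map neighbor lookup, and the assumption that output coordinates equal input coordinates (a submanifold-style layer) are all illustrative choices made here.

```python
import numpy as np

def build_kernel_maps(coords, offsets):
    """Hypothetical helper: for each kernel offset, list (input_idx, output_idx) pairs.

    Assumes output coordinates equal input coordinates (submanifold-style layer).
    """
    coord_list = coords.tolist()
    index = {tuple(c): i for i, c in enumerate(coord_list)}
    maps = []
    for off in offsets:
        pairs = []
        for out_idx, c in enumerate(coord_list):
            neighbor = tuple(np.add(c, off).tolist())
            in_idx = index.get(neighbor)
            if in_idx is not None:  # only occupied neighbors contribute
                pairs.append((in_idx, out_idx))
        maps.append(np.array(pairs, dtype=np.int64).reshape(-1, 2))
    return maps

def gather_gemm_scatter(feats, weights, maps, num_out):
    """feats: (N, C_in); weights: (K, C_in, C_out); maps: K arrays of shape (M_k, 2)."""
    out = np.zeros((num_out, weights.shape[2]), dtype=feats.dtype)
    for w, m in zip(weights, maps):
        if m.shape[0] == 0:
            continue
        gathered = feats[m[:, 0]]         # gather: collect input rows for this offset
        partial = gathered @ w            # GEMM: one dense matmul per kernel offset
        np.add.at(out, m[:, 1], partial)  # scatter: accumulate into output rows
    return out

# Tiny 2D example: three occupied voxels and a 3x3 kernel.
coords = np.array([[0, 0], [0, 1], [2, 2]])
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4))
weights = rng.standard_normal((len(offsets), 4, 8))
out = gather_gemm_scatter(feats, weights, build_kernel_maps(coords, offsets), len(coords))
```

In this sketch the three stages run one after another and each touches memory separately, which is one way to see why the abstract contrasts this dataflow with ones that overlap computation and memory access.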

    Published In

    MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2023
    1528 pages
    ISBN: 9798400703294
    DOI: 10.1145/3613424
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPU
    2. graph
    3. high-performance computing
    4. neural network
    5. point cloud
    6. sparse convolution

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MICRO '23

    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Cited By

    • (2024) fVDB: A Deep-Learning Framework for Sparse, Large Scale, and High Performance Spatial Intelligence. ACM Transactions on Graphics 43:4 (1-15). DOI: 10.1145/3658226. Online publication date: 19-Jul-2024
    • (2024) A Sparse Octree-Based CNN for Probabilistic Occupancy Prediction Applied to Next Best View Planning. IEEE Robotics and Automation Letters 9:11 (9359-9366). DOI: 10.1109/LRA.2024.3460432. Online publication date: Nov-2024
    • (2024) A 28-nm Energy-Efficient Sparse Neural Network Processor for Point Cloud Applications Using Block-Wise Online Neighbor Searching. IEEE Journal of Solid-State Circuits 59:9 (3070-3081). DOI: 10.1109/JSSC.2024.3386878. Online publication date: Sep-2024
    • (2024) LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (19563-19572). DOI: 10.1109/CVPR52733.2024.01850. Online publication date: 16-Jun-2024
    • (2024) One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (10072-10083). DOI: 10.1109/CVPR52733.2024.00960. Online publication date: 16-Jun-2024
    • (2024) G3R: Gradient Guided Generalizable Reconstruction. Computer Vision – ECCV 2024 (305-323). DOI: 10.1007/978-3-031-72658-3_18. Online publication date: 2-Oct-2024
