
DOI: 10.1145/3079856.3080215

Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism

Published: 24 June 2017

Abstract

As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and the required computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network sparsity caused by weight pruning actually hurts overall performance despite large reductions in model size and required multiply-accumulate operations. Also, encoding the sparse format of pruned networks incurs additional storage overhead. To overcome these challenges, we propose Scalpel, which customizes DNN pruning to the underlying hardware by matching the pruned network structure to the data-parallel hardware organization. Scalpel consists of two techniques: SIMD-aware weight pruning and node pruning. For low-parallelism hardware (e.g., microcontrollers), SIMD-aware weight pruning maintains weights in aligned, fixed-size groups to fully utilize the SIMD units. For high-parallelism hardware (e.g., GPUs), node pruning removes redundant nodes, not redundant weights, thereby reducing computation without sacrificing the dense matrix format. For hardware with moderate parallelism (e.g., desktop CPUs), SIMD-aware weight pruning and node pruning are applied together synergistically. Across the microcontroller, CPU, and GPU, Scalpel achieves mean speedups of 3.54x, 2.61x, and 1.25x while reducing model sizes by 88%, 82%, and 53%. In comparison, traditional weight pruning achieves mean speedups of 1.90x, 1.06x, and 0.41x across the three platforms.


Published In

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
June 2017
736 pages
ISBN:9781450348928
DOI:10.1145/3079856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. hardware parallelism
  2. neural network pruning
  3. single instruction, multiple data

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ISCA '17

Acceptance Rates

ISCA '17 Paper Acceptance Rate: 54 of 322 submissions, 17%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%
