
Keynote

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Published: 04 March 2022

Abstract

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the cost of considerable computational complexity, which greatly hinders their application on many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that can lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This article provides an overview of efficient deep learning methods, systems, and applications. We start by introducing popular model compression methods, including pruning, factorization, and quantization, as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific acceleration techniques for point cloud, video, and natural language processing that exploit their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.
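To make the compression techniques named above concrete, the sketch below illustrates magnitude-based weight pruning and symmetric linear quantization applied to a single weight tensor. It is a minimal NumPy illustration of the general ideas, not code from the article; the function names, the 50% sparsity target, and the 8-bit setting are arbitrary choices made only for this example.

```python
# Illustrative sketch (not from the article): magnitude pruning + 8-bit
# symmetric linear quantization of one weight tensor.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

def linear_quantize(weights: np.ndarray, num_bits: int = 8):
    """Symmetric uniform quantization: map floats to signed `num_bits` integers."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.max(np.abs(weights)) / qmax, 1e-8)  # avoid divide-by-zero
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale  # dequantize as q * scale

if __name__ == "__main__":
    w = np.random.randn(64, 64).astype(np.float32)
    w_pruned = magnitude_prune(w, sparsity=0.5)       # 50% of weights set to zero
    q, scale = linear_quantize(w_pruned, num_bits=8)  # 8-bit integer weights
    print("nonzero fraction:", np.count_nonzero(w_pruned) / w_pruned.size)
    print("max dequantization error:", np.max(np.abs(q * scale - w_pruned)))
```

In practice, such transforms are applied layer by layer and followed by fine-tuning to recover accuracy; the AutoML approaches discussed in the article search over per-layer sparsity ratios and bit-widths instead of fixing them by hand.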

References

[1]
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In USENIX Symposium on Operating Systems Design and Implementation.
[2]
Mohsen Ahmadzadeh, Mehdi Kamal, Ali Afzali-Kusha, and Massoud Pedram. 2021. A2P-MANN: Adaptive attention inference hops pruned memory-augmented neural networks. arXiv preprint arXiv:2101.09693 (2021).
[3]
Nader Akoury, Kalpesh Krishna, and Mohit Iyyer. 2019. Syntactically supervised transformers for faster neural machine translation. In Conference of the Association for Computational Linguistics.
[4]
Jorge Albericio, Patrick Judd, Tayler H. Hetherington, Tor M. Aamodt, Natalie D. Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-neuron-free deep neural network computing. In International Symposium on Computer Architecture.
[5]
Alexander Amini, Guy Rosman, Sertac Karaman, and Daniela Rus. 2019. Variational end-to-end navigation and localization. In IEEE International Conference on Robotics and Automation.
[6]
Kota Ando, Kodai Ueyoshi, Kentaro Orimo, Haruyoshi Yonekawa, Shimpei Sato, Hiroki Nakahara, Masayuki Ikebe, Tetsuya Asai, Shinya Takamaeda-Yamazaki, Tadahiro Kuroda, and Masato Motomur. 2017. BRein memory: A 13-layer 4.2 K Neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS. In Symposium on VLSI Circuits.
[7]
Renzo Andri, Lukas Cavigelli, Davide Rossi, and Luca Benini. 2016. YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights. In IEEE Computer Society Annual Symposium on VLSI.
[8]
Iro Armeni, Alexandar Sax, Amir R. Zamir, and Silvio Savarese. 2017. Joint 2D-3D-semantic data for indoor scene understanding. arXiv preprint arXiv:1702.01105 (2017).
[9]
Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 2016. 3D semantic parsing of large-scale indoor spaces. In IEEE Conference on Computer Vision and Pattern Recognition.
[10]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
[11]
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, and Irwin King. 2020. BinaryBERT: Pushing the limit of BERT quantization. In Conference of the Association for Computational Linguistics.
[12]
Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. 2017. Designing neural network architectures using reinforcement learning. In International Conference on Learning Representations.
[13]
Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakkar, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, and Paul N. Whatmough. 2020. MicroNets: Neural network architectures for deploying TinyML applications on commodity microcontrollers. In Conference on Machine Learning and Systems.
[14]
Ron Banner, Yury Nahshan, and Daniel Soudry. 2019. Post training 4-bit quantization of convolutional networks for rapid-deployment. In Conference on Neural Information Processing Systems.
[15]
Ankur Bapna, Mia Chen, Orhan Firat, Yuan Cao, and Yonghui Wu. 2018. Training deeper neural machine translation models with transparent attention. In Conference on Empirical Methods in Natural Language Processing.
[16]
Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Juergen Gall. 2019. SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences. In International Conference on Computer Vision.
[17]
Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020).
[18]
Yoshua Bengio, Nicholas Léonard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013).
[19]
Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi, and Stephen Gould. 2016. Dynamic image networks for action recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[20]
Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016).
[21]
Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. 2018. SMASH: One-shot model architecture search through HyperNetworks. In International Conference on Learning Representations.
[22]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Conference on Neural Information Processing Systems.
[23]
Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. 2006. Model compression. In International Conference on Knowledge Discovery and Data Mining.
[24]
Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Efficient architecture search by network transformation. In AAAI Conference on Artificial Intelligence.
[25]
Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2020. Once for all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations.
[26]
Han Cai, Chuang Gan, Ligeng Zhu, and Song Han. 2020. TinyTL: Reduce memory, not parameters for efficient on-device learning. In Conference on Neural Information Processing Systems.
[27]
Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Kuan Wang, Tianzhe Wang, Ligeng Zhu, and Song Han. 2019. AutoML for architecting efficient and specialized neural networks. IEEE Micro 40, 1 (2019), 75–82.
[28]
Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. 2018. Path-level network transformation for efficient architecture search. In International Conference on Machine Learning.
[29]
Han Cai, Ligeng Zhu, and Song Han. 2019. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations.
[30]
Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2020. ZeroQ: A novel zero shot quantization framework. In IEEE Conference on Computer Vision and Pattern Recognition.
[31]
Sebastian Caldas, Jakub Konečny, H. Brendan McMahan, and Ameet Talwalkar. 2018. Expanding the reach of federated learning by reducing client resource requirements. arXiv preprint arXiv:1812.07210 (2018).
[32]
Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, and Niranjan Balasubramanian. 2020. DeFormer: Decomposing pre-trained transformers for faster question answering. In Conference of the Association for Computational Linguistics.
[33]
Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? A new model and the kinetics dataset. In IEEE Conference on Computer Vision and Pattern Recognition.
[34]
Lukas Cavigelli, David Gschwend, Christoph Mayer, Samuel Willi, Beat Muheim, and Luca Benini. 2015. Origami: A convolutional network accelerator. In Great Lakes Symposium on VLSI.
[35]
Srimat Chakradhar, Murugan Sankaradas, Venkata Jakkula, and Srihari Cadambi. 2010. A dynamically configurable coprocessor for convolutional neural networks. In International Symposium on Computer Architecture.
[36]
Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. 2015. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015).
[37]
Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference.
[38]
Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, and Jingren Zhou. 2020. AdaBERT: Task-adaptive BERT compression with differentiable neural architecture search. In International Joint Conference on Artificial Intelligence.
[39]
Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, and Manmohan Chandraker. 2017. Learning efficient object detection models with knowledge distillation. In Conference on Neural Information Processing Systems.
[40]
Liang-Chieh Chen, Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, and Jon Shlens. 2018. Searching for efficient multi-scale architectures for dense image prediction. In Conference on Neural Information Processing Systems.
[41]
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ACM International Conference on Architectural Support for Programming Languages and Operating Systems.
[42]
Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015).
[43]
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In USENIX Symposium on Operating Systems Design and Implementation.
[44]
Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174 (2016).
[45]
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2014. DaDianNao: A machine-learning supercomputer. In International Symposium on Microarchitecture.
[46]
Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, and Jian Sun. 2019. DetNAS: Backbone search for object detection. In Conference on Neural Information Processing Systems.
[47]
Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, and Ruofei Zhang. 2020. AutoADR: Automatic model design for ad relevance. In International Conference on Information & Knowledge Management.
[48]
Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. 2017. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. Int. J. Space-based Situat. Comput. 52, 1 (2017), 127–138.
[49]
Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, and Vivienne Sze. 2019. Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices. IEEE J. Emerg. Select. Topics Circ. Syst. 9, 2 (2019), 292–308.
[50]
Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. 2017. A survey of model compression and acceleration for deep neural networks. IEEE Sig. Process. Mag. 35, 1 (2017), 126–136.
[51]
Robin Cheong and Robel Daniel. 2019. transformers.zip: Compressing Transformers with Pruning and Quantization. Technical Report. Stanford University, Stanford, CA.
[52]
Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014).
[53]
Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 (2019).
[54]
Tejalal Choudhary, Vipul Mishra, Anurag Goswami, and Jagannathan Sarangapani. 2020. A comprehensive survey on model compression and acceleration. Artif. Intell. Rev. (2020).
[55]
Christopher Choy, Wei Dong, and Vladlen Koltun. 2020. Deep global registration. In IEEE Conference on Computer Vision and Pattern Recognition.
[56]
Christopher Choy, JunYoung Gwak, and Silvio Savarese. 2019. 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition.
[57]
Christopher Choy, Jaesik Park, and Vladlen Koltun. 2019. Fully convolutional geometric features. In International Conference on Computer Vision.
[58]
Ozgun Cicek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 2016. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-assisted Intervention.
[59]
Felipe Codevilla, Matthias Miiller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. 2018. End-to-end driving via conditional imitation learning. In IEEE International Conference on Robotics and Automation.
[60]
Jason Cong, Zhenman Fang, Michael Lo, Hanrui Wang, Jingxian Xu, and Shaochong Zhang. 2018. Understanding performance differences of FPGAs and GPUs. In IEEE Symposium on Field-programmable Custom Computing Machines.
[61]
Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. BinaryConnect: Training deep neural networks with binary weights during propagations. In Conference on Neural Information Processing Systems.
[62]
Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or –1. arXiv preprint arXiv:1602.02830 (2016).
[63]
Yin Cui, Yang Song, Chen Sun, Andrew Howard, and Serge Belongie. 2018. Large scale fine-grained categorization and domain-specific transfer learning. In IEEE Conference on Computer Vision and Pattern Recognition.
[64]
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. 2017. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In IEEE Conference on Computer Vision and Pattern Recognition.
[65]
Pengcheng Dai, Jianlei Yang, Xucheng Ye, Xingzhou Cheng, Junyu Luo, Linghao Song, Yiran Chen, and Weisheng Zhao. 2020. SparseTrain: Exploiting dataflow sparsity for efficient convolutional neural networks training. In Design Automation Conference.
[66]
William J. Dally, Yatish Turakhia, and Song Han. 2020. Domain-specific hardware accelerators. Commun. ACM 63, 7 (2020).
[67]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition.
[68]
Lei Deng, Guoqi Li, Song Han, Luping Shi, and Yuan Xie. 2020. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proc. IEEE (2020).
[69]
Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael Seltzer, Geoff Zweig, Xiaodong He, Jason Williams, Yifan Gong, and Alex Acero. 2013. Recent advances in deep learning for speech research at Microsoft. In IEEE International Conference on Acoustics, Speech and Signal Processing.
[70]
Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. In Conference on Neural Information Processing Systems.
[71]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. In Conference of the North American Chapter of the Association for Computational Linguistics.
[72]
Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, and Song Han. 2020. IOS: Inter-operator scheduler for CNN acceleration. arXiv preprint arXiv:2011.01302 (2020).
[73]
Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. DeCAF: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning.
[74]
Zhen Dong, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2019. HAWQ: Hessian AWare quantization of neural networks with mixed-precision. In International Conference on Computer Vision.
[75]
Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. 2015. ShiDianNao: Shifting vision processing closer to the sensor. In International Symposium on Computer Architecture.
[76]
Yousef Elkurdi, David Fernández, Evgueni Souleimanov, Dennis Giannacopoulos, and Warren J. Gross. 2008. FPGA architecture and implementation of sparse matrix-vector multiplication for the finite element method. Comput. Phys. Commun. 178, 8 (2008), 558–570.
[77]
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. 2018. Efficient multi-objective neural architecture search via Lamarckian evolution. In International Conference on Learning Representations.
[78]
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. 2019. Neural architecture search: A survey. J. Mach. Learn. Res. 20, 55 (2019), 1–21.
[79]
Andries Petrus Engelbrecht. 2001. A new pruning heuristic based on variance analysis of sensitivity information. IEEE Trans. Neural Netw. 12, 6 (2001), 1386–1399.
[80]
Quanfu Fan, Chun-Fu Chen, Hilde Kuehne, Marco Pistoia, and David Cox. 2019. More is less: Learning efficient video representations by big-little network and depthwise temporal aggregation. In Conference on Neural Information Processing Systems.
[81]
Igor Fedorov, Ryan P. Adams, Matthew Mattina, and Paul N. Whatmough. 2019. SpArSe: Sparse architecture search for CNNs on resource-constrained microcontrollers. In Conference on Neural Information Processing Systems.
[82]
William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. arXiv preprint arXiv:2101.03961 (2021).
[83]
Christoph Feichtenhofer. 2020. X3D: Expanding architectures for efficient video recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[84]
Christoph Feichtenhofer, Axel Pinz, and Richard Wildes. 2016. Spatiotemporal residual networks for video action recognition. In Conference on Neural Information Processing Systems.
[85]
Christoph Feichtenhofer, Axel Pinz, and Andrew Zisserman. 2016. Convolutional two-stream network fusion for video action recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[86]
Jonathan Frankle and Michael Carbin. 2018. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations.
[87]
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, and Michael Carbin. 2020. Linear mode connectivity and the lottery ticket hypothesis. In International Conference on Machine Learning.
[88]
Jonathan Frankle, David J. Schwab, and Ari S. Morcos. 2020. Training BatchNorm and only BatchNorm: On the expressive power of random features in CNNs. arXiv preprint arXiv:2003.00152 (2020).
[89]
Chuang Gan, Naiyan Wang, Yi Yang, Dit-Yan Yeung, and Alex G. Hauptmann. 2015. DevNet: A deep event network for multimedia event detection and evidence recounting. In IEEE Conference on Computer Vision and Pattern Recognition.
[90]
Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. TETRIS: Scalable and efficient neural network acceleration with 3D memory. In International Conference on Architectural Support for Programming Languages and Operating Systems.
[91]
Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V. Le. 2019. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In IEEE Conference on Computer Vision and Pattern Recognition.
[92]
C. Lee Giles and Christian W. Omlin. 1994. Pruning recurrent neural networks for improved generalization performance. IEEE Trans. Neural Netw. 5, 5 (1994), 848–851.
[93]
Ross Girshick. 2015. Fast R-CNN. In International Conference on Computer Vision.
[94]
Gene H. Golub and Charles F. Van Loan. 1996. Matrix Computations. Stanford University.
[95]
Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. 2014. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014).
[96]
Mitchell A. Gordon, Kevin Duh, and Nicholas Andrews. 2020. Compressing BERT: Studying the effects of weight pruning on transfer learning. arXiv preprint arXiv:2002.08307 (2020).
[97]
Saurabh Goyal et al. 2020. PoWER-BERT: Accelerating BERT inference for classification tasks. In International Conference on Machine Learning.
[98]
Benjamin Graham. 2015. Sparse 3D convolutional neural networks. In British Machine Vision Conference.
[99]
Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 2018. 3D semantic segmentation with submanifold sparse convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition.
[100]
Paul Grigoraş, Pavel Burovskiy, Wayne Luk, and Spencer Sherwin. 2016. Optimising sparse matrix vector multiplication for large scale FEM problems on FPGA. In International Conference on Field-programmable Logic and Applications.
[101]
Audrunas Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 2016. Memory-efficient backpropagation through time. In Conference on Neural Information Processing Systems.
[102]
Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, and Richard Socher. 2018. Non-autoregressive neural machine translation. In International Conference on Learning Representations.
[103]
Jiatao Gu, Changhan Wang, and Jake Zhao. 2019. Levenshtein transformer. In Conference on Neural Information Processing Systems.
[104]
Haoqiang Guo, Lu Peng, Jian Zhang, Qing Chen, and Travis D. LeCompte. 2020. ATT: A fault-tolerant ReRAM accelerator for attention-based neural networks. In International Conference on Computer Design.
[105]
Yiwen Guo, Anbang Yao, and Yurong Chen. 2016. Dynamic network surgery for efficient DNNs. In Conference on Neural Information Processing Systems.
[106]
Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, and Jian Sun. 2020. Single path one-shot neural architecture search with uniform sampling. In European Conference on Computer Vision.
[107]
Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep learning with limited numerical precision. In International Conference on Machine Learning.
[108]
Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, and Deog-Kyoon Jeong. 2020. A3: Accelerating attention mechanisms in neural networks with approximation. In IEEE International Symposium on High-performance Computer Architecture.
[109]
Dongyoon Han, Jiwhan Kim, and Junmo Kim. 2017. Deep pyramidal residual networks. In IEEE Conference on Computer Vision and Pattern Recognition.
[110]
Lei Han, Tian Zheng, Lan Xu, and Lu Fang. 2020. OccuSeg: Occupancy-aware 3D instance segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
[111]
Song Han. 2017. Efficient Methods and Hardware for Deep Learning. Ph.D. Dissertation. Stanford University.
[112]
Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, and William J. Dally. 2017. ESE: Efficient speech recognition engine with sparse LSTM on FPGA. In ACM/SIGDA International Symposium on Field-programmable Gate Arrays.
[113]
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. In International Symposium on Computer Architecture.
[114]
Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations.
[115]
Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning both weights and connections for efficient neural networks. In Conference on Neural Information Processing Systems.
[116]
Albert Haque, Michelle Guo, Alexandre Alahi, Serena Yeung, Zelun Luo, Alisha Rege, Jeffrey Jopling, Lance Downing, William Beninati, Amit Singh, Terry Platchek, Arnold Milstein, and Li Fei-Fei. 2017. Towards vision-based smart hospitals: A system for tracking and monitoring hand hygiene compliance. In Machine Learning for Healthcare Conference.
[117]
Babak Hassibi and David G. Stork. 1993. Second order derivatives for network pruning: Optimal brain surgeon. In Conference on Neural Information Processing Systems.
[118]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[119]
Xin He, Kaiyong Zhao, and Xiaowen Chu. 2021. AutoML: A survey of the state-of-the-art. Knowl.-based Syst. 212 (2021), 106622.
[120]
Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. 2018. AMC: AutoML for model compression and acceleration on mobile devices. In European Conference on Computer Vision.
[121]
Yihui He, Xiangyu Zhang, and Jian Sun. 2017. Channel pruning for accelerating very deep neural networks. In International Conference on Computer Vision.
[122]
Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Sig. Process. Mag. 29, 6 (2012), 82–97.
[123]
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
[124]
Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, and Hartwig Adam. 2019. Searching for MobileNetV3. In International Conference on Computer Vision.
[125]
Andrew G. Howard, Menglong Zhu, Bo Chen, Dimitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
[126]
Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In Conference on Neural Information Processing Systems.
[127]
Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, Niki Trigoni, and Andrew Markham. 2020. RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In IEEE Conference on Computer Vision and Pattern Recognition.
[128]
Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q. Weinberger. 2018. Multi-scale dense networks for resource efficient image classification. In International Conference on Learning Representations.
[129]
Qiangui Huang, Weiyue Wang, and Ulrich Neumann. 2018. Recurrent slice networks for 3D segmentation on point clouds. In IEEE Conference on Computer Vision and Pattern Recognition.
[130]
Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50 \(\times\) fewer parameters and \(\lt\) 0.5MB model size. arXiv preprint arXiv:1602.07360 (2016).
[131]
Andrey Ignatov, Radu Timofte, William Chou, Ke Wang, Max Wu, Tim Hartley, and Luc Van Gool. 2018. AI benchmark: Running deep neural networks on Android smartphones. arXiv preprint arXiv:1810.01109 (2018).
[132]
Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning.
[133]
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In IEEE Conference on Computer Vision and Pattern Recognition.
[134]
Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. 2014. Speeding up convolutional neural networks with low rank expansions. In British Machine Vision Conference.
[135]
Ernest Jamro, Tomasz Pabiś, Paweł Russek, and Kazimierz Wiatr. 2015. The algorithms for FPGA implementation of sparse matrices multiplication. Comput. Inform. 33, 3 (2015), 667–684.
[136]
Hanhwi Jang, Joonsung Kim, Jae-Eon Jo, Jaewon Lee, and Jangwoo Kim. 2019. MnnFast: A fast and scalable system architecture for memory-augmented neural networks. In International Symposium on Computer Architecture.
[137]
Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, and Alex Aiken. 2019. TASO: Optimizing deep learning computation with automatic generation of graph substitutions. In ACM Symposium on Operating Systems Principles.
[138]
Zhihao Jia, Matei Zaharia, and Alex Aiken. 2018. Beyond data and model parallelism for deep neural networks. In International Conference on Machine Learning.
[139]
Li Jiang, Hengshuang Zhao, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, and Jiaya Jia. 2020. PointGroup: Dual-set point grouping for 3D instance segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
[140]
Weiwen Jiang, Lei Yang, Edwin Hsing-Mean Sha, Qingfeng Zhuge, Shouzhen Gu, Sakyasingha Dasgupta, Yiyu Shi, and Jingtong Hu. 2020. Hardware/software co-exploration of neural architectures. IEEE Trans. Comput.-aided Des. Integ. Circ. Syst. 39, 12 (2020), 4805–4815.
[141]
Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lv, and Zhihua Wu. 2020. MNN: A universal and efficient inference engine. In Conference on Machine Learning and Systems.
[142]
Arthur Jochems, Timo M. Deist, Issam El Naqa, Marc Kessler, Chuck Mayo, Jackson Reeves, Shruti Jolly, Martha Matuszak, Randall Ten Haken, Johan van Soest, Cary Oberije, Corinne Faivre-Finn, Gareth Price, Dirk de Ruysscher, Philippe Lambin, and Andre Dekker. 2017. Developing and validating a survival prediction model for NSCLC patients through distributed learning across 3 countries. Int. J. Radiat. Oncol., Biol. Phys. 99, 2 (2017), 344–352.
[143]
Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M. Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial deep neural network computing. In IEEE/ACM International Symposium on Microarchitecture.
[144]
Sheng-Chun Kao, Geonhwa Jeong, and Tushar Krishna. 2020. ConfuciuX: Autonomous hardware resource assignment for DNN accelerators using reinforcement learning. In IEEE/ACM International Symposium on Microarchitecture.
[145]
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition.
[146]
Gyuwan Kim and Kyunghyun Cho. 2020. Length-adaptive transformer: Train once with length drop, use anytime with search. arXiv preprint arXiv:2010.07003 (2020).
[147]
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. I-BERT: Integer-only BERT quantization. arXiv preprint arXiv:2101.01321 (2021).
[148]
Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Conference on Empirical Methods in Natural Language Processing.
[149]
Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. 2015. Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv preprint arXiv:1511.06530 (2015).
[150]
Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. 2019. Reformer: The efficient transformer. In International Conference on Learning Representations.
[151]
Jakub Konečnỳ, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).
[152]
Simon Kornblith, Jonathon Shlens, and Quoc V. Le. 2019. Do better ImageNet models transfer better? In IEEE Conference on Computer Vision and Pattern Recognition.
[153]
Jean-Francois Lafleche, Clement Fuji Tsang, Artem Rozantsev, Wenzheng Chen, Tommy Xiang, Rev Lebaredian, Sanja Fidler, Krishna Murthy Jatavallabhula, and Edward Smith. 2019. Kaolin: A Pytorch library for accelerating 3D deep learning research. arXiv preprint arXiv:1911.05063 (2019).
[154]
Alex Krizhevsky and Geoffrey Hinton. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report. University of Toronto.
[155]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Conference on Neural Information Processing Systems.
[156]
Jason Kuen, Xiangfei Kong, Zhe Lin, Gang Wang, Jianxiong Yin, Simon See, and Yap-Peng Tan. 2018. Stochastic downsampling for cost-adjustable inference and improved regularization in convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition.
[157]
Jean Lahoud, Bernard Ghanem, Marc Pollefeys, and Martin R. Oswald. 2019. 3D instance segmentation via multi-task metric learning. In International Conference on Computer Vision.
[158]
Shiyi Lan, Ruichi Yu, Gang Yu, and Larry S. Davis. 2019. Modeling local geometric structure of 3D point clouds using geo-CNN. In IEEE Conference on Computer Vision and Pattern Recognition.
[159]
Loic Landrieu and Martin Simonovsky. 2018. Large-scale point cloud semantic segmentation with superpoint graphs. In IEEE Conference on Computer Vision and Pattern Recognition.
[160]
Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky. 2014. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint arXiv:1412.6553 (2014).
[161]
Vadim Lebedev and Victor Lempitsky. 2016. Fast ConvNets using group-wise brain damage. In IEEE Conference on Computer Vision and Pattern Recognition.
[162]
Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. 2010. MNIST handwritten digit database. AT&T Labs. Retrieved from:http://yann.lecun.com/exdb/mnist.
[163]
Yann LeCun, John S. Denker, Sara A. Solla, Richard E. Howard, and Lawrence D. Jackel. 1989. Optimal brain damage. In Conference on Neural Information Processing Systems.
[164]
Jinmook Lee, Changhyeon Kim, Sanghoon Kang, Dongjoo Shin, Sangyeob Kim, and Hoi-Jun Yoo. 2018. UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision. In International Solid-state Circuits Conference.
[165]
Juho Lee, Yoonho Lee, Jungtaek Kim, Adam Kosiorek, Seungjin Choi, and Yee Whye Teh. 2019. Set transformer: A framework for attention-based permutation-invariant neural networks. In International Conference on Machine Learning.
[166]
Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, and Yoav Artzi. 2017. Simple recurrent units for highly parallelizable recurrence. In Conference on Empirical Methods in Natural Language Processing.
[167]
Fengfu Li, Bo Zhang, and Bin Liu. 2016. Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016).
[168]
Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, and Bernard Ghanem. 2021. DeepGCNs: Making GCNs go as deep as CNNs. IEEE Trans. Pattern Anal. Mach. Intell. (2021).
[169]
Guohao Li, Guocheng Qian, Itzel C. Delgadillo, Matthias Muller, Ali Thabet, and Bernard Ghanem. 2020. SGAS: Sequential greedy architecture search. In IEEE Conference on Computer Vision and Pattern Recognition.
[170]
Guohao Li, Mengmeng Xu, Silvio Giancola, Ali Thabet, and Bernard Ghanem. 2020. LC-NAS: Latency constrained neural architecture search for point cloud networks. arXiv preprint arXiv:2008.10309 (2020).
[171]
Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2017. Pruning filters for efficient ConvNets. In International Conference on Learning Representations.
[172]
Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, and Song Han. 2020. GAN compression: Efficient architectures for interactive conditional GANs. In IEEE Conference on Computer Vision and Pattern Recognition.
[173]
Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. 2018. PointCNN: Convolution on \(\mathcal {X}\) -transformed points. In Conference on Neural Information Processing Systems.
[174]
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, and Joey Gonzalez. 2020. Train big, then compress: Rethinking model size for efficient training and inference of transformers. In International Conference on Machine Learning.
[175]
Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, and Song Han. 2020. MCUNet: Tiny deep learning on IoT devices. arXiv preprint arXiv:2007.10319 (2020).
[176]
Ji Lin, Chuang Gan, and Song Han. 2019. Training kinetics in 15 minutes: Large-scale distributed training on videos. arXiv preprint arXiv:1910.00932 (2019).
[177]
Ji Lin, Chuang Gan, and Song Han. 2019. TSM: Temporal shift module for efficient video understanding. In International Conference on Computer Vision.
[178]
Ji Lin, Yongming Rao, and Jiwen Lu. 2017. Runtime neural pruning. In Conference on Neural Information Processing Systems.
[179]
Yujun Lin, Driss Hafdi, Kuan Wang, Zhijian Liu, and Song Han. 2020. Neural-hardware architecture search. In Workshop on ML for Systems at NeurIPS.
[180]
Yujun Lin, Song Han, Huizi Mao, Yu Wang, and William J. Dally. 2018. Deep gradient compression: Reducing the communication bandwidth for distributed training. In International Conference on Learning Representations.
[181]
Yujun Lin, Mengtian Yang, and Song Han. 2021. NAAS: Neural accelerator architecture search. In Design Automation Conference.
[182]
Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio. 2016. Neural networks with few multiplications. In International Conference on Learning Representations.
[183]
Bingbin Liu, Michelle Guo, Edward Chou, Rishab Mehra, Serena Yeung, N. Lance Downing, Francesca Salipur, Jeffrey Jopling, Brandi Campbell, Kayla Deru, William Beninati, Arnold Milstein, and Li Fei-Fei. 2018. 3D point cloud-based visual prediction of ICU mobility care activities. In Machine Learning for Healthcare Conference.
[184]
Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L. Yuille, and Li Fei-Fei. 2019. Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
[185]
Chenxi Liu, Barret Zoph, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy. 2018. Progressive neural architecture search. In European Conference on Computer Vision.
[186]
Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu. 2018. Hierarchical representations for efficient architecture search. In International Conference on Learning Representations.
[187]
Haoxiao Liu, Karen Simonyan, and Yiming Yang. 2019. DARTS: Differentiable architecture search. In International Conference on Learning Representations.
[188]
Lanlan Liu and Jia Deng. 2018. Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution. In AAAI Conference on Artificial Intelligence.
[189]
Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, and Yuan Xie. 2019. Dynamic sparse graph for efficient deep learning. In International Conference on Learning Representations.
[190]
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating Wikipedia by summarizing long sequences. In International Conference on Learning Representations.
[191]
Shujie Liu, Nan Yang, Mu Li, and Ming Zhou. 2015. A recursive recurrent neural network for statistical machine translation. In Conference of the Association for Computational Linguistics.
[192]
Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, and Jingdong Wang. 2019. Structured knowledge distillation for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
[193]
Zhijian Liu. 2020. Hardware-efficient Deep Learning for 3D Point Cloud. Master’s. Thesis Massachusetts Institute of Technology.
[194]
Zhijian Liu, Alexander Amini, Sibo Zhu, Sertac Karaman, Song Han, and Daniela Rus. 2021. Efficient and robust LiDAR-Based end-to-end navigation. In IEEE International Conference on Robotics and Automation.
[195]
Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. 2017. Learning efficient convolutional networks through network slimming. In International Conference on Computer Vision.
[196]
Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Kwang-Ting Cheng, and Jian Sun. 2019. MetaPruning: Meta learning for automatic neural network channel pruning. In International Conference on Computer Vision.
[197]
Zhijian Liu, Haotian Tang, Yujun Lin, and Song Han. 2019. Point-voxel CNN for efficient 3D deep learning. In Conference on Neural Information Processing Systems.
[198]
Zhijian Liu, Haotian Tang, Shengyu Zhao, Kevin Shao, and Song Han. 2021. PVNAS: 3D Neural Architecture Search with Point-Voxel Convolution. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).
[199]
Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu, and Song Han. 2020. DataMix: Efficient privacy-preserving edge-cloud inference. In European Conference on Computer Vision.
[200]
Siyuan Lu, Meiqi Wang, Shuang Liang, Jun Lin, and Zhongfeng Wang. 2020. Hardware accelerator for multi-head attention and position-wise feed-forward in the transformer. In IEEE International System-on-Chip Conference.
[201]
Chenxu Luo and Alan L. Yuille. 2019. Grouped spatial-temporal aggregation for efficient action recognition. In International Conference on Computer Vision.
[202]
Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. 2018. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In European Conference on Computer Vision.
[203]
Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, and Yanzhi Wang. 2020. PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices. In AAAI Conference on Artificial Intelligence.
[204]
Subhransu Maji, Esa Rahtu, Juho Kannala, Matthew Blaschko, and Andrea Vedaldi. 2013. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013).
[205]
Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, and Mohammad Alizadeh. 2019. Park: An open platform for learning-augmented computer systems. In Conference on Neural Information Processing Systems.
[206]
Jiageng Mao, Xiaogang Wang, and Hongsheng Li. 2019. Interpolated convolutional networks for 3D point cloud understanding. In International Conference on Computer Vision.
[207]
Daniel Maturana and Sebastian Scherer. 2015. VoxNet: A 3D convolutional neural network for real-time object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems.
[208]
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2016. Communication-efficient learning of deep networks from decentralized data. In International Conference on Artificial Intelligence and Statistics.
[209]
Paul Michel, Omer Levy, and Graham Neubig. 2019. Are sixteen heads really better than one? In Conference on Neural Information Processing Systems.
[210]
Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Conference of the International Speech Communication Association.
[211]
Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, and Jeff Dean. 2018. A hierarchical model for device placement. In International Conference on Learning Representations.
[212]
Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. 2017. Device placement optimization with reinforcement learning. In International Conference on Machine Learning.
[213]
Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2017. Pruning convolutional neural networks for resource efficient transfer learning. In International Conference on Learning Representations.
[214]
Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, and Andrew Howard. 2019. K for the price of 1: Parameter-efficient multi-task and transfer learning. In International Conference on Learning Representations.
[215]
Markus Nagel, Mart van Baalen, Tijmen Blankevoort, and Max Welling. 2019. Data-free quantization through weight equalization and bias correction. In International Conference on Computer Vision.
[216]
Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. 2020. PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning. In International Conference on Architectural Support for Programming Languages and Operating Systems.
[217]
Kaoru Ota, Minh Son Dao, Vasileios Mezaris, and Francesco G. B. De Natale. 2017. Deep learning for mobile multimedia: A survey. ACM Transactions on Multimedia Computing, Communications, and Applications 13, 35 (2017), 1–22.
[218]
Subhankar Pal, Jonathan Beaumont, Dong-Hyeon Park, Aporva Amarnath, Siying Feng, Chaitali Chakrabarti, Hun-Seok Kim, David Blaauw, Trevor Mudge, and Ronald Dreslinski. 2018. OuterSPACE: An outer product based sparse matrix multiplication accelerator. In IEEE International Symposium on High-performance Computer Architecture.
[219]
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. In International Symposium on Computer Architecture.
[220]
Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. 2016. Faster CNNs with direct sparse convolutions and guided pruning. arXiv preprint arXiv:1608.01409 (2016).
[221]
Seongwook Park, Kyeongryeol Bong, Dongjoo Shin, Jinmook Lee, Sungpill Choi, and Hoi-Jun Yoo. 2015. A 1.93TOPS/W Scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications. In IEEE International Solid-state Circuits Conference.
[222]
Seongsik Park, Jaehee Jang, Seijoon Kim, Byunggook Na, and Sungroh Yoon. 2020. Memory-augmented neural networks on FPGA for real-time and energy-efficient question answering. IEEE Trans. Very Large Scale Integ. Syst. (2020).
[223]
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. 2018. Image transformer. In International Conference on Machine Learning.
[224]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Conference on Neural Information Processing Systems.
[225]
Maurice Peemen, Arnaud A. A. Setio, Bart Mesman, and Henk Corporaal. 2013. Memory-centric accelerator design for convolutional neural networks. In IEEE International Conference on Computer Design.
[226]
Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. 2018. Efficient neural architecture search via parameter sharing. In International Conference on Machine Learning.
[227]
A. J. Piergiovanni, Anelia Angelova, and Michael S. Ryoo. 2019. Tiny video networks. arXiv preprint arXiv:1910.06961 (2019).
[228]
A. J. Piergiovanni, Anelia Angelova, Alexander Toshev, and Michael S. Ryoo. 2019. Evolving space-time neural architectures for videos. In International Conference on Computer Vision.
[229]
Charles R. Qi, Or Litany, Kaiming He, and Leonidas J. Guibas. 2019. Deep Hough voting for 3D object detection in point clouds. In International Conference on Computer Vision.
[230]
Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas. 2018. Frustum PointNets for 3D object detection from RGB-D data. In IEEE Conference on Computer Vision and Pattern Recognition.
[231]
Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
[232]
Charles R. Qi, Hao Su, Matthias Niessner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. 2016. Volumetric and multi-view CNNs for object classification on 3D data. In IEEE Conference on Computer Vision and Pattern Recognition.
[233]
Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Conference on Neural Information Processing Systems.
[234]
Eric Qin, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul, and Tushar Krishna. 2020. SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training. In IEEE International Symposium on High-performance Computer Architecture.
[235]
Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, and Jie Tang. 2019. Blockwise self-attention for long document understanding. arXiv preprint arXiv:1911.02972 (2019).
[236]
Zhaofan Qiu, Ting Yao, and Tao Mei. 2017. Learning spatio-temporal representation with pseudo-3D residual networks. In International Conference on Computer Vision.
[237]
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. (2019).
[238]
Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, and Piotr Dollár. 2020. Designing network design spaces. In IEEE Conference on Computer Vision and Pattern Recognition.
[239]
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Conference on Empirical Methods in Natural Language Processing.
[240]
Prajit Ramachandran, Barret Zoph, and Quoc V. Le. 2017. Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017).
[241]
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision.
[242]
Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, and Georgia Gkioxari. 2020. Accelerating 3D deep learning with PyTorch3D. arXiv preprint arXiv:2007.08501 (2020).
[243]
Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. 2019. Regularized evolution for image classifier architecture search. In AAAI Conference on Artificial Intelligence.
[244]
Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. 2017. OctNet: Learning deep 3D representations at high resolutions. In IEEE Conference on Computer Vision and Pattern Recognition.
[245]
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2014. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550 (2014).
[246]
Aurko Roy, Mohammad Saffar, Ashish Vaswani, and David Grangier. 2020. Efficient content-based sparse attention with routing transformers. Trans. Assoc. Comput. Ling. 9, 3 (2020), 53–68.
[247]
Manuele Rusci, Marco Fariselli, Alessandro Capotondi, and Luca Benini. 2020. Leveraging automated mixed-low-precision quantization for tiny edge microcontrollers. arXiv preprint arXiv:2008.05124 (2020).
[248]
Michael S. Ryoo, A. J. Piergiovanni, Mingxing Tan, and Anelia Angelova. 2020. AssembleNet: Searching for multi-stream neural connectivity in video architectures. In International Conference on Learning Representations.
[249]
Tayyar Rzayev, Saber Moradi, David H. Albonesi, and Rajit Manchar. 2017. DeepRecon: Dynamically reconfigurable architecture for accelerating deep neural networks. In International Joint Conference on Neural Networks.
[250]
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition.
[251]
Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. In Conference on Neural Information Processing Systems.
[252]
Murugan Sankaradas, Venkata Jakkula, Srihari Cadambi, Srimat Chakradhar, Igor Durdanovic, Eric Cosatto, and Hans Peter Graf. 2009. A massively parallel coprocessor for convolutional neural networks. In International Conference on Application-specific Systems, Architectures and Processors.
[253]
Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 2014. 1-bit stochastic gradient descent and application to data-parallel distributed training of speech DNNs. In Conference of the International Speech Communication Association.
[254]
Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. 2014. CNN features off-the-shelf: An astounding baseline for recognition. arXiv preprint arXiv:1403.6382 (2014).
[255]
Sayeh Sharify, Alberto Delmas Lascorz, Kevin Siu, Patrick Judd, and Andreas Moshovos. 2018. Loom: Exploiting weight and activation precisions to accelerate convolutional neural networks. In Design Automation Conference.
[256]
Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, and Hadi Esmaeilzadeh. 2018. Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural networks. In International Symposium on Computer Architecture.
[257]
Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, and Douwe Kiela. 2021. Reservoir transformer. In Conference of the Association for Computational Linguistics.
[258]
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2020. Q-BERT: Hessian based ultra low precision quantization of BERT. In AAAI Conference on Artificial Intelligence.
[259]
Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. 2020. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In IEEE Conference on Computer Vision and Pattern Recognition.
[260]
Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li. 2019. PointRCNN: 3D object proposal generation and detection from point cloud. In IEEE Conference on Computer Vision and Pattern Recognition.
[261]
Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. 2020. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. IEEE Trans. Pattern Anal. Mach. Intell. 43, 8 (2020), 2647–2664.
[262]
Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Conference on Neural Information Processing Systems.
[263]
Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations.
[264]
David R. So, Chen Liang, and Quoc V. Le. 2019. The evolved transformer. In International Conference on Machine Learning.
[265]
Suraj Srinivas and R. Venkatesh Babu. 2015. Data-free parameter pruning for deep neural networks. In British Machine Vision Conference.
[266]
Vinay Sriram, David Cox, Kuen Hung Tsoi, and Wayne Luk. 2010. Towards an embedded biologically-inspired machine vision processor. In International Conference on Field-programmable Technology.
[267]
Jonathan Stroud, David Ross, Chen Sun, Jia Deng, and Rahul Sukthankar. 2020. D3D: Distilled 3D networks for video action recognition. In IEEE/CVF Winter Conference on Applications of Computer Vision.
[268]
Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and policy considerations for deep learning in NLP. In Conference of the Association for Computational Linguistics.
[269]
Swathikiran Sudhakaran, Sergio Escalera, and Oswald Lanz. 2020. Gate-shift networks for video action recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[270]
Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, and Yonggang Wen. 2019. Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes. arXiv preprint arXiv:1902.06855 (2019).
[271]
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Conference on Neural Information Processing Systems.
[272]
Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. 2017. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 105, 12 (2017), 2295–2329.
[273]
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition.
[274]
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition.
[275]
Thierry Tambe, Coleman Hooper, Lillian Pentecost, En-Yu Yang, Marco Donato, Victor Sanh, Alexander M. Rush, David Brooks, and Gu-Yeon Wei. 2020. EdgeBERT: Optimizing on-chip inference for multi-task NLP. arXiv preprint arXiv:2011.14203 (2020).
[276]
Thierry Tambe, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020. Algorithm-hardware co-design of adaptive floating-point encodings for resilient deep learning inference. In Design Automation Conference.
[277]
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. 2019. MnasNet: Platform-aware neural architecture search for mobile. In IEEE Conference on Computer Vision and Pattern Recognition.
[278]
Mingxing Tan, Ruoming Pang, and Quoc V. Le. 2020. EfficientDet: Scalable and efficient object detection. In IEEE Conference on Computer Vision and Pattern Recognition.
[279]
Zhanhong Tan, Jiebo Song, Xiaolong Ma, Sia-Huat Tan, Hongyang Chen, Yuanqing Miao, Yifu Wu, Shaokai Ye, Yanzhi Wang, Dehui Li, and Kaisheng Ma. 2020. PCNN: Pattern-based fine-grained regular pruning towards optimizing CNN accelerators. arXiv preprint arXiv:2002.04997 (2020).
[280]
Haotian Tang, Zhijian Liu, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, and Song Han. 2020. Searching efficient 3D architectures with sparse point-voxel convolution. In European Conference on Computer Vision.
[281]
Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, and Qian-Yi Zhou. 2018. Tangent convolutions for dense prediction in 3D. In IEEE Conference on Computer Vision and Pattern Recognition.
[282]
Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2020. Efficient transformers: A survey. arXiv preprint arXiv:2009.06732 (2020).
[283]
Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, and Silvio Savarese. 2017. SEGCloud: Semantic segmentation of 3D point clouds. In International Conference on 3D Vision.
[284]
Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J. Guibas. 2019. KPConv: Flexible and deformable convolution for point clouds. In International Conference on Computer Vision.
[285]
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In International Conference on Computer Vision.
[286]
Du Tran, Heng Wang, Lorenzo Torresani, and Matt Feiszli. 2019. Video classification with channel-separated convolutional networks. In International Conference on Computer Vision.
[287]
Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. 2018. A closer look at spatiotemporal convolutions for action recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[288]
Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, and Kees Vissers. 2017. FINN: A framework for fast, scalable binarized neural network inference. In International Symposium on Field-programmable Gate Arrays.
[289]
Yaman Umuroglu, Lahiru Rasnayake, and Magnus Själander. 2018. BISMO: A scalable bit-serial matrix multiplication overlay for reconfigurable computing. In International Conference on Field-programmable Logic and Applications.
[290]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Conference on Neural Information Processing Systems.
[291]
Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov. 2019. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. In Conference of the Association for Computational Linguistics.
[292]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In International Conference on Learning Representations.
[293]
Hanrui Wang. 2020. Efficient Algorithms and Hardware for Natural Language Processing. Master's thesis, Massachusetts Institute of Technology.
[294]
Hanrui Wang, Kuan Wang, Jiacheng Yang, Linxiao Shen, Nan Sun, Hae-Seung Lee, and Song Han. 2020. GCN-RL circuit designer: Transferable transistor sizing with graph neural networks and reinforcement learning. In Design Automation Conference.
[295]
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, and Song Han. 2020. HAT: Hardware-aware transformers for efficient natural language processing. In Conference of the Association for Computational Linguistics.
[296]
Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, and Song Han. 2018. Learning to design circuits. In Workshop on ML for Systems at NeurIPS.
[297]
Hanrui Wang, Zhekai Zhang, and Song Han. 2021. SpAtten: Efficient sparse attention architecture with cascade token and head pruning. In IEEE International Symposium on High-performance Computer Architecture.
[298]
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, and Song Han. 2019. HAQ: Hardware-aware automated quantization with mixed precision. In IEEE Conference on Computer Vision and Pattern Recognition.
[299]
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, and Song Han. 2020. Hardware-centric AutoML for mixed-precision quantization. Int. J. Comput. Vis. 128, 8 (2020), 2035–2048.
[300]
Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. 2016. Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision.
[301]
Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. 2017. O-CNN: Octree-based convolutional neural networks for 3D shape analysis. In SIGGRAPH Conference.
[302]
Peng-Shuai Wang, Chun-Yu Sun, Yang Liu, and Xin Tong. 2018. Adaptive O-CNN: A patch-based deep representation of 3D shapes. In SIGGRAPH Asia Conference.
[303]
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, and Lidia S. Chao. 2019. Learning deep transformer models for machine translation. In Conference of the Association for Computational Linguistics.
[304]
Sinong Wang, Belinda Li, Madian Khabsa, Han Fang, and Hao Ma. 2020. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020).
[305]
Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, and Song Han. 2020. APQ: Joint search for network architecture, pruning and quantization policy. In IEEE Conference on Computer Vision and Pattern Recognition.
[306]
Weiyue Wang, Ronald Yu, Qiangui Huang, and Ulrich Neumann. 2018. SGPN: Similarity group proposal network for 3D point cloud instance segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
[307]
Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. 2018. SkipNet: Learning dynamic routing in convolutional networks. In European Conference on Computer Vision.
[308]
Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. 2019. Dynamic graph CNN for learning on point clouds. In SIGGRAPH Conference.
[309]
Ziheng Wang. 2021. SparseDNN: Fast sparse deep learning inference on CPUs. arXiv preprint arXiv:2101.07948 (2021).
[310]
Zihao Wang, Chen Lin, Lu Sheng, Junjie Yan, and Jing Shao. 2020. PV-NAS: Practical neural architecture search for video recognition. arXiv preprint arXiv:2011.00826 (2020).
[311]
Zongji Wang and Feng Lu. 2019. VoxSegNet: Volumetric CNNs for semantic part segmentation of 3D shapes. IEEE Trans. Vis. Comput. Graph. 26, 9 (2019), 2919–2930.
[312]
Jianqiao Wangni, Jialei Wang, Ji Liu, and Tong Zhang. 2018. Gradient sparsification for communication-efficient distributed optimization. In Conference on Neural Information Processing Systems.
[313]
Bingzhen Wei, Mingxuan Wang, Hao Zhou, Junyang Lin, and Xu Sun. 2019. Imitation learning for non-autoregressive neural machine translation. In Conference of the Association for Computational Linguistics.
[314]
Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Conference on Neural Information Processing Systems.
[315]
Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2017. TernGrad: Ternary gradients to reduce communication in distributed deep learning. In Conference on Neural Information Processing Systems.
[316]
Martin Wistuba, Ambrish Rawat, and Tejaswini Pedapati. 2019. A survey on neural architecture search. arXiv preprint arXiv:1905.01392 (2019).
[317]
Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. 2019. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition.
[318]
Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, and Michael Auli. 2019. Pay less attention with lightweight and dynamic convolutions. In International Conference on Learning Representations.
[319]
Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. 2016. Quantized convolutional neural networks for mobile devices. In IEEE Conference on Computer Vision and Pattern Recognition.
[320]
Wenxuan Wu, Zhongang Qi, and Li Fuxin. 2019. PointConv: Deep convolutional networks on 3D point clouds. In IEEE Conference on Computer Vision and Pattern Recognition.
[321]
Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, and Song Han. 2020. Lite transformer with long-short range attention. In International Conference on Learning Representations.
[322]
Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, and Rogerio Feris. 2018. BlockDrop: Dynamic inference paths in residual networks. In IEEE Conference on Computer Vision and Pattern Recognition.
[323]
Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In IEEE Conference on Computer Vision and Pattern Recognition.
[324]
Saining Xie, Jiatao Gu, Demi Guo, Charles R. Qi, Leonidas J. Guibas, and Or Litany. 2020. PointContrast: Unsupervised pre-training for 3D point cloud understanding. In European Conference on Computer Vision.
[325]
Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, and Kevin Murphy. 2018. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In European Conference on Computer Vision.
[326]
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin. 2020. DeeBERT: Dynamic early exiting for accelerating BERT inference. In Conference of the Association for Computational Linguistics.
[327]
Wayne Xiong, Lingfeng Wu, Fil Alleva, Jasha Droppo, Xuedong Huang, and Andreas Stolcke. 2018. The Microsoft 2017 conversational speech recognition system. In IEEE International Conference on Acoustics, Speech and Signal Processing.
[328]
Mengwei Xu, Jiawei Liu, Yuanqiang Liu, Felix Xiaozhu Lin, Yunxin Liu, and Xuanzhe Liu. 2019. A first look at deep learning apps on smartphones. In International World Wide Web Conference.
[329]
Qiangeng Xu, Xudong Sun, Cho-Ying Wu, Panqu Wang, and Ulrich Neumann. 2020. Grid-GCN for fast and scalable point cloud learning. In IEEE Conference on Computer Vision and Pattern Recognition.
[330]
Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. 2018. SpiderCNN: Deep learning on point sets with parameterized convolutional filters. In European Conference on Computer Vision.
[331]
Jian Xue, Jinyu Li, and Yifan Gong. 2013. Restructuring of deep neural network acoustic models with singular value decomposition. In Conference of the International Speech Communication Association.
[332]
Yan Yan, Yuxing Mao, and Bo Li. 2018. SECOND: Sparsely embedded convolutional detection. Sensors (2018).
[333]
Zhongxia Yan, Hanrui Wang, Demi Guo, and Song Han. 2020. MicroNet for efficient language modeling. J. Mach. Learn. Res. 123, 20 (2020), 215–231.
[334]
Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, and Yiyu Shi. 2020. Co-exploration of neural architectures and heterogeneous ASIC accelerator designs targeting multiple tasks. In Design Automation Conference.
[335]
Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, and Hartwig Adam. 2018. NetAdapt: Platform-aware neural network adaptation for mobile applications. In European Conference on Computer Vision.
[336]
Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, and Jiaya Jia. 2019. STD: Sparse-to-dense 3D object detector for point cloud. In International Conference on Computer Vision.
[337]
Serena Yeung, N. Lance Downing, Li Fei-Fei, and Arnold Milstein. 2018. Bedside computer vision—Moving artificial intelligence from driver assistance to patient safety. New Eng. J. Med. 378, 14 (2018), 1271–1273.
[338]
Jiahui Yu and Thomas Huang. 2019. AutoSlim: Towards one-shot architecture search for channel numbers. arXiv preprint arXiv:1903.11728 (2019).
[339]
Jiahui Yu and Thomas S. Huang. 2019. Universally slimmable networks and improved training techniques. In International Conference on Computer Vision.
[340]
Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, and Scott Mahlke. 2017. Scalpel: Customizing DNN pruning to the underlying hardware parallelism. In International Symposium on Computer Architecture.
[341]
Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, and Thomas Huang. 2019. Slimmable neural networks. In International Conference on Learning Representations.
[342]
Christopher Zach, Thomas Pock, and Horst Bischof. 2007. A duality based approach for realtime TV-L1 optical flow. In Joint Pattern Recognition Symposium.
[343]
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, and Andreas Moshovos. 2020. GOBO: Quantizing attention-based NLP models for low latency and energy efficient inference. In IEEE/ACM International Symposium on Microarchitecture.
[344]
Sergey Zagoruyko and Nikos Komodakis. 2017. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In International Conference on Learning Representations.
[345]
Biao Zhang, Deyi Xiong, and Jinsong Su. 2018. Accelerating neural transformer via an average attention network. In Conference of the Association for Computational Linguistics.
[346]
Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In International Symposium on Field-programmable Gate Arrays.
[347]
Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An accelerator for sparse neural networks. In IEEE/ACM International Symposium on Microarchitecture.
[348]
Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, and Qun Liu. 2020. TernaryBERT: Distillation-aware ultra-low bit BERT. In Conference on Empirical Methods in Natural Language Processing.
[349]
Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Conference on Neural Information Processing Systems.
[350]
Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In IEEE Conference on Computer Vision and Pattern Recognition.
[351]
Xiangyu Zhang, Jianhua Zou, Kaiming He, and Jian Sun. 2016. Accelerating very deep convolutional networks for classification and detection. IEEE Trans. Pattern Anal. Mach. Intell. 38, 10 (2016), 1943–1955.
[352]
Zhekai Zhang, Hanrui Wang, Song Han, and William J. Dally. 2020. SpArch: Efficient architecture for sparse matrix multiplication. In IEEE International Symposium on High-performance Computer Architecture.
[353]
Xiangyu Zhao, Chong Wang, Ming Chen, Xudong Zheng, Xiaobing Liu, and Jiliang Tang. 2020. AutoEmb: Automated embedding dimensionality search in streaming recommendations. arXiv preprint arXiv:2002.11252 (2020).
[354]
Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. 2018. Practical block-wise neural network architecture generation. In IEEE Conference on Computer Vision and Pattern Recognition.
[355]
Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016).
[356]
Yin Zhou and Oncel Tuzel. 2018. VoxelNet: End-to-end learning for point cloud based 3D object detection. In IEEE Conference on Computer Vision and Pattern Recognition.
[357]
Chenzhuo Zhu, Song Han, Huizi Mao, and William Dally. 2017. Trained ternary quantization. In International Conference on Learning Representations.
[358]
Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, and Song Han. 2021. Delayed gradient averaging: Tolerate the communication latency in federated learning. In Conference on Neural Information Processing Systems.
[359]
Ligeng Zhu, Zhijian Liu, and Song Han. 2019. Deep leakage from gradients. In Conference on Neural Information Processing Systems.
[360]
Ligeng Zhu, Yao Lu, Yujun Lin, and Song Han. 2019. Distributed training across the world. In NeurIPS Workshop on Systems for ML.
[361]
Sijie Zhu, Taojiannan Yang, Matias Mendieta, and Chen Chen. 2020. A3D: Adaptive 3D networks for video action recognition. arXiv preprint arXiv:2011.12384 (2020).
[362]
Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. 2017. Flow-guided feature aggregation for video object detection. In International Conference on Computer Vision.
[363]
Zhuotun Zhu, Chenxi Liu, Dong Yang, Alan Yuille, and Daguang Xu. 2019. V-NAS: Neural architecture search for volumetric medical image segmentation. In International Conference on 3D Vision.
[364]
Ling Zhuo and Viktor K. Prasanna. 2005. Sparse matrix-vector multiplication on FPGAs. In International Symposium on Field-programmable Gate Arrays.
[365]
Barret Zoph and Quoc V. Le. 2017. Neural architecture search with reinforcement learning. In International Conference on Learning Representations.
[366]
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2018. Learning transferable architectures for scalable image recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
[367]
Dan Zou, Yong Dou, Song Guo, and Shice Ni. 2013. High performance sparse matrix-vector multiplication on FPGA. IEICE Electron. Expr. (2013).

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 3 (May 2022), 245 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3498355

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Received: 01 April 2021
Revised: 01 July 2021
Accepted: 01 September 2021
Published: 04 March 2022
Published in TODAES Volume 27, Issue 3

Author Tags

  1. Efficient deep learning
  2. TinyML
  3. model compression
  4. AutoML
  5. neural architecture search

Qualifiers

  • Keynote
  • Refereed
