DOI: 10.5555/2969033.2969123

Do deep nets really need to be deep?

Published: 08 December 2014

Abstract

Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow nets can learn these deep functions using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional models.
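
The mechanism behind this result, as described in the body of the paper, is mimic training: the shallow "student" network is trained not on the original 0/1 labels but to match the real-valued pre-softmax outputs (logits) of an already-trained deep "teacher" via L2 regression. The sketch below is illustrative only, not the authors' code; `teacher`, `student`, and `data_loader` are placeholder names for this example.

```python
# A minimal sketch of mimic training (not the authors' code): a shallow
# student is fit to the logits of a fixed deep teacher with an L2 loss.
# `teacher`, `student`, and `data_loader` are illustrative placeholders.
import torch
import torch.nn as nn

def train_mimic(student: nn.Module, teacher: nn.Module, data_loader,
                epochs: int = 10, lr: float = 1e-3) -> None:
    """Fit `student` to reproduce `teacher`'s logits on unlabeled inputs."""
    teacher.eval()                                   # teacher stays fixed
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in data_loader:                        # batches of inputs only
            with torch.no_grad():                    # no gradients through teacher
                target_logits = teacher(x)
            loss = mse(student(x), target_logits)    # regress on logits
            opt.zero_grad()
            loss.backward()
            opt.step()

# Example student: a single wide hidden layer, as in the paper's shallow
# mimic models. (The paper also factorizes the first weight matrix with a
# linear bottleneck layer to speed up training; omitted here for brevity.)
student = nn.Sequential(nn.Linear(784, 8000), nn.ReLU(), nn.Linear(8000, 10))
```

Regressing on logits rather than hard labels exposes the teacher's relative confidence across all classes, which is a key reason the mimic models reach higher accuracy than shallow nets of the same size trained directly on the original labels.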



Information & Contributors

Information

Published In

NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2
December 2014
3697 pages

Publisher

MIT Press, Cambridge, MA, United States


Cited By

  • (2023) KD-zero. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 69490-69504. DOI: 10.5555/3666122.3669165. Online publication date: 10-Dec-2023.
  • (2023) Defending against data-free model extraction by distributionally robust defensive training. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 624-637. DOI: 10.5555/3666122.3666151. Online publication date: 10-Dec-2023.
  • (2022) Measuring and reducing model update regression in structured prediction for NLP. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 19384-19397. DOI: 10.5555/3600270.3601679. Online publication date: 28-Nov-2022.
  • (2022) Compression of Deep Learning Models for Text: A Survey. ACM Transactions on Knowledge Discovery from Data, 16(4):1-55. DOI: 10.1145/3487045. Online publication date: 8-Jan-2022.
  • (2021) Improving deep learning interpretability by saliency guided training. Proceedings of the 35th International Conference on Neural Information Processing Systems, pp. 26726-26739. DOI: 10.5555/3540261.3542308. Online publication date: 6-Dec-2021.
  • (2021) Kernel approximation methods for speech recognition. The Journal of Machine Learning Research, 20(1):2121-2156. DOI: 10.5555/3322706.3362000. Online publication date: 9-Mar-2021.
  • (2021) SAGE: A Split-Architecture Methodology for Efficient End-to-End Autonomous Vehicle Control. ACM Transactions on Embedded Computing Systems, 20(5s):1-22. DOI: 10.1145/3477006. Online publication date: 17-Sep-2021.
  • (2021) Joint-teaching: Learning to Refine Knowledge for Resource-constrained Unsupervised Cross-modal Retrieval. Proceedings of the 29th ACM International Conference on Multimedia, pp. 1517-1525. DOI: 10.1145/3474085.3475286. Online publication date: 17-Oct-2021.
  • (2021) Anti-Distillation Backdoor Attacks: Backdoors Can Really Survive in Knowledge Distillation. Proceedings of the 29th ACM International Conference on Multimedia, pp. 826-834. DOI: 10.1145/3474085.3475254. Online publication date: 17-Oct-2021.
  • (2021) Knowledge Distillation with Attention for Deep Transfer Learning of Convolutional Networks. ACM Transactions on Knowledge Discovery from Data, 16(3):1-20. DOI: 10.1145/3473912. Online publication date: 22-Oct-2021.
