DOI: 10.5555/2969033.2969123

Do deep nets really need to be deep?

Published: 08 December 2014

Abstract

Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow nets can learn these deep functions using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional models.
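
The mechanism behind this result, as described in the body of the paper, is mimic training: the shallow "student" network is trained not on the original 0/1 labels but to match the real-valued pre-softmax outputs (logits) of an already-trained deep "teacher" via L2 regression. The sketch below is illustrative only, not the authors' code; `teacher`, `student`, and `data_loader` are placeholder names for this example.

```python
# A minimal sketch of mimic training (not the authors' code): a shallow
# student is fit to the logits of a fixed deep teacher with an L2 loss.
# `teacher`, `student`, and `data_loader` are illustrative placeholders.
import torch
import torch.nn as nn

def train_mimic(student: nn.Module, teacher: nn.Module, data_loader,
                epochs: int = 10, lr: float = 1e-3) -> None:
    """Fit `student` to reproduce `teacher`'s logits on unlabeled inputs."""
    teacher.eval()                                   # teacher stays fixed
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in data_loader:                        # batches of inputs only
            with torch.no_grad():                    # no gradients through teacher
                target_logits = teacher(x)
            loss = mse(student(x), target_logits)    # regress on logits
            opt.zero_grad()
            loss.backward()
            opt.step()

# Example student: a single wide hidden layer, as in the paper's shallow
# mimic models. (The paper also factorizes the first weight matrix with a
# linear bottleneck layer to speed up training; omitted here for brevity.)
student = nn.Sequential(nn.Linear(784, 8000), nn.ReLU(), nn.Linear(8000, 10))
```

Regressing on logits rather than hard labels exposes the teacher's relative confidence across all classes, which is a key reason the mimic models reach higher accuracy than shallow nets of the same size trained directly on the original labels.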



Information & Contributors

Information

Published In

NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2
December 2014
3697 pages

Publisher

MIT Press, Cambridge, MA, United States


Cited By

  • (2023) KD-zero. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 69490-69504. DOI: 10.5555/3666122.3669165. Online publication date: 10-Dec-2023.
  • (2023) Defending against data-free model extraction by distributionally robust defensive training. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 624-637. DOI: 10.5555/3666122.3666151. Online publication date: 10-Dec-2023.
  • (2022) Measuring and reducing model update regression in structured prediction for NLP. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 19384-19397. DOI: 10.5555/3600270.3601679. Online publication date: 28-Nov-2022.
  • (2022) Compression of Deep Learning Models for Text: A Survey. ACM Transactions on Knowledge Discovery from Data, 16(4):1-55. DOI: 10.1145/3487045. Online publication date: 8-Jan-2022.
  • (2021) Improving deep learning interpretability by saliency guided training. Proceedings of the 35th International Conference on Neural Information Processing Systems, pp. 26726-26739. DOI: 10.5555/3540261.3542308. Online publication date: 6-Dec-2021.
  • (2021) Kernel approximation methods for speech recognition. The Journal of Machine Learning Research, 20(1):2121-2156. DOI: 10.5555/3322706.3362000. Online publication date: 9-Mar-2021.
  • (2021) SAGE: A Split-Architecture Methodology for Efficient End-to-End Autonomous Vehicle Control. ACM Transactions on Embedded Computing Systems, 20(5s):1-22. DOI: 10.1145/3477006. Online publication date: 17-Sep-2021.
  • (2021) Joint-teaching: Learning to Refine Knowledge for Resource-constrained Unsupervised Cross-modal Retrieval. Proceedings of the 29th ACM International Conference on Multimedia, pp. 1517-1525. DOI: 10.1145/3474085.3475286. Online publication date: 17-Oct-2021.
  • (2021) Anti-Distillation Backdoor Attacks: Backdoors Can Really Survive in Knowledge Distillation. Proceedings of the 29th ACM International Conference on Multimedia, pp. 826-834. DOI: 10.1145/3474085.3475254. Online publication date: 17-Oct-2021.
  • (2021) Knowledge Distillation with Attention for Deep Transfer Learning of Convolutional Networks. ACM Transactions on Knowledge Discovery from Data, 16(3):1-20. DOI: 10.1145/3473912. Online publication date: 22-Oct-2021.
