
Dropout: a simple way to prevent neural networks from overfitting

Published: 01 January 2014

Abstract

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
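To make the train/test asymmetry described above concrete, here is a minimal sketch of a dropout layer in NumPy. This is an illustration, not the authors' implementation; the function name, the retention probability p_keep, and the random generator are assumptions introduced for the example.

    import numpy as np

    def dropout_layer(x, p_keep=0.5, train=True, rng=None):
        """Sketch of dropout applied to a layer's activations x.

        Training: each unit is kept with probability p_keep and zeroed
        otherwise, sampling one of the exponentially many "thinned" networks.
        Test: no units are dropped; activations are scaled by p_keep, which
        approximates averaging the predictions of all thinned networks.
        """
        rng = np.random.default_rng() if rng is None else rng
        if train:
            mask = (rng.random(x.shape) < p_keep).astype(x.dtype)
            return x * mask       # randomly thinned activations
        return x * p_keep         # single unthinned network with smaller weights

For example, calling dropout_layer(h, train=True) on each hidden layer's activations during training and dropout_layer(h, train=False) at test time reproduces the behavior the abstract describes: random thinning while learning, a single down-scaled network for prediction.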



Published In

The Journal of Machine Learning Research, Volume 15, Issue 1, January 2014
4085 pages
ISSN: 1532-4435
EISSN: 1533-7928
Editors: Kevin Murphy, Bernhard Schölkopf

Publisher

JMLR.org

Author Tags

1. deep learning
2. model combination
3. neural networks
4. regularization
