
DOI: 10.1145/3357384.3357921

Regularizing Deep Neural Networks by Ensemble-based Low-Level Sample-Variances Method

Published: 03 November 2019

Abstract

Deep Neural Networks (DNNs) with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. To date, many regularizers, such as dropout and data augmentation, have been proposed to prevent overfitting. Motivated by ensemble learning, we treat each hidden layer in a neural network as an ensemble of base learners by dividing its hidden units into non-overlapping groups, each of which is considered a base learner. Based on a theoretical analysis of the generalization error of ensemble estimators (the bias-variance-covariance decomposition), we find that the variance of each base learner plays an important role in preventing overfitting, and we propose a novel regularizer, the Ensemble-based Low-Level Sample-Variances Method (ELSM), to encourage each base learner of a hidden layer to have a low-level sample-variance. Experiments across a number of datasets and network architectures show that ELSM can effectively reduce overfitting and improve the generalization ability of DNNs.
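The abstract describes the method only at a high level. The sketch below is a rough illustration of a penalty of this kind, not the paper's exact formulation: the units of one hidden layer are split into non-overlapping groups, each group is read as a base learner, and the sum of the base learners' sample variances over a mini-batch is added to the training loss. The function name elsm_penalty, the use of the group mean as each base learner's output, and the NumPy setting are assumptions made for illustration.

```python
import numpy as np

def elsm_penalty(hidden_activations, num_groups):
    """Hypothetical sketch of an ELSM-style penalty.

    hidden_activations: array of shape (batch_size, num_units),
        the outputs of one hidden layer on a mini-batch.
    num_groups: number of non-overlapping groups (base learners)
        the units are divided into.
    Returns the summed sample-variance of the base learners.
    """
    batch_size, num_units = hidden_activations.shape
    assert num_units % num_groups == 0, "units must split evenly into groups"
    group_size = num_units // num_groups

    penalty = 0.0
    for g in range(num_groups):
        # Units belonging to the g-th base learner.
        group = hidden_activations[:, g * group_size:(g + 1) * group_size]
        # One plausible choice: the base learner's output is the mean
        # activation of its group (an assumption, not the paper's exact form).
        learner_output = group.mean(axis=1)  # shape (batch_size,)
        # Sample variance of that output across the mini-batch.
        penalty += learner_output.var()
    return penalty

# Usage: add lambda_reg * elsm_penalty(h, num_groups) to the task loss,
# where lambda_reg is a regularization coefficient tuned on validation data.
```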

Cited By

  • (2023) Detection Mature Bud for Daylily Based on Faster R-CNN Integrated With CBAM. IEEE Access, 11, 81646-81655. DOI: 10.1109/ACCESS.2023.3299595

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN:9781450369763
DOI:10.1145/3357384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 November 2019

Author Tags

  1. bias-variance-covariance decomposition
  2. ensemble learning
  3. generalization ability
  4. neural networks

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • National Natural Science Foundation of China
  • Alibaba Innovation Research Foundation 2017
  • European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement

Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 6
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 05 Mar 2025
