Ensembling neural networks: many could be better than all

Published: 01 May 2002

Abstract

A neural network ensemble is a learning paradigm in which many neural networks are jointly used to solve a problem. In this paper, the relationship between an ensemble and its component neural networks is analyzed in the context of both regression and classification, which reveals that it may be better to ensemble many instead of all of the neural networks at hand. This result is interesting because most current approaches ensemble all of the available neural networks for prediction. Then, to show that the neural networks appropriate for composing an ensemble can be effectively selected from a set of available networks, an approach named GASEN is presented. GASEN first trains a number of neural networks. It then assigns random weights to those networks and employs a genetic algorithm to evolve the weights so that they characterize, to some extent, the fitness of the networks for constituting an ensemble. Finally, it selects some networks, based on the evolved weights, to make up the ensemble. A large empirical study shows that, compared with popular ensemble approaches such as Bagging and Boosting, GASEN can generate neural network ensembles with far smaller sizes but stronger generalization ability. Furthermore, to explain the working mechanism of GASEN, a bias-variance decomposition of the error is provided, which shows that the success of GASEN may lie in its ability to significantly reduce both the bias and the variance.
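
To make the selection step described in the abstract concrete, the sketch below illustrates the general idea in NumPy for the regression case: evolve one weight per trained network on held-out validation predictions, then keep only the networks whose evolved weight exceeds a threshold. This is a minimal sketch under stated assumptions, not the paper's implementation: the GA operators (truncation selection, uniform crossover, Gaussian mutation), the hyperparameters, the function names, and the default 1/N selection threshold are all illustrative choices.

```python
import numpy as np

def evolve_ensemble_weights(val_preds, y_val, pop_size=50, generations=100,
                            mutation_sigma=0.05, seed=None):
    """Evolve one weight per trained network on held-out validation predictions.

    val_preds : array of shape (n_networks, n_samples), each row holding one
                network's predictions on the validation set (regression case).
    Returns a normalized weight vector (sums to 1), one entry per network.
    """
    rng = np.random.default_rng(seed)
    n_networks = val_preds.shape[0]

    def fitness(w):
        # Lower validation error of the weighted ensemble -> higher fitness.
        w = w / w.sum()
        return -np.mean((w @ val_preds - y_val) ** 2)

    # Random positive initial weights, as in the description above.
    pop = rng.random((pop_size, n_networks)) + 1e-6
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(n_networks) < 0.5                    # uniform crossover
            child = np.where(mask, a, b) + rng.normal(0.0, mutation_sigma, n_networks)
            children.append(np.clip(child, 1e-6, None))            # keep weights positive
        pop = np.vstack([parents, np.array(children)])
    best = pop[np.argmax([fitness(ind) for ind in pop])]
    return best / best.sum()

def select_ensemble(weights, threshold=None):
    """Return indices of networks whose evolved weight exceeds the threshold
    (default: 1/N, i.e. an above-average share)."""
    if threshold is None:
        threshold = 1.0 / len(weights)
    return np.flatnonzero(weights > threshold)
```

The networks returned by select_ensemble would then be combined in the usual way, for example by simple averaging for regression or majority voting for classification; the evolved weights themselves serve only to decide which networks to keep.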

References

[1]
E. Bauer, R. Kohavi, An empirical comparison of voting classification algorithms: Bagging, Boosting, and variants. Machine Learning 36(1-2) (1999)105-139.
[2]
C. Blake, E. Keogh, C.J. Merz, UCI repository of machine learning databases, Department of Information and Computer Science, University of California, Irvine, CA, 1998. http://www.ics.uci.edu/-mlearn/ MLRepository.htm.
[3]
L. Breiman, Bagging predictors, Machine Learning 24(2) (1996) 123-140.
[4]
L. Breiman, Bias, variance, and arcing classifiers, Technical Report 460, Statistics Department. University of California, Berkeley. CA, 1996.
[5]
K.J. Cherkauer, Human expert level performance on a scientific image analysis task by a system using combined artificial neural networks., in: P. Chan, S. Stolfo. D. Wolpert (Eds.), Prose. AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms. Portland, OR, AAAI Press, Menlo Park, CA, 1996, pp. 15-21.
[6]
P. Cunningham, J. Carney, S. Jacob, Stability problems with artificial neural networks and the ensemble solution, Artificial Intelligence in Medicine 20(3) (2000) 217-225.
[7]
H. Demuth, M. Beale, Neural Network Toolbox for use with MATLAB, The MathWorks, Natick, MA, 1998.
[8]
H. Drucker, Boosting using neural nets, in: A. Sharkey (Ed.), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, Springer, London, 1999. pp. 51-77.
[9]
H. Drucker, R. Schapire, P. Simard, Improving performance in neural networks using a boosting algorithm, in: S.J. Hanson, J.D. Cowan, C.L. Giles (Eds.), Advances in Neural Information Processing Systems 5, Denver, CO. Morgan Kaufmann. San Mateo, CA, 1993, pp. 42-49.
[10]
B. Efron, R. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.
[11]
Y. Freund, Boosting a weak algorithm by majority, Inform, and Comput. 121(2) (1995) 256-285.
[12]
Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in: Proc. EurnCOLT-94, Barcelona, Spain, Springer. Berlin, 1995, pp. 23-37.
[13]
Y. Freund, R.E. Schapire. Experiments with a new boosting algorithm, in: Proc. ICML-96, Bad. Italy. Morgan Kaufmann, San Mateo, CA, 1996, pp. 148-156.
[14]
S. German, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Comput. 4(l) (1992) 1-58.
[15]
D.E. Goldberg, Genetic Algorithm in Search, Optimization and Machine Learning, AddisonWesley, Reading, MA, 1989.
[16]
S. Gutta, H. Wechsler, Face recognition using hybrid classifier systems, in: Proc. ICNN96, Washington, DC, IEEE Computer Society Press, Los Alamitos. CA, 1996, pp. 1017-1022.
[17]
J. Hampshire, A. Waibel, A novel objective function for improved phoneme recognition using lime-delay neural networks, IEEE Trans. Neural Networks 1(2) (1990) 216-228.
[18]
J.V. Hansen, Combining predictors: Meta machine learning methods and bias/variance and ambiguity decompositions, Ph.D. Dissertation, Department of Computer Science, University of Aarhus, Denmark, 2000.
[19]
L.K. Hansen, L. Liisberg, P. Salamon, Ensemble methods for handwritten digit recognition, in: Proc. IEEE Workshop on Neural Networks for Signal Processing. Helsingoer, Denmark, IEEE Press. Piscataway, NJ, 1992, pp. 333-342.
[20]
L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal. Machine Intelligence 12(10)0990) 993-1001.
[21]
C.R. Houck, J.A. Joines, M.G. Kay, A genetic algorithm for function optimization: A Matlab implementaion, Technical Report NCSU-}ETR-9509, North Carolina State University. Raleigh, NC, 1995.
[22]
F.J. Huang, Z.-H. Zhou, H.-J. Zhang, T.H. Chen, Pose invariant face recognition, in: Proc. 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, IEEE Computer Society Press, Los Alamnitos, CA. 2000. pp. 245-250.
[23]
R.A. Jacobs, M.l. Jordan, S.J. Nowlan, G.E. Hinton, Adaptively mixtures of local experts, Neural Comput. 3 (1)(1991)79-87.
[24]
D. Jimenez, Dynamically weighted ensemble neural networks for classification, in: Proc. lJCNN98, Vol. I, Anchorage. AK, IEEE Computer Society Press, Los Alamitos, CA, 1998. pp. 753-756.
[25]
M.I. Jordan, R.A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Comput. 6 (2) (1994) 181-214.
[26]
R. Kohavi, D.H. Wolpert. Bias plus variance decomposition for zero-one loss functions, in: Proc. ICML-96, Ban. Italy, Morgan Kaufmann. San Mateo. CA, 1996. pp. 275-283.
[27]
E.B. Kong, T.G. Dietterich, Error-correcting output coding corrects bias and variance, in: Proc. ICML-95, Tahoe City, CA. Morgan Kaufmann, San Mateo, CA. 1995, pp. 313-321.
[28]
A. Krogh, J. Vedelsby. Neural network ensembles, cross validation, and active learning, in: G. Tesauro, D. Touretzky, T. Leen (Eds.). Advances in Neural Information Processing Systems 7. Denver. CO. MIT Press, Cambridge. MA, 1995, pp. 231-238.
[29]
R. Maclin, J.W. Shavlik, Combining the predictions of multiple classifiers: Using competitive learning to initialize neural networks, in: Proc. IJCAI-95, Montreal, Quebec, Morgan Kautmann. San Mateo, CA, 1995 pp. 524-530.
[30]
J. Mao, A case study on bagging, boosting and basic ensembles of neural networks for OCR. In: Proc IJCNN-98, Vol. 3, Anchorage, AK, IEEE Computer Society Press, Los Alamitos. CA. 1998, pp. 1828-1833
[31]
C.J. Merz, M.J. Pazzani, Combining neural network regression estimates with regularized linear weights, in: M.C. Mozer, M.l. Jordan, T. Petsche (Eds.), Advances in Neural Information Processing Systems 9, Denser CO. MIT Press, Cambridge, MA. 1997, pp. 564-570.
[32]
D. Opitz, R. Maclin, Popular ensemble methods: An empirical study, J. Artificial Intelligence Res. 11(1999) 169-198.
[33]
D.W. Opitz, J.W. Shavlik, Actively searching for an effective neural network ensemble. Connection Science 8(3-4) (1996) 337-353.
[34]
D.W. Opitz, J.W. Shavlik, Generating accurate and diverse members of a neural network ensemble, in: D.S. Touretzky, M.C. Mozer, M.E. Hasselmo (Eds.). Advances in Neural Information Processing Systems 8. Denver. CO. MIT Press. Cambridge, MA. 1996. pp. 535-541.
[35]
M.P. Perrone, L.N. Cooper, When networks disagree: Ensemble method for neural networks, in: R.J Mammone (Ed.), Artificial Neural Networks for Speech and Vision. Chapman & Hall. New York, 1993. pp. 126-142.
[36]
J.R. Quinlan, Bagging, Boosting, and C4.5, in: Proc. AAAI-96, Portland, OR, AAAI Press. Menlo Park. CA, 1996, pp. 725-730.
[37]
G. Ridgeway, D. Madigan, T. Richardson, Boosting methodology for regression problems. In: Proc AISTATS-99, Fort Lauderdale, FL, Morgan Kaofmann, San Mateo. CA. 1999, pp. 152-161.
[38]
D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation, in D.E. Rumelhart, J.L. McCIelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. I, MIT Press, Cambridge, MA, 1986. pp. 318-362.
[39]
R.E. Schapire, The strength of weak learnability. Machine Learning 5 (2) (1990)197-227.
[40]
A. Sharkey (Ed.), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, Springet. London. 1999.
[41]
Y. Shimshoni, N. Intrator, Classification of seismic signals by integrating ensembles of neural networks, IEEE Trans. Signal Process. 46(5) (1998)1194-1201.
[42]
P. Sollich, A. Krogh, Learning with ensembles: How over-fitting can be useful, in: D.S. Touretzky, M.C. Mozer, M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8. Denver. CO. MIT Press, Cambridge, MA, 1996, pp. 190-196.
[43]
N. Ueda, Optimal linear combination of neural networks for improving classification performance, IEEE Trans. Pattern Anal. Machine Intelligence 22(2) (2000) 207-215.
[44]
J.A.E. Weston, M.O. Stitson, A. Gammerman, V. Vovk, V. Vapnik, Experiments with support vectot machines, Technical Report: CSD-TR-96-19, Royal Holloway University of London, London. 1996.
[45]
D.H. Wolpert, Stacked generalization, Neural Networks 5 (2) (1992) 243-259.
[46]
X. Yao, Y. Liu, Making use of population information in evolutionary artificial neural networks, IEEE Transactions on Systems. Man and Cybernetics-Part B: Cybernetics 28(3) (1998) 417-425.
[47]
Z.-H. Zhou, Y. Jiang, Y.-B. Yang, S.-F. Chen, Lung cancer cell identification based on artificial neural network ensembles, Artificial Intelligence in Medicine 24 (1) (2002) 25-36.

    Published In

    Artificial Intelligence, Volume 137, Issue 1-2
    May 2002
    262 pages

    Publisher

    Elsevier Science Publishers Ltd.

    United Kingdom

    Author Tags

    1. bagging
    2. bias-variance decomposition
    3. boosting
    4. genetic algorithm
    5. machine learning
    6. neural network ensemble
    7. neural networks
    8. selective ensemble

    Qualifiers

    • Article

    Cited By

    • (2025) An adversarial diverse deep ensemble approach for surrogate-based traffic signal optimization. Computer-Aided Civil and Infrastructure Engineering 40(5), 632-657. DOI: 10.1111/mice.13354. Online publication date: 4-Feb-2025.
    • (2025) Learning adversarially robust kernel ensembles with kernel average pooling. Expert Systems with Applications: An International Journal 266(C). DOI: 10.1016/j.eswa.2024.126017. Online publication date: 25-Mar-2025.
    • (2025) Reconstruction error based implicit regularization method and its engineering application to lung cancer diagnosis. Engineering Applications of Artificial Intelligence 139(PA). DOI: 10.1016/j.engappai.2024.109439. Online publication date: 1-Jan-2025.
    • (2024) Ensemble pruning for out-of-distribution generalization. Proceedings of the 41st International Conference on Machine Learning, 41416-41429. DOI: 10.5555/3692070.3693753. Online publication date: 21-Jul-2024.
    • (2024) CDCL-VRE. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 46(1), 2759-2773. DOI: 10.3233/JIFS-234593. Online publication date: 1-Jan-2024.
    • (2024) MEPSI. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, 11078-11086. DOI: 10.1609/aaai.v38i10.28984. Online publication date: 20-Feb-2024.
    • (2024) Class-Incremental Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 9851-9873. DOI: 10.1109/TPAMI.2024.3429383. Online publication date: 1-Dec-2024.
    • (2024) Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(11), 7451-7462. DOI: 10.1109/TPAMI.2024.3392782. Online publication date: 1-Nov-2024.
    • (2024) Localizing discriminative regions for fine-grained visual recognition. Neurocomputing 610(C). DOI: 10.1016/j.neucom.2024.128611. Online publication date: 28-Dec-2024.
    • (2024) ESEN. Neurocomputing 599(C). DOI: 10.1016/j.neucom.2024.128030. Online publication date: 28-Sep-2024.