Ensembling neural networks: many could be better than all

Published: 01 May 2002

Abstract

A neural network ensemble is a learning paradigm in which many neural networks are jointly used to solve a problem. In this paper, the relationship between an ensemble and its component neural networks is analyzed in the context of both regression and classification, which reveals that it may be better to ensemble many instead of all of the neural networks at hand. This result is interesting because most current approaches ensemble all of the available neural networks for prediction. Then, to show that the neural networks appropriate for composing an ensemble can be effectively selected from a set of available networks, an approach named GASEN is presented. GASEN first trains a number of neural networks. It then assigns random weights to those networks and employs a genetic algorithm to evolve the weights so that they characterize, to some extent, the fitness of the networks for constituting an ensemble. Finally, it selects some networks, based on the evolved weights, to make up the ensemble. A large empirical study shows that, compared with popular ensemble approaches such as Bagging and Boosting, GASEN can generate neural network ensembles with far smaller sizes but stronger generalization ability. Furthermore, to explain the working mechanism of GASEN, a bias-variance decomposition of the error is provided, which shows that the success of GASEN may lie in its ability to significantly reduce both the bias and the variance.
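
To make the selection step described in the abstract concrete, the sketch below illustrates the general idea in NumPy for the regression case: evolve one weight per trained network on held-out validation predictions, then keep only the networks whose evolved weight exceeds a threshold. This is a minimal sketch under stated assumptions, not the paper's implementation: the GA operators (truncation selection, uniform crossover, Gaussian mutation), the hyperparameters, the function names, and the default 1/N selection threshold are all illustrative choices.

```python
import numpy as np

def evolve_ensemble_weights(val_preds, y_val, pop_size=50, generations=100,
                            mutation_sigma=0.05, seed=None):
    """Evolve one weight per trained network on held-out validation predictions.

    val_preds : array of shape (n_networks, n_samples), each row holding one
                network's predictions on the validation set (regression case).
    Returns a normalized weight vector (sums to 1), one entry per network.
    """
    rng = np.random.default_rng(seed)
    n_networks = val_preds.shape[0]

    def fitness(w):
        # Lower validation error of the weighted ensemble -> higher fitness.
        w = w / w.sum()
        return -np.mean((w @ val_preds - y_val) ** 2)

    # Random positive initial weights, as in the description above.
    pop = rng.random((pop_size, n_networks)) + 1e-6
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(n_networks) < 0.5                    # uniform crossover
            child = np.where(mask, a, b) + rng.normal(0.0, mutation_sigma, n_networks)
            children.append(np.clip(child, 1e-6, None))            # keep weights positive
        pop = np.vstack([parents, np.array(children)])
    best = pop[np.argmax([fitness(ind) for ind in pop])]
    return best / best.sum()

def select_ensemble(weights, threshold=None):
    """Return indices of networks whose evolved weight exceeds the threshold
    (default: 1/N, i.e. an above-average share)."""
    if threshold is None:
        threshold = 1.0 / len(weights)
    return np.flatnonzero(weights > threshold)
```

The networks returned by select_ensemble would then be combined in the usual way, for example by simple averaging for regression or majority voting for classification; the evolved weights themselves serve only to decide which networks to keep.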

References

[1]
E. Bauer, R. Kohavi, An empirical comparison of voting classification algorithms: Bagging, Boosting, and variants. Machine Learning 36(1-2) (1999)105-139.
[2]
C. Blake, E. Keogh, C.J. Merz, UCI repository of machine learning databases, Department of Information and Computer Science, University of California, Irvine, CA, 1998. http://www.ics.uci.edu/-mlearn/ MLRepository.htm.
[3]
L. Breiman, Bagging predictors, Machine Learning 24(2) (1996) 123-140.
[4]
L. Breiman, Bias, variance, and arcing classifiers, Technical Report 460, Statistics Department. University of California, Berkeley. CA, 1996.
[5]
K.J. Cherkauer, Human expert level performance on a scientific image analysis task by a system using combined artificial neural networks., in: P. Chan, S. Stolfo. D. Wolpert (Eds.), Prose. AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms. Portland, OR, AAAI Press, Menlo Park, CA, 1996, pp. 15-21.
[6]
P. Cunningham, J. Carney, S. Jacob, Stability problems with artificial neural networks and the ensemble solution, Artificial Intelligence in Medicine 20(3) (2000) 217-225.
[7]
H. Demuth, M. Beale, Neural Network Toolbox for use with MATLAB, The MathWorks, Natick, MA, 1998.
[8]
H. Drucker, Boosting using neural nets, in: A. Sharkey (Ed.), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, Springer, London, 1999. pp. 51-77.
[9]
H. Drucker, R. Schapire, P. Simard, Improving performance in neural networks using a boosting algorithm, in: S.J. Hanson, J.D. Cowan, C.L. Giles (Eds.), Advances in Neural Information Processing Systems 5, Denver, CO. Morgan Kaufmann. San Mateo, CA, 1993, pp. 42-49.
[10]
B. Efron, R. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.
[11]
Y. Freund, Boosting a weak algorithm by majority, Inform, and Comput. 121(2) (1995) 256-285.
[12]
Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in: Proc. EurnCOLT-94, Barcelona, Spain, Springer. Berlin, 1995, pp. 23-37.
[13]
Y. Freund, R.E. Schapire. Experiments with a new boosting algorithm, in: Proc. ICML-96, Bad. Italy. Morgan Kaufmann, San Mateo, CA, 1996, pp. 148-156.
[14]
S. German, E. Bienenstock, R. Doursat, Neural networks and the bias/variance dilemma, Neural Comput. 4(l) (1992) 1-58.
[15]
D.E. Goldberg, Genetic Algorithm in Search, Optimization and Machine Learning, AddisonWesley, Reading, MA, 1989.
[16]
S. Gutta, H. Wechsler, Face recognition using hybrid classifier systems, in: Proc. ICNN96, Washington, DC, IEEE Computer Society Press, Los Alamitos. CA, 1996, pp. 1017-1022.
[17]
J. Hampshire, A. Waibel, A novel objective function for improved phoneme recognition using lime-delay neural networks, IEEE Trans. Neural Networks 1(2) (1990) 216-228.
[18]
J.V. Hansen, Combining predictors: Meta machine learning methods and bias/variance and ambiguity decompositions, Ph.D. Dissertation, Department of Computer Science, University of Aarhus, Denmark, 2000.
[19]
L.K. Hansen, L. Liisberg, P. Salamon, Ensemble methods for handwritten digit recognition, in: Proc. IEEE Workshop on Neural Networks for Signal Processing. Helsingoer, Denmark, IEEE Press. Piscataway, NJ, 1992, pp. 333-342.
[20]
L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal. Machine Intelligence 12(10)0990) 993-1001.
[21]
C.R. Houck, J.A. Joines, M.G. Kay, A genetic algorithm for function optimization: A Matlab implementaion, Technical Report NCSU-}ETR-9509, North Carolina State University. Raleigh, NC, 1995.
[22]
F.J. Huang, Z.-H. Zhou, H.-J. Zhang, T.H. Chen, Pose invariant face recognition, in: Proc. 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, IEEE Computer Society Press, Los Alamnitos, CA. 2000. pp. 245-250.
[23]
R.A. Jacobs, M.l. Jordan, S.J. Nowlan, G.E. Hinton, Adaptively mixtures of local experts, Neural Comput. 3 (1)(1991)79-87.
[24]
D. Jimenez, Dynamically weighted ensemble neural networks for classification, in: Proc. lJCNN98, Vol. I, Anchorage. AK, IEEE Computer Society Press, Los Alamitos, CA, 1998. pp. 753-756.
[25]
M.I. Jordan, R.A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Comput. 6 (2) (1994) 181-214.
[26]
R. Kohavi, D.H. Wolpert. Bias plus variance decomposition for zero-one loss functions, in: Proc. ICML-96, Ban. Italy, Morgan Kaufmann. San Mateo. CA, 1996. pp. 275-283.
[27]
E.B. Kong, T.G. Dietterich, Error-correcting output coding corrects bias and variance, in: Proc. ICML-95, Tahoe City, CA. Morgan Kaufmann, San Mateo, CA. 1995, pp. 313-321.
[28]
A. Krogh, J. Vedelsby. Neural network ensembles, cross validation, and active learning, in: G. Tesauro, D. Touretzky, T. Leen (Eds.). Advances in Neural Information Processing Systems 7. Denver. CO. MIT Press, Cambridge. MA, 1995, pp. 231-238.
[29]
R. Maclin, J.W. Shavlik, Combining the predictions of multiple classifiers: Using competitive learning to initialize neural networks, in: Proc. IJCAI-95, Montreal, Quebec, Morgan Kautmann. San Mateo, CA, 1995 pp. 524-530.
[30]
J. Mao, A case study on bagging, boosting and basic ensembles of neural networks for OCR. In: Proc IJCNN-98, Vol. 3, Anchorage, AK, IEEE Computer Society Press, Los Alamitos. CA. 1998, pp. 1828-1833
[31]
C.J. Merz, M.J. Pazzani, Combining neural network regression estimates with regularized linear weights, in: M.C. Mozer, M.l. Jordan, T. Petsche (Eds.), Advances in Neural Information Processing Systems 9, Denser CO. MIT Press, Cambridge, MA. 1997, pp. 564-570.
[32]
D. Opitz, R. Maclin, Popular ensemble methods: An empirical study, J. Artificial Intelligence Res. 11(1999) 169-198.
[33]
D.W. Opitz, J.W. Shavlik, Actively searching for an effective neural network ensemble. Connection Science 8(3-4) (1996) 337-353.
[34]
D.W. Opitz, J.W. Shavlik, Generating accurate and diverse members of a neural network ensemble, in: D.S. Touretzky, M.C. Mozer, M.E. Hasselmo (Eds.). Advances in Neural Information Processing Systems 8. Denver. CO. MIT Press. Cambridge, MA. 1996. pp. 535-541.
[35]
M.P. Perrone, L.N. Cooper, When networks disagree: Ensemble method for neural networks, in: R.J Mammone (Ed.), Artificial Neural Networks for Speech and Vision. Chapman & Hall. New York, 1993. pp. 126-142.
[36]
J.R. Quinlan, Bagging, Boosting, and C4.5, in: Proc. AAAI-96, Portland, OR, AAAI Press. Menlo Park. CA, 1996, pp. 725-730.
[37]
G. Ridgeway, D. Madigan, T. Richardson, Boosting methodology for regression problems. In: Proc AISTATS-99, Fort Lauderdale, FL, Morgan Kaofmann, San Mateo. CA. 1999, pp. 152-161.
[38]
D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation, in D.E. Rumelhart, J.L. McCIelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. I, MIT Press, Cambridge, MA, 1986. pp. 318-362.
[39]
R.E. Schapire, The strength of weak learnability. Machine Learning 5 (2) (1990)197-227.
[40]
A. Sharkey (Ed.), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, Springet. London. 1999.
[41]
Y. Shimshoni, N. Intrator, Classification of seismic signals by integrating ensembles of neural networks, IEEE Trans. Signal Process. 46(5) (1998)1194-1201.
[42]
P. Sollich, A. Krogh, Learning with ensembles: How over-fitting can be useful, in: D.S. Touretzky, M.C. Mozer, M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8. Denver. CO. MIT Press, Cambridge, MA, 1996, pp. 190-196.
[43]
N. Ueda, Optimal linear combination of neural networks for improving classification performance, IEEE Trans. Pattern Anal. Machine Intelligence 22(2) (2000) 207-215.
[44]
J.A.E. Weston, M.O. Stitson, A. Gammerman, V. Vovk, V. Vapnik, Experiments with support vectot machines, Technical Report: CSD-TR-96-19, Royal Holloway University of London, London. 1996.
[45]
D.H. Wolpert, Stacked generalization, Neural Networks 5 (2) (1992) 243-259.
[46]
X. Yao, Y. Liu, Making use of population information in evolutionary artificial neural networks, IEEE Transactions on Systems. Man and Cybernetics-Part B: Cybernetics 28(3) (1998) 417-425.
[47]
Z.-H. Zhou, Y. Jiang, Y.-B. Yang, S.-F. Chen, Lung cancer cell identification based on artificial neural network ensembles, Artificial Intelligence in Medicine 24 (1) (2002) 25-36.

    Published In

    Artificial Intelligence, Volume 137, Issue 1-2
    May 2002
    262 pages

    Publisher

    Elsevier Science Publishers Ltd.

    United Kingdom

    Author Tags

    1. bagging
    2. bias-variance decomposition
    3. boosting
    4. genetic algorithm
    5. machine learning
    6. neural network ensemble
    7. neural networks
    8. selective ensemble

    Qualifiers

    • Article

    Cited By

    • (2025) An adversarial diverse deep ensemble approach for surrogate-based traffic signal optimization. Computer-Aided Civil and Infrastructure Engineering 40(5), 632-657. DOI: 10.1111/mice.13354. Online publication date: 4-Feb-2025.
    • (2025) Learning adversarially robust kernel ensembles with kernel average pooling. Expert Systems with Applications: An International Journal 266(C). DOI: 10.1016/j.eswa.2024.126017. Online publication date: 25-Mar-2025.
    • (2025) Reconstruction error based implicit regularization method and its engineering application to lung cancer diagnosis. Engineering Applications of Artificial Intelligence 139(PA). DOI: 10.1016/j.engappai.2024.109439. Online publication date: 1-Jan-2025.
    • (2024) Ensemble pruning for out-of-distribution generalization. Proceedings of the 41st International Conference on Machine Learning, 41416-41429. DOI: 10.5555/3692070.3693753. Online publication date: 21-Jul-2024.
    • (2024) CDCL-VRE. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 46(1), 2759-2773. DOI: 10.3233/JIFS-234593. Online publication date: 1-Jan-2024.
    • (2024) MEPSI. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, 11078-11086. DOI: 10.1609/aaai.v38i10.28984. Online publication date: 20-Feb-2024.
    • (2024) Class-Incremental Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 9851-9873. DOI: 10.1109/TPAMI.2024.3429383. Online publication date: 1-Dec-2024.
    • (2024) Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(11), 7451-7462. DOI: 10.1109/TPAMI.2024.3392782. Online publication date: 1-Nov-2024.
    • (2024) Localizing discriminative regions for fine-grained visual recognition. Neurocomputing 610(C). DOI: 10.1016/j.neucom.2024.128611. Online publication date: 28-Dec-2024.
    • (2024) ESEN. Neurocomputing 599(C). DOI: 10.1016/j.neucom.2024.128030. Online publication date: 28-Sep-2024.