Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods

Published: 01 December 2004

Abstract

Bias-variance analysis provides a tool to study learning algorithms and can be used to properly design ensemble methods that are well tuned to the properties of a specific base learner. Indeed, the effectiveness of ensemble methods critically depends on the accuracy, diversity, and learning characteristics of the base learners. We present an extended experimental analysis of the bias-variance decomposition of the error in Support Vector Machines (SVMs), considering Gaussian, polynomial, and dot-product kernels. A characterization of the error decomposition is provided through an analysis of the relationships among bias, variance, kernel type, and kernel parameters, offering insight into the way SVMs learn. The results show that the expected trade-off between bias and variance is sometimes observed, but more complex relationships can be detected, especially with Gaussian and polynomial kernels. We show that the bias-variance decomposition offers a rationale for developing ensemble methods that use SVMs as base learners, and we outline two directions for developing SVM ensembles, exploiting the SVM bias characteristics and the bias-variance dependence on the kernel parameters.
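
To make the decomposition concrete, the sketch below estimates a Domingos-style bias-variance decomposition of the zero-one loss for an SVM with a Gaussian kernel, using bootstrap replicates of the training set. This is an illustrative reconstruction, not the authors' experimental code: the synthetic dataset, the scikit-learn classifier, the kernel parameters, and the number of replicates are all assumptions chosen for the example. Under this decomposition, bias is the loss of the majority-vote ("main") prediction, unbiased variance adds to the error, biased variance subtracts from it, and (for noise-free two-class data) the average error equals the average bias plus the net variance.

# A minimal, illustrative sketch (not the authors' code): estimate a
# Domingos-style bias-variance decomposition of the zero-one loss for an
# RBF-kernel SVM from bootstrap replicates of the training set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=400, random_state=0)

n_replicates = 50                      # assumed number of bootstrap models
preds = np.empty((n_replicates, len(y_test)), dtype=int)

for r in range(n_replicates):
    # Train one SVM on a bootstrap sample of the training set.
    idx = rng.randint(0, len(y_train), size=len(y_train))
    clf = SVC(kernel="rbf", C=1.0, gamma=0.1)   # illustrative parameters
    clf.fit(X_train[idx], y_train[idx])
    preds[r] = clf.predict(X_test)

# Main prediction: majority vote across replicates (the mode, for 0/1 loss).
main_pred = (preds.mean(axis=0) >= 0.5).astype(int)

# Bias: zero-one loss of the main prediction (label noise ignored here).
bias = (main_pred != y_test).astype(float)

# Variance: how often a single replicate disagrees with the main prediction.
variance = (preds != main_pred).mean(axis=0)

# Unbiased variance increases the error; biased variance decreases it.
unbiased_var = np.where(bias == 0.0, variance, 0.0).mean()
biased_var = np.where(bias == 1.0, variance, 0.0).mean()
net_variance = unbiased_var - biased_var

avg_bias = bias.mean()
avg_error = (preds != y_test).mean()

print(f"average 0/1 error: {avg_error:.3f}")
print(f"average bias     : {avg_bias:.3f}")
print(f"net variance     : {net_variance:.3f} "
      f"(unbiased {unbiased_var:.3f} - biased {biased_var:.3f})")
print(f"bias + net var   : {avg_bias + net_variance:.3f}")

Sweeping gamma (or C) in a sketch like this and plotting average bias and net variance against the parameter is the kind of analysis the abstract refers to when it mentions the bias-variance dependence on the kernel parameters.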

Published In

The Journal of Machine Learning Research, Volume 5 (December 2004), 1571 pages
ISSN: 1532-4435
EISSN: 1533-7928
Publisher: JMLR.org
