Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods

Published: 01 December 2004

Abstract

Bias-variance analysis provides a tool to study learning algorithms and can be used to properly design ensemble methods that are well tuned to the properties of a specific base learner. Indeed, the effectiveness of ensemble methods critically depends on the accuracy, diversity, and learning characteristics of the base learners. We present an extended experimental analysis of the bias-variance decomposition of the error in Support Vector Machines (SVMs), considering Gaussian, polynomial, and dot-product kernels. A characterization of the error decomposition is provided through an analysis of the relationships among bias, variance, kernel type, and kernel parameters, offering insight into the way SVMs learn. The results show that the expected trade-off between bias and variance is sometimes observed, but more complex relationships can be detected, especially with Gaussian and polynomial kernels. We show that the bias-variance decomposition offers a rationale for developing ensemble methods that use SVMs as base learners, and we outline two directions for developing SVM ensembles, exploiting the SVM bias characteristics and the bias-variance dependence on the kernel parameters.
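
To make the decomposition concrete, the sketch below estimates a Domingos-style bias-variance decomposition of the zero-one loss for an SVM with a Gaussian kernel, using bootstrap replicates of the training set. This is an illustrative reconstruction, not the authors' experimental code: the synthetic dataset, the scikit-learn classifier, the kernel parameters, and the number of replicates are all assumptions chosen for the example. Under this decomposition, bias is the loss of the majority-vote ("main") prediction, unbiased variance adds to the error, biased variance subtracts from it, and (for noise-free two-class data) the average error equals the average bias plus the net variance.

# A minimal, illustrative sketch (not the authors' code): estimate a
# Domingos-style bias-variance decomposition of the zero-one loss for an
# RBF-kernel SVM from bootstrap replicates of the training set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=400, random_state=0)

n_replicates = 50                      # assumed number of bootstrap models
preds = np.empty((n_replicates, len(y_test)), dtype=int)

for r in range(n_replicates):
    # Train one SVM on a bootstrap sample of the training set.
    idx = rng.randint(0, len(y_train), size=len(y_train))
    clf = SVC(kernel="rbf", C=1.0, gamma=0.1)   # illustrative parameters
    clf.fit(X_train[idx], y_train[idx])
    preds[r] = clf.predict(X_test)

# Main prediction: majority vote across replicates (the mode, for 0/1 loss).
main_pred = (preds.mean(axis=0) >= 0.5).astype(int)

# Bias: zero-one loss of the main prediction (label noise ignored here).
bias = (main_pred != y_test).astype(float)

# Variance: how often a single replicate disagrees with the main prediction.
variance = (preds != main_pred).mean(axis=0)

# Unbiased variance increases the error; biased variance decreases it.
unbiased_var = np.where(bias == 0.0, variance, 0.0).mean()
biased_var = np.where(bias == 1.0, variance, 0.0).mean()
net_variance = unbiased_var - biased_var

avg_bias = bias.mean()
avg_error = (preds != y_test).mean()

print(f"average 0/1 error: {avg_error:.3f}")
print(f"average bias     : {avg_bias:.3f}")
print(f"net variance     : {net_variance:.3f} "
      f"(unbiased {unbiased_var:.3f} - biased {biased_var:.3f})")
print(f"bias + net var   : {avg_bias + net_variance:.3f}")

Sweeping gamma (or C) in a sketch like this and plotting average bias and net variance against the parameter is the kind of analysis the abstract refers to when it mentions the bias-variance dependence on the kernel parameters.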

Published In

The Journal of Machine Learning Research, Volume 5 (December 2004), 1571 pages
ISSN: 1532-4435
EISSN: 1533-7928
Publisher: JMLR.org
