The value of agreement, a new boosting algorithm

Published: 27 June 2005

Abstract

We present a new generalization bound in which the use of unlabeled examples yields a better ratio between training-set size and the resulting classifier's quality, thus reducing the number of labeled examples needed to reach a given quality. The improvement is obtained by requiring the algorithms that generate the classifiers to agree on the unlabeled examples. Its extent depends on the diversity of the learners: a more diverse group of learners yields a larger improvement, whereas using two copies of a single algorithm gives no advantage at all. As a proof of concept, we apply the resulting algorithm, named AgreementBoost, to a web classification problem, obtaining a reduction of up to 40% in the number of labeled examples.
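The paper's actual objective and update rules appear only in the full text, which this page does not reproduce. As a rough illustration of the idea in the abstract, the sketch below boosts two learners jointly by functional gradient descent on an exponential loss over labeled examples plus a quadratic disagreement penalty over unlabeled examples. The choice of losses, the decision-stump weak learner, and the parameters lr and lam are all assumptions made for this sketch, not the published AgreementBoost.

import numpy as np

def stump_predict(X, feat, thresh, sign):
    # Decision stump: sign * (+1 if X[:, feat] > thresh else -1).
    return sign * np.where(X[:, feat] > thresh, 1.0, -1.0)

def fit_stump(X, target):
    # Least-squares fit of a decision stump to a real-valued target
    # (the negative functional gradient, in the boosting loop below).
    best, best_err = None, np.inf
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1.0, -1.0):
                err = np.sum((target - stump_predict(X, feat, thresh, sign)) ** 2)
                if err < best_err:
                    best_err, best = err, (feat, thresh, sign)
    return best

def agreement_boost(X_lab, y, X_unl, rounds=20, lr=0.3, lam=1.0):
    # Hypothetical agreement-regularized objective (an assumption, not the
    # paper's exact loss):
    #   sum_v sum_i exp(-y_i F_v(x_i)) + lam * sum_u (F_1(u) - F_2(u))^2
    # Each round, each learner adds a stump fit to its negative gradient.
    F_lab = [np.zeros(len(y)), np.zeros(len(y))]
    F_unl = [np.zeros(len(X_unl)), np.zeros(len(X_unl))]
    ensembles = [[], []]
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(rounds):
        for v in (0, 1):
            other = 1 - v
            # Negative gradient on labeled points (exponential loss term) ...
            g_lab = y * np.exp(-y * F_lab[v])
            # ... and on unlabeled points (disagreement penalty term).
            g_unl = -2.0 * lam * (F_unl[v] - F_unl[other])
            feat, thresh, sign = fit_stump(X_all, np.concatenate([g_lab, g_unl]))
            ensembles[v].append((lr, feat, thresh, sign))
            F_lab[v] += lr * stump_predict(X_lab, feat, thresh, sign)
            F_unl[v] += lr * stump_predict(X_unl, feat, thresh, sign)
    return ensembles

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(40, 2))
    y = np.where(X_lab[:, 0] + X_lab[:, 1] > 0, 1.0, -1.0)
    X_unl = rng.normal(size=(200, 2))
    models = agreement_boost(X_lab, y, X_unl)
    print(f"built {len(models[0])} stumps per learner")

Note that with lam = 0 the two learners decouple into two independent boosting runs, which matches the abstract's observation that two copies of one algorithm give no advantage: the agreement term only helps when the learners can disagree.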

Published In

COLT'05: Proceedings of the 18th annual conference on Learning Theory
June 2005, 690 pages
ISBN: 3540265562
Editors: Peter Auer, Ron Meir

Sponsors

  • Pascal
  • Google Inc.
  • Machine Learning Journal/Springer
  • BiCi

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

  • (2020) TCGM: An Information-Theoretic Framework for Semi-supervised Multi-modality Learning. Computer Vision – ECCV 2020, pp. 171-188. DOI: 10.1007/978-3-030-58580-8_11. Online publication date: 23-Aug-2020.
  • (2018) Rademacher complexity bounds for a penalized multiclass semi-supervised algorithm (extended abstract). Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 5637-5641. DOI: 10.5555/3304652.3304815. Online publication date: 13-Jul-2018.
  • (2013) Multi-view semi-supervised web image classification via co-graph. Neurocomputing, vol. 122, pp. 430-440. DOI: 10.1016/j.neucom.2013.06.007. Online publication date: 1-Dec-2013.
  • (2010) A discriminative model for semi-supervised learning. Journal of the ACM, 57(3), pp. 1-46. DOI: 10.1145/1706591.1706599. Online publication date: 29-Mar-2010.
  • (2009) Learning from multiple partially observed views - an application to multilingual text categorization. Proceedings of the 23rd International Conference on Neural Information Processing Systems, pp. 28-36. DOI: 10.5555/2984093.2984097. Online publication date: 7-Dec-2009.
  • (2007) Regularized boost for semi-supervised learning. Proceedings of the 21st International Conference on Neural Information Processing Systems, pp. 281-288. DOI: 10.5555/2981562.2981598. Online publication date: 3-Dec-2007.
  • (2006) Efficient co-regularised least squares regression. Proceedings of the 23rd international conference on Machine learning, pp. 137-144. DOI: 10.1145/1143844.1143862. Online publication date: 25-Jun-2006.
