DOI: 10.1007/11503415_8
Article

A PAC-Style Model for Learning from Labeled and Unlabeled Data

Published: 27 June 2005

Abstract

There has been growing practical interest in using unlabeled data together with labeled data in machine learning, and a number of different approaches have been developed. However, the assumptions these methods are based on are often quite distinct and are not captured by standard theoretical models. In this paper we describe a PAC-style framework that can model many of these assumptions, and we analyze sample-complexity issues in this setting: that is, how much of each type of data one should expect to need in order to learn well, and what basic quantities those numbers depend on. Our model can be viewed as an extension of the standard PAC model in which, in addition to a concept class C, one also proposes a type of compatibility that one believes the target concept should have with the underlying distribution. In this view, unlabeled data can be helpful because it allows one to estimate compatibility over the space of hypotheses, and to reduce the search space to those hypotheses that, according to one's assumptions, are a priori reasonable with respect to the distribution. We discuss a number of technical issues that arise in this context, and we provide sample-complexity bounds both for uniform convergence and for ε-cover-based algorithms. We also consider algorithmic issues, and give an efficient algorithm for a special case of co-training.
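To make the framework concrete, here is a minimal sketch of the kind of object the abstract describes (the notation is illustrative, not quoted from the paper). One augments the concept class C with a compatibility function χ : C × X → [0,1] and measures how compatible a hypothesis h is with the distribution D by

    compat(h, D) = E_{x ~ D}[ χ(h, x) ].

Writing C_{D,χ}(ε) = { h ∈ C : compat(h, D) ≥ 1 − ε } for the set of hypotheses that are nearly fully compatible with D, unlabeled data serves to estimate compat(h, D) and thereby prune C down to C_{D,χ}(ε), after which labeled data only needs to single out the target within the pruned set. For a finite class and a fully compatible target, one would therefore expect a labeled-sample bound of the flavor

    m_l = O( (1/ε) ( ln |C_{D,χ}(ε)| + ln(1/δ) ) ),

so the labeled-sample complexity scales with the size of the pruned class rather than with |C|.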




      Published In

      COLT'05: Proceedings of the 18th annual conference on Learning Theory
      June 2005
      690 pages
ISBN: 3540265562
Editors: Peter Auer, Ron Meir

      Sponsors

      • Pascal
      • Google Inc.
      • Machine Learning Journal/Springer
      • BiCi

      Publisher

Springer-Verlag, Berlin, Heidelberg


      Cited By

• (2023) Learning with explanation constraints. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 49883-49926. DOI: 10.5555/3666122.3668294. Online publication date: 10-Dec-2023.
• (2023) On regularization and inference with label constraints. Proceedings of the 40th International Conference on Machine Learning, pages 35740-35762. DOI: 10.5555/3618408.3619893. Online publication date: 23-Jul-2023.
• (2020) Negative sampling in semi-supervised learning. Proceedings of the 37th International Conference on Machine Learning, pages 1704-1714. DOI: 10.5555/3524938.3525097. Online publication date: 13-Jul-2020.
• (2019) Learning from Incomplete and Inaccurate Supervision. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1017-1025. DOI: 10.1145/3292500.3330902. Online publication date: 25-Jul-2019.
• (2018) Never-ending learning. Communications of the ACM, 61(5):103-115. DOI: 10.1145/3191513. Online publication date: 24-Apr-2018.
• (2016) Multi-step learning and underlying structure in statistical models. Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 4815-4823. DOI: 10.5555/3157382.3157635. Online publication date: 5-Dec-2016.
• (2015) Reducing the Unlabeled Sample Complexity of Semi-Supervised Multi-View Learning. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 627-634. DOI: 10.1145/2783258.2783409. Online publication date: 10-Aug-2015.
• (2015) Toward a Unified Theory of Sparse Dimensionality Reduction in Euclidean Space. Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 499-508. DOI: 10.1145/2746539.2746541. Online publication date: 14-Jun-2015.
• (2012) Inductive multi-task learning with multiple view data. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 543-551. DOI: 10.1145/2339530.2339617. Online publication date: 12-Aug-2012.
• (2010) Co-regularization based semi-supervised domain adaptation. Proceedings of the 24th International Conference on Neural Information Processing Systems - Volume 1, pages 478-486. DOI: 10.5555/2997189.2997243. Online publication date: 6-Dec-2010.
