Unbiased generative semi-supervised learning

Published: 01 January 2014

Abstract

Reliable semi-supervised learning, where a small amount of labelled data is complemented by a large body of unlabelled data, has been a long-standing goal of the machine learning community. However, while it seems intuitively obvious that unlabelled data can aid the learning process, in practice its performance has often been disappointing. We investigate this by examining generative maximum likelihood semi-supervised learning and derive novel upper and lower bounds on the degree of bias introduced by the unlabelled data. These bounds improve upon those provided in previous work, and are specifically applicable to the challenging case where the model is unable to exactly fit the underlying distribution, a situation which is common in practice but for which fewer guarantees of semi-supervised performance have been found. Inspired by this new framework for analysing bounds, we propose a new, simple reweighting scheme which provides a provably unbiased estimator for arbitrary model/distribution pairs, an unusual property for a semi-supervised algorithm. This reweighting introduces no additional computational complexity and can be applied to a wide range of models. Additionally, we provide specific conditions demonstrating the circumstances under which the unlabelled data will lower the estimator variance, thereby improving convergence.
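
The abstract describes the estimator only at a high level, so the following Python sketch is an illustration rather than the paper's actual scheme: it shows one way a reweighted semi-supervised objective can remain provably unbiased, by adding a correction whose expectation is zero whenever labelled and unlabelled inputs are drawn from the same distribution. The two-class Gaussian model, the function names, and the control-variate form of the correction are all assumptions made for this sketch.

    # Hypothetical sketch of an unbiased reweighted semi-supervised objective.
    # Not the paper's exact estimator: the model and correction form are
    # illustrative assumptions.
    import numpy as np
    from scipy.stats import norm

    def joint_log_lik(x, y, theta):
        """log p(x, y | theta) for a two-class Gaussian model.
        theta = (pi, mu0, mu1); unit variances assumed for brevity."""
        pi, mu0, mu1 = theta
        mu = np.where(y == 1, mu1, mu0)
        prior = np.where(y == 1, pi, 1.0 - pi)
        return np.log(prior) + norm.logpdf(x, loc=mu)

    def marginal_log_lik(x, theta):
        """log p(x | theta) = log sum_y p(x, y | theta)."""
        pi, mu0, mu1 = theta
        return np.logaddexp(np.log(1.0 - pi) + norm.logpdf(x, loc=mu0),
                            np.log(pi) + norm.logpdf(x, loc=mu1))

    def reweighted_objective(x_lab, y_lab, x_unl, theta, lam=1.0):
        """Supervised term plus a zero-mean marginal correction.
        Both marginal averages estimate E[log p(x | theta)], so their
        difference has expectation zero for any model/distribution pair:
        the estimator stays unbiased for the supervised ML objective,
        while the unlabelled data can reduce its variance."""
        sup = joint_log_lik(x_lab, y_lab, theta).mean()
        correction = (marginal_log_lik(x_unl, theta).mean()
                      - marginal_log_lik(x_lab, theta).mean())
        return sup + lam * correction

Under these assumptions the correction behaves like a control variate: it is zero in expectation regardless of model misspecification, but when it correlates with the noise in the supervised term a suitable lam yields a lower-variance estimate, which matches the abstract's claim that unlabelled data can improve convergence without introducing bias.
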


Cited By

  • Robust and sparse label propagation for graph-based semi-supervised classification. Applied Intelligence 52(3):3337-3351, February 2022. https://doi.org/10.1007/s10489-021-02360-z
  • Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification. The World Wide Web Conference, pages 3370-3376, May 2019. https://doi.org/10.1145/3308558.3313658
  • A comparative study of minimally supervised morphological segmentation. Computational Linguistics 42(1):91-120, March 2016. https://doi.org/10.1162/COLI_a_00243



Published In

The Journal of Machine Learning Research, Volume 15, Issue 1
January 2014
4085 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published: 01 January 2014
Revised: 01 September 2013
Published in JMLR Volume 15, Issue 1

Author Tags

  1. Kullback-Leibler
  2. asymptotic bounds
  3. bias
  4. generative model
  5. semi-supervised

