Unbiased generative semi-supervised learning

Published: 01 January 2014

Abstract

Reliable semi-supervised learning, where a small amount of labelled data is complemented by a large body of unlabelled data, has been a long-standing goal of the machine learning community. However, while it seems intuitively obvious that unlabelled data can aid the learning process, in practice its performance has often been disappointing. We investigate this by examining generative maximum likelihood semi-supervised learning and derive novel upper and lower bounds on the degree of bias introduced by the unlabelled data. These bounds improve upon those provided in previous work, and are specifically applicable to the challenging case where the model is unable to exactly fit the underlying distribution, a situation which is common in practice but for which fewer guarantees of semi-supervised performance have been found. Inspired by this new framework for analysing bounds, we propose a new, simple reweighting scheme which provides a provably unbiased estimator for arbitrary model/distribution pairs, an unusual property for a semi-supervised algorithm. This reweighting introduces no additional computational complexity and can be applied to a wide range of models. Additionally, we provide specific conditions demonstrating the circumstances under which the unlabelled data will lower the estimator variance, thereby improving convergence.
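
The abstract describes the estimator only at a high level, so the following Python sketch is an illustration rather than the paper's actual scheme: it shows one way a reweighted semi-supervised objective can remain provably unbiased, by adding a correction whose expectation is zero whenever labelled and unlabelled inputs are drawn from the same distribution. The two-class Gaussian model, the function names, and the control-variate form of the correction are all assumptions made for this sketch.

    # Hypothetical sketch of an unbiased reweighted semi-supervised objective.
    # Not the paper's exact estimator: the model and correction form are
    # illustrative assumptions.
    import numpy as np
    from scipy.stats import norm

    def joint_log_lik(x, y, theta):
        """log p(x, y | theta) for a two-class Gaussian model.
        theta = (pi, mu0, mu1); unit variances assumed for brevity."""
        pi, mu0, mu1 = theta
        mu = np.where(y == 1, mu1, mu0)
        prior = np.where(y == 1, pi, 1.0 - pi)
        return np.log(prior) + norm.logpdf(x, loc=mu)

    def marginal_log_lik(x, theta):
        """log p(x | theta) = log sum_y p(x, y | theta)."""
        pi, mu0, mu1 = theta
        return np.logaddexp(np.log(1.0 - pi) + norm.logpdf(x, loc=mu0),
                            np.log(pi) + norm.logpdf(x, loc=mu1))

    def reweighted_objective(x_lab, y_lab, x_unl, theta, lam=1.0):
        """Supervised term plus a zero-mean marginal correction.
        Both marginal averages estimate E[log p(x | theta)], so their
        difference has expectation zero for any model/distribution pair:
        the estimator stays unbiased for the supervised ML objective,
        while the unlabelled data can reduce its variance."""
        sup = joint_log_lik(x_lab, y_lab, theta).mean()
        correction = (marginal_log_lik(x_unl, theta).mean()
                      - marginal_log_lik(x_lab, theta).mean())
        return sup + lam * correction

Under these assumptions the correction behaves like a control variate: it is zero in expectation regardless of model misspecification, but when it correlates with the noise in the supervised term a suitable lam yields a lower-variance estimate, which matches the abstract's claim that unlabelled data can improve convergence without introducing bias.
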


Cited By

  • Robust and sparse label propagation for graph-based semi-supervised classification. Applied Intelligence 52(3):3337-3351, February 2022. https://doi.org/10.1007/s10489-021-02360-z
  • Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification. The World Wide Web Conference, pages 3370-3376, May 2019. https://doi.org/10.1145/3308558.3313658
  • A comparative study of minimally supervised morphological segmentation. Computational Linguistics 42(1):91-120, March 2016. https://doi.org/10.1162/COLI_a_00243



Published In

The Journal of Machine Learning Research, Volume 15, Issue 1
January 2014
4085 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published: 01 January 2014
Revised: 01 September 2013
Published in JMLR Volume 15, Issue 1

Author Tags

  1. Kullback-Leibler
  2. asymptotic bounds
  3. bias
  4. generative model
  5. semi-supervised

