
Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

Published: 01 August 2010

Abstract

Estimating the error rates of classifiers or regression models is a fundamental task in machine learning that has thus far been studied exclusively using supervised learning techniques. We propose a novel unsupervised framework for estimating these error rates using only unlabeled data and mild assumptions. We prove consistency results for the framework and demonstrate its practical applicability on both synthetic and real-world data.

Published In

The Journal of Machine Learning Research, Volume 11
3/1/2010
3637 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published: 01 August 2010
Published in JMLR Volume 11

Qualifiers

  • Article

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 66
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 19 Feb 2025

Citations

Cited By

  • (2024) AutoEval: Are Labels Always Necessary for Classifier Accuracy Evaluation? IEEE Transactions on Pattern Analysis and Machine Intelligence 46:3 (1868-1880). doi:10.1109/TPAMI.2021.3136244. Online publication date: 1-Mar-2024
  • (2022) HAPI. Proceedings of the 36th International Conference on Neural Information Processing Systems (24571-24585). doi:10.5555/3600270.3602054. Online publication date: 28-Nov-2022
  • (2022) Agreement-on-the-line. Proceedings of the 36th International Conference on Neural Information Processing Systems (19274-19289). doi:10.5555/3600270.3601671. Online publication date: 28-Nov-2022
  • (2020) Accuracy Estimation for an Incrementally Learning Cooperative Inventory Assistant Robot. Neural Information Processing (738-749). doi:10.1007/978-3-030-63833-7_62. Online publication date: 18-Nov-2020
  • (2018) Actively constructing an effective training set by expected gain maximization criterion. Neurocomputing 158:C (62-72). doi:10.1016/j.neucom.2015.01.065. Online publication date: 31-Dec-2018
  • (2018) Classifier Risk Estimation Under Limited Labeling Resources. Advances in Knowledge Discovery and Data Mining (3-15). doi:10.1007/978-3-319-93034-3_1. Online publication date: 3-Jun-2018
  • (2017) Label efficient learning by exploiting multi-class output codes. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (1735-1741). doi:10.5555/3298483.3298492. Online publication date: 4-Feb-2017
  • (2016) Unsupervised risk estimation using only conditional independence structure. Proceedings of the 30th International Conference on Neural Information Processing Systems (3664-3672). doi:10.5555/3157382.3157507. Online publication date: 5-Dec-2016
  • (2016) Recognizing input space and target concept drifts in data streams with scarcely labeled and unlabelled instances. Information Sciences: an International Journal 355:C (127-151). doi:10.1016/j.ins.2016.03.034. Online publication date: 10-Aug-2016
  • (2016) Training query filtering for semi-supervised learning to rank with pseudo labels. World Wide Web 19:5 (833-864). doi:10.1007/s11280-015-0363-z. Online publication date: 1-Sep-2016
