
Evaluating the Predictive Performance of Positive-Unlabelled Classifiers: a brief critical review and practical recommendations for improvement

Published: 08 December 2022

Abstract

Positive-Unlabelled (PU) learning is a growing area of machine learning that aims to learn classifiers from data consisting of labelled positive and unlabelled instances. Whilst much work has been done proposing methods for PU learning, little has been written on the subject of evaluating these methods. Many popular standard classification metrics cannot be precisely calculated due to the absence of fully labelled data, so alternative approaches must be taken. This short commentary paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers and provides practical recommendations for improvements in this area.
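To make the evaluation difficulty concrete, consider the F-measure: precision cannot be estimated directly in the PU setting because the true labels of the unlabelled instances are unknown. One commonly used alternative, due to Lee and Liu (2003), is the proxy r^2 / Pr(y_hat = 1), where r is the recall estimated on held-out labelled positives; under the assumption that labelled positives are a completely random sample of all positives, this quantity is proportional to precision x recall and can therefore be used to rank classifiers even though precision itself is not identifiable. The sketch below is purely illustrative and is not taken from the paper; the function name pu_f1_proxy and its arguments are hypothetical.

    import numpy as np

    def pu_f1_proxy(preds_on_labelled_positives, preds_on_evaluation_set):
        """Lee & Liu (2003) style proxy r^2 / Pr(y_hat = 1) for PU evaluation.

        preds_on_labelled_positives : 0/1 predictions for the held-out
            labelled positive instances (used to estimate recall r).
        preds_on_evaluation_set     : 0/1 predictions for the whole evaluation
            set (labelled positives + unlabelled instances), used to estimate
            Pr(y_hat = 1).

        Assumes the labelled positives are selected completely at random
        (SCAR), so recall measured on them estimates recall on all positives.
        The returned value is only meaningful for ranking classifiers; it is
        not bounded by 1.
        """
        recall_est = np.mean(preds_on_labelled_positives)   # r = P(y_hat = 1 | y = 1)
        pred_pos_rate = np.mean(preds_on_evaluation_set)    # Pr(y_hat = 1)
        if pred_pos_rate == 0.0:
            return 0.0                                      # classifier predicts no positives
        return recall_est ** 2 / pred_pos_rate

    # Hypothetical usage: classifier A recovers more of the labelled positives
    # while flagging a similar fraction of the evaluation set, so it receives
    # the higher proxy score.
    preds_pos_A = np.array([1, 1, 1, 0, 1])
    preds_all_A = np.array([1, 0, 1, 0, 0, 1, 0, 1, 0, 0])
    preds_pos_B = np.array([1, 0, 1, 0, 0])
    preds_all_B = np.array([1, 0, 0, 0, 0, 1, 0, 1, 0, 0])
    print(pu_f1_proxy(preds_pos_A, preds_all_A))  # 0.8**2 / 0.4 = 1.6
    print(pu_f1_proxy(preds_pos_B, preds_all_B))  # 0.4**2 / 0.3 ~= 0.53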


Cited By

  • (2024) Positive-Unlabelled learning for identifying new candidate Dietary Restriction-related genes among ageing-related genes. Computers in Biology and Medicine, 180, 108999. DOI: 10.1016/j.compbiomed.2024.108999. Online publication date: Sep-2024.
  • (2023) Towards Improved Illicit Node Detection with Positive-Unlabelled Learning. 2023 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pp. 1-5. DOI: 10.1109/ICBC56567.2023.10174907. Online publication date: 1-May-2023.
  • (2022) Evaluating a New Genetic Algorithm for Automated Machine Learning in Positive-Unlabelled Learning. Artificial Evolution, pp. 42-57. DOI: 10.1007/978-3-031-42616-2_4. Online publication date: 31-Oct-2022.


Published In

ACM SIGKDD Explorations Newsletter, Volume 24, Issue 2
December 2022
130 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/3575637
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2022
Published in SIGKDD Volume 24, Issue 2


Author Tags

  1. classification
  2. machine learning
  3. positive-unlabelled learning

Qualifiers

  • Article


