
Evaluating the Predictive Performance of Positive-Unlabelled Classifiers: a brief critical review and practical recommendations for improvement

Published: 08 December 2022

Abstract

Positive-Unlabelled (PU) learning is a growing area of machine learning that aims to learn classifiers from data consisting of labelled positive and unlabelled instances. Whilst much work has been done proposing methods for PU learning, little has been written on the subject of evaluating these methods. Many popular standard classification metrics cannot be precisely calculated due to the absence of fully labelled data, so alternative approaches must be taken. This short commentary paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers and provides practical recommendations for improvements in this area.
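To make the evaluation difficulty concrete, consider the F-measure: precision cannot be estimated directly in the PU setting because the true labels of the unlabelled instances are unknown. One commonly used alternative, due to Lee and Liu (2003), is the proxy r^2 / Pr(y_hat = 1), where r is the recall estimated on held-out labelled positives; under the assumption that labelled positives are a completely random sample of all positives, this quantity is proportional to precision x recall and can therefore be used to rank classifiers even though precision itself is not identifiable. The sketch below is purely illustrative and is not taken from the paper; the function name pu_f1_proxy and its arguments are hypothetical.

    import numpy as np

    def pu_f1_proxy(preds_on_labelled_positives, preds_on_evaluation_set):
        """Lee & Liu (2003) style proxy r^2 / Pr(y_hat = 1) for PU evaluation.

        preds_on_labelled_positives : 0/1 predictions for the held-out
            labelled positive instances (used to estimate recall r).
        preds_on_evaluation_set     : 0/1 predictions for the whole evaluation
            set (labelled positives + unlabelled instances), used to estimate
            Pr(y_hat = 1).

        Assumes the labelled positives are selected completely at random
        (SCAR), so recall measured on them estimates recall on all positives.
        The returned value is only meaningful for ranking classifiers; it is
        not bounded by 1.
        """
        recall_est = np.mean(preds_on_labelled_positives)   # r = P(y_hat = 1 | y = 1)
        pred_pos_rate = np.mean(preds_on_evaluation_set)    # Pr(y_hat = 1)
        if pred_pos_rate == 0.0:
            return 0.0                                      # classifier predicts no positives
        return recall_est ** 2 / pred_pos_rate

    # Hypothetical usage: classifier A recovers more of the labelled positives
    # while flagging a similar fraction of the evaluation set, so it receives
    # the higher proxy score.
    preds_pos_A = np.array([1, 1, 1, 0, 1])
    preds_all_A = np.array([1, 0, 1, 0, 0, 1, 0, 1, 0, 0])
    preds_pos_B = np.array([1, 0, 1, 0, 0])
    preds_all_B = np.array([1, 0, 0, 0, 0, 1, 0, 1, 0, 0])
    print(pu_f1_proxy(preds_pos_A, preds_all_A))  # 0.8**2 / 0.4 = 1.6
    print(pu_f1_proxy(preds_pos_B, preds_all_B))  # 0.4**2 / 0.3 ~= 0.53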


Cited By

  • (2024) Positive-Unlabelled learning for identifying new candidate Dietary Restriction-related genes among ageing-related genes. Computers in Biology and Medicine, 180, 108999. DOI: 10.1016/j.compbiomed.2024.108999. Online publication date: Sep-2024.
  • (2023) Towards Improved Illicit Node Detection with Positive-Unlabelled Learning. 2023 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pp. 1-5. DOI: 10.1109/ICBC56567.2023.10174907. Online publication date: 1-May-2023.
  • (2022) Evaluating a New Genetic Algorithm for Automated Machine Learning in Positive-Unlabelled Learning. Artificial Evolution, pp. 42-57. DOI: 10.1007/978-3-031-42616-2_4. Online publication date: 31-Oct-2022.


Published In

ACM SIGKDD Explorations Newsletter, Volume 24, Issue 2
December 2022
130 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/3575637
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2022
Published in SIGKDD Volume 24, Issue 2


Author Tags

  1. classification
  2. machine learning
  3. positive-unlabelled learning

Qualifiers

  • Article


