Abstract
How can we classify graph-structured data when only positive labels are available? Graph-based positive-unlabeled (PU) learning aims to train a binary classifier from positive labels alone when the relationships between examples are given as a graph. The problem is important for tasks such as detecting malicious accounts in a social network, which cannot be handled by supervised learning because true negative labels are absent. Previous works on graph-based PU learning assume that the prior distribution of positive nodes is known in advance, which does not hold in many real-world cases. In this work, we propose GRAB (Graph-based Risk minimization with iterAtive Belief propagation), a novel end-to-end approach for graph-based PU learning that requires no class prior. GRAB alternates between a marginalization step and an update step. The marginalization step models the given graph as a Markov network and estimates the marginals of the latent variables. The update step trains the binary classifier by using the computed marginals in the objective function. We then generalize GRAB to multi-positive unlabeled (MPU) learning, where multiple positive classes exist in a dataset. Extensive experiments on five real-world datasets show that GRAB achieves state-of-the-art performance, even when the true prior is given only to the competitors.
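To make the two-step structure concrete, below is a minimal NumPy sketch of the alternation the abstract describes: a marginalization step that estimates per-node marginals over the graph, followed by an update step that trains the classifier with those marginals in its objective. The logistic classifier, the label-propagation-style approximation of belief propagation, the cross-entropy-style risk, and all names (estimate_marginals, risk_gradient, homophily) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 20 nodes on a ring graph, 8-dimensional features, 5 observed
# positives. Everything here (graph, features, sizes) is illustrative.
n, d = 20, 8
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
features = rng.normal(size=(n, d))
positive_mask = np.zeros(n, dtype=bool)
positive_mask[:5] = True

w = np.zeros(d)  # a logistic model standing in for the actual node classifier


def predict(w):
    """Classifier's estimate of P(y_i = positive) for every node."""
    return 1.0 / (1.0 + np.exp(-(features @ w)))


def estimate_marginals(clf_probs, homophily=0.8, n_iter=10):
    """Marginalization step (simplified): propagate beliefs over the graph.

    A label-propagation-style approximation standing in for loopy belief
    propagation on the pairwise Markov network; observed positives stay
    clamped to probability one.
    """
    beliefs = clf_probs.copy()
    beliefs[positive_mask] = 1.0
    deg = np.maximum(adj.sum(axis=1), 1.0)
    for _ in range(n_iter):
        neighbor_avg = (adj @ beliefs) / deg
        beliefs = homophily * neighbor_avg + (1.0 - homophily) * clf_probs
        beliefs[positive_mask] = 1.0
    return beliefs


def risk_gradient(w, marginals):
    """Update step (sketch): gradient of a cross-entropy-style PU risk.

    Unlabeled nodes are weighted by their estimated marginals rather than
    by a known class prior; the paper's exact risk estimator differs.
    """
    p = predict(w)
    targets = np.where(positive_mask, 1.0, marginals)
    return features.T @ (p - targets) / n


# Alternate the two steps, as the abstract describes.
for outer in range(30):
    marginals = estimate_marginals(predict(w))   # step 1: marginalization
    for _ in range(20):                          # step 2: classifier update
        w -= 0.5 * risk_gradient(w, marginals)

print("estimated positive probabilities:", np.round(predict(w), 2))
```

The key design point reflected in the sketch is that the unlabeled nodes never need a known class prior: the weight each unlabeled node contributes to the risk comes from the marginals estimated on the graph in the preceding step.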
Acknowledgements
This research was supported by NCSOFT Co. U Kang is the corresponding author.
Cite this article
Yoo, J., Kim, J., Yoon, H. et al. Graph-based PU learning for binary and multiclass classification without class prior. Knowl Inf Syst 64, 2141–2169 (2022). https://doi.org/10.1007/s10115-022-01702-8