Abstract
In the information retrieval framework, there are problems where the goal is to recover objects of a particular class from large sets of unlabelled objects. In some of these problems, only examples of the class we want to recover are available. For such problems, the machine learning community has developed algorithms that are able to learn binary classifiers in the absence of negative examples. Among them we can find the positive Bayesian network classifiers, algorithms that induce Bayesian network classifiers from positive and unlabelled examples. The main drawback of these algorithms is that they require some prior knowledge about the a priori probability distribution of the class. In this paper, we propose a wrapper approach to tackle the learning problem when no such information is available, setting this probability at the value that is optimal in terms of the recovery of positive examples. The evaluation of classifiers in positive unlabelled learning problems is a non-trivial question. We have also worked on this problem and propose a new metric, which we call the pseudo F, to guide the search for the optimal a priori probability of the positive class. We have empirically tested the proposed metric and the wrapper classifiers on both synthetic and real-life datasets. The results of this empirical comparison show that the wrapper Bayesian network classifiers provide competitive results, particularly when the actual a priori probability of the positive class is high.
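The abstract mentions a guiding metric estimable from positive and unlabelled data alone. The exact definition of the pseudo F is given in the paper itself; as a point of reference, a well-known criterion of this kind is the one of Lee and Liu (2003, cited below), which rewards high recall on held-out positives while penalising classifiers that label too much of the unlabelled pool as positive. A minimal sketch of that criterion (the function name `pu_score` is ours, for illustration only):

```python
def pu_score(recall_est, positive_rate):
    """Lee-Liu style PU criterion: recall^2 / Pr(f(x) = 1).

    recall_est    -- recall estimated on held-out positive examples
    positive_rate -- fraction of all examples predicted positive
    """
    if positive_rate == 0:
        return 0.0
    return recall_est ** 2 / positive_rate
```

Both quantities can be estimated without any negative labels, which is what makes a criterion of this shape usable as a guide in the positive-unlabelled setting.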
Notes
Not, at least, directly from positive and unlabelled examples. In Elkan et al. [15], the authors present an interesting theoretical result that could be used to estimate this value under certain assumptions. However, the authors propose other ways to use this theoretical result to learn classifiers, as we will see in the experimental part of the paper.
Henceforth, we will refer to the actual a priori probability of the positive class \(P(C=1)\) as \(p\) and, in order to avoid mixing concepts, will refer to the parameter used in the training of PBCs as ‘training \(p\)’ or \(p_t\), to distinguish it from the actual a priori probability of the positive class.
We are assuming that \(X_i\), \(i=1,\ldots ,n\) are discrete variables that can take \(r_i\) values and that the class \(C\) is a binary variable that can take two values, 0 (or negative) and 1 (or positive).
Except when the previous optimum lies at an endpoint of the interval, in which case the new optimum will also lie at the corresponding endpoint of the new interval, not at its midpoint.
Obviously, any information about the upper and/or lower bounds of this parameter could be introduced at this point, reducing the initial search space.
Details about the evaluation of the models will be provided in the next section.
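The notes above describe narrowing the search interval for the training prior \(p_t\) around the best candidate found so far, with the endpoint handled as a special case. A minimal sketch under our own assumptions about the details (the callable `score` stands for evaluating a classifier trained with a given \(p_t\), e.g. via the guiding metric; grid size and number of iterations are illustrative):

```python
def refine_search(score, lo=0.0, hi=1.0, n_points=5, iters=4):
    """Successive interval refinement over the training prior p_t.

    At each iteration, evaluate an evenly spaced grid of candidates,
    then halve the interval around the best one.  If the best candidate
    lies at an endpoint, the new interval keeps that endpoint as its
    corresponding extreme instead of centring on it.
    """
    for _ in range(iters):
        step = (hi - lo) / (n_points - 1)
        grid = [lo + i * step for i in range(n_points)]
        best = max(grid, key=score)
        half = (hi - lo) / 4          # new interval is half as wide
        if best == grid[0]:
            lo, hi = best, best + 2 * half
        elif best == grid[-1]:
            lo, hi = best - 2 * half, best
        else:
            lo, hi = best - half, best + half
    return (lo + hi) / 2
```

Any known upper or lower bound on the parameter can simply be passed as the initial `lo`/`hi`, shrinking the search space as the note suggests.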
In the most general probabilistic framework, an instance belongs to both the positive and the negative class with some probability. In this work, we will focus on the particular case in which the probability of belonging to one class is 1 and the probability of belonging to the other class is 0, that is, problems where the instances belong to one, and only one, class.
The feasibility of this calculation will depend on the number of possible instances. When the exact value cannot be obtained, other approaches should be taken.
As the estimation of the recall involves only the positive examples, there is no need to divide the unlabelled examples into folds.
Note that \(p\) refers to the actual a priori probability of the positive class that is a constant for a given problem domain.
In the original dataset, the attributes are continuous. As the classifiers used work with discrete attributes, the original features have been discretised into three intervals using an equal frequency strategy.
Note that, at most, we need 4,800 positive instances, and only 4,044 are available. To overcome this problem, the positive examples in the set of unlabelled instances are resampled when needed (i.e. there are repeated positive examples in the set of unlabelled instances in some problems). This resampling process is used to complete only the unlabelled instances, never the set of known positive examples.
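The padding step described in the note above can be sketched as follows; the function name and its split of the unlabelled pool into its positive and negative parts are our own illustration, not the paper's code:

```python
import random

def complete_unlabelled(unlab_pos, unlab_neg, needed_pos):
    """Pad the unlabelled pool by resampling its positive part with
    replacement when fewer real positives are available than the
    experimental design needs.  Only the unlabelled pool is padded;
    the labelled positive set is never resampled."""
    extra = needed_pos - len(unlab_pos)
    padded = list(unlab_pos)
    if extra > 0:
        padded += random.choices(unlab_pos, k=extra)
    return padded + list(unlab_neg)
```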
The complete set of results can be consulted at the supplementary data web page at http://www.sc.ehu.es/ccwbayes/members/borxa/wPBC/.
In a given comparison, we have two paired samples (the evaluations of two classifiers on fifty datasets). The signed-rank test is run on the differences between the performances of the two classifiers in each dataset, in order to exploit the fact that the samples are paired.
Both the tests and the correction were carried out using R. For the multiple testing correction, the package multtest was used.
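The paper runs the tests and the Benjamini-Yekutieli correction in R (package multtest). As a point of reference, an equivalent sketch in Python, with the BY step written out explicitly (the function name and the alpha level are ours):

```python
from scipy.stats import wilcoxon

def paired_tests_with_by(sample_pairs, alpha=0.05):
    """Wilcoxon signed-rank tests on paired performance samples, with
    Benjamini-Yekutieli control of the false discovery rate."""
    pvals = [wilcoxon(a, b).pvalue for a, b in sample_pairs]
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))    # BY harmonic factor
    order = sorted(range(m), key=lambda i: pvals[i])
    # largest rank k with p_(k) <= k * alpha / (m * c_m); reject ranks <= k
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return pvals, rejected
```

The BY factor \(c(m)=\sum_{i=1}^{m} 1/i\) makes the procedure valid under arbitrary dependency between the tests, which matters here since the comparisons share datasets.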
The algorithm does not assume this, but it builds a classifier to differentiate labelled and unlabelled examples which, in practical terms, is the same as building a classifier to distinguish positive and negative examples assuming that all the unlabelled instances are negative.
References
Asuncion A, Newman DJ (2007) UCI machine learning repository
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29(4):1165–1188
Calvo B (2008) Positive unlabelled learning with applications in computational biology. Ph.D. thesis, Facultad de Informatica–UPV/EHU
Calvo B, Larrañaga P, Lozano JA (2007a) Learning Bayesian classifiers from positive and unlabeled examples. Pattern Recognit Lett 28:2375–2384
Calvo B, Larrañaga P, Lozano JA (2009) Feature subset selection from positive and unlabelled examples. Pattern Recognit Lett 30(11):1027–1036
Calvo B, López-Bigas N, Furney SJ, Larrañaga P, Lozano JA (2007b) A partially supervised classification approach to dominant and recessive human disease gene prediction. Comput Methods Programs Biomed 85:229–237
Castelo R, Guigó R (2004) Splice site identification by idlBNs. Bioinformatics 20(Suppl 1):i69–i76
Cerulo L, Elkan C, Ceccarelli M (2010) Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinform 11:228
Chapelle O, Zien A, Schölkopf B (2006) Semi-supervised learning. MIT Press, Cambridge
Chen Y, Li Z, Wang X, Feng J, Hu X (2010) Predicting gene function using few positive examples and unlabeled ones. BMC Genomics 11(Suppl 2):S11
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Denis F (1998) PAC learning from positive statistical queries. In: Proceedings of the 9th international conference on algorithmic learning theory. Springer, New York, pp 112–126
Denis F, Gilleron R, Tommasi M (2002) Text classification from positive and unlabeled examples. In: Proceedings of the 9th international conference on information processing and management of uncertainty in knowledge-based Systems. IPMU 2002, pp 1927–1934
Duan Q, Miao D, Jin K (2007) A rough set approach to classifying web page without negative examples. In: Proceedings of the 11th Pacific-Asia conference on advances in knowledge discovery and data mining. Springer, New York, pp 481–488
Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, USA, pp 213–220
Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29:131–163
Furney SJ, Calvo B, Larrañaga P, Lozano JA, Lopez-Bigas N (2008) Prioritization of candidate cancer genes-an aid to oncogenomic studies. Nucleic Acids Res 36(18):e115
Ide JS, Cozman FG (2002) Random generation of Bayesian networks. In: Proceedings of the 16th Brazilian symposium on artificial intelligence: advances in artificial intelligence. Springer, New York, pp 366–375
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Kononenko I (1991) Semi-naive Bayes classifiers. In: Proceedings of the 6th European working session on learning, pp 206–219
Lee WS, Liu B (2003) Learning with positive and unlabeled examples using weighted logistic regression. In: Proceedings of the twentieth international conference on machine learning, pp 448–455
Li X, Liu B (2003) Learning to classify texts using positive and unlabeled data. In: Proceedings of the eighteenth international joint conference on artificial intelligence 2003, pp 587–594
Li X-L, Liu B, Ng S-K (2010) Negative training data can be harmful to text classification. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp 218–228
Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), pp 179–186
Liu B, Lee WS, Yu PS, Li X (2002) Partially supervised classification of text documents. In: Proceedings of the nineteenth international conference on machine learning, pp 387–394
Minsky M (1961) Steps toward artificial intelligence. Proc Inst Radio Eng 49:8–30
Pan S, Zhang Y, Li X (2012) Dynamic classifier ensemble for positive unlabeled text stream classification. Knowl Inf Syst 1–21. doi:10.1007/s10115-011-0469-2
Pazzani M (1997) Searching for dependencies in Bayesian classifiers. Springer, New York
Rodríguez JD, Pérez A, Lozano JA (2010) Sensitivity analysis of kappa-fold cross validation in prediction error estimation. IEEE Trans Pattern Anal Mach Intell 32(3):569–575
Sriphaew K, Takamura H, Okumura M (2009) Cool blog classification from positive and unlabeled examples. In: Proceedings of the 13th Pacific-Asia conference on advances in knowledge discovery and data mining, pp 62–73
Tsai RT-H, Hung H-C, Dai H-J, Lin Y-W, Hsu W-L (2008) Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles. BMC Bioinform 9(Suppl 1):S3
Wang C, Ding C, Meraz RF, Holbrook SR (2006) PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics 22(21):2590–2596
Xiao Y, Segal MR (2008) Biological sequence classification utilizing positive and unlabeled data. Bioinformatics 24(9):1198–1205
Yousef M, Jung S, Showe LC, Showe MK (2008) Learning from positive examples when the negative class is undetermined-micro RNA gene identification. Algorithms Mol Biol 3:2
Yu H, Han J, Chang K (2004) PEBL: Web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1):70–81
Yu H, Han J, Chang KC-C (2002) PEBL: positive example based learning for web page classification using SVM. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 239–248
Zhang D, Lee WS (2005) A simple probabilistic approach to learning from positive and unlabeled examples. In: Proceedings of Fifth annual UK conference on computational intelligence (UKCI), pp 83–87
Zhao X-M, Wang Y, Chen L, Aihara K (2008) Gene function prediction using labeled and unlabeled data. BMC Bioinform 9:57
Acknowledgments
This work has been partially supported by the Saiotek and Research Groups 2007–2012 (IT-242-07) programs (Basque Government), TIN2010-14931, TIN2010-20900-C04-04 Consolider Ingenio 2010-CSD2007-00018 and Cajal Blue Brain Project (C080020-09) projects (Spanish Ministry of Science and Innovation), the COMBIOMED network in computational bio-medicine (Carlos III Health Institute) and the Diputación Foral de Gipuzkoa (2011-CIEN-000060-01).
Cite this article
Calvo, B., Inza, I., Larrañaga, P. et al. Wrapper positive Bayesian network classifiers. Knowl Inf Syst 33, 631–654 (2012). https://doi.org/10.1007/s10115-012-0553-2