Abstract
In the information retrieval framework, there are problems where the goal is to recover objects of a particular class from large sets of unlabelled objects. In some of these problems, only examples of the class we want to recover are available. For such problems, the machine learning community has developed algorithms that are able to learn binary classifiers in the absence of negative examples. Among them we can find the positive Bayesian network classifiers, algorithms that induce Bayesian network classifiers from positive and unlabelled examples. The main drawback of these algorithms is that they require some prior knowledge about the a priori probability distribution of the class. In this paper, we propose a wrapper approach to tackle the learning problem when no such information is available, setting this probability at the value that is optimal in terms of the recovery of positive examples. The evaluation of classifiers in positive unlabelled learning problems is a non-trivial question. We have also worked on this problem and propose a new metric, which we call the pseudo F, to guide the search for the optimal a priori probability of the positive class. We have empirically tested the proposed metric and the wrapper classifiers on both synthetic and real-life datasets. The results of this empirical comparison show that the wrapper Bayesian network classifiers provide competitive results, particularly when the actual a priori probability of the positive class is high.
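The abstract mentions a guiding metric estimable from positive and unlabelled data alone. The exact definition of the pseudo F is given in the paper itself; as a point of reference, a well-known criterion of this kind is the one of Lee and Liu (2003, cited below), which rewards high recall on held-out positives while penalising classifiers that label too much of the unlabelled pool as positive. A minimal sketch of that criterion (the function name `pu_score` is ours, for illustration only):

```python
def pu_score(recall_est, positive_rate):
    """Lee-Liu style PU criterion: recall^2 / Pr(f(x) = 1).

    recall_est    -- recall estimated on held-out positive examples
    positive_rate -- fraction of all examples predicted positive
    """
    if positive_rate == 0:
        return 0.0
    return recall_est ** 2 / positive_rate
```

Both quantities can be estimated without any negative labels, which is what makes a criterion of this shape usable as a guide in the positive-unlabelled setting.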
Notes
Not, at least, directly from positive and unlabelled examples. In Elkan et al. [15], the authors present an interesting theoretical result that could be used to estimate this value under certain assumptions. However, the authors propose other ways to use this theoretical result to learn classifiers, as we will see in the experimental part of the paper.
Henceforth, we will refer to the actual a priori probability of the positive class \(P(C=1)\) as \(p\) and, in order to avoid mixing concepts, will refer to the parameter used in the training of PBCs as ‘training \(p\)’ or \(p_t\), to distinguish it from the actual a priori probability of the positive class.
We are assuming that \(X_i\), \(i=1,\ldots ,n\) are discrete variables that can take \(r_i\) values and that the class \(C\) is a binary variable that can take two values, 0 (or negative) and 1 (or positive).
Except when the previous optimum lies at an endpoint of the interval, in which case the new optimum will also lie at the corresponding endpoint of the new interval, not at its midpoint.
Obviously, any information about the upper and/or lower bounds of this parameter could be introduced at this point, reducing the initial search space.
Details about the evaluation of the models will be provided in the next section.
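The notes above describe narrowing the search interval for the training prior \(p_t\) around the best candidate found so far, with the endpoint handled as a special case. A minimal sketch under our own assumptions about the details (the callable `score` stands for evaluating a classifier trained with a given \(p_t\), e.g. via the guiding metric; grid size and number of iterations are illustrative):

```python
def refine_search(score, lo=0.0, hi=1.0, n_points=5, iters=4):
    """Successive interval refinement over the training prior p_t.

    At each iteration, evaluate an evenly spaced grid of candidates,
    then halve the interval around the best one.  If the best candidate
    lies at an endpoint, the new interval keeps that endpoint as its
    corresponding extreme instead of centring on it.
    """
    for _ in range(iters):
        step = (hi - lo) / (n_points - 1)
        grid = [lo + i * step for i in range(n_points)]
        best = max(grid, key=score)
        half = (hi - lo) / 4          # new interval is half as wide
        if best == grid[0]:
            lo, hi = best, best + 2 * half
        elif best == grid[-1]:
            lo, hi = best - 2 * half, best
        else:
            lo, hi = best - half, best + half
    return (lo + hi) / 2
```

Any known upper or lower bound on the parameter can simply be passed as the initial `lo`/`hi`, shrinking the search space as the note suggests.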
In the most general probabilistic framework, an instance belongs to both the positive and the negative class with some probability. In this work, we will focus on the particular case in which the probability of belonging to one class is 1 and the probability of belonging to the other class is 0, that is, problems where the instances belong to one, and only one, class.
The feasibility of this calculation will depend on the number of possible instances. When the exact value cannot be obtained, other approaches should be taken.
As the estimation of the recall involves only the positive examples, there is no need to divide the unlabelled examples into folds.
Note that \(p\) refers to the actual a priori probability of the positive class that is a constant for a given problem domain.
In the original dataset, the attributes are continuous. As the classifiers used work with discrete attributes, the original features have been discretised into three intervals using an equal frequency strategy.
Note that, at most, we need 4,800 positive instances, and only 4,044 are available. To overcome this problem, the positive examples in the set of unlabelled instances are resampled when needed (i.e. there are repeated positive examples in the set of unlabelled instances in some problems). This resampling process is used to complete only the unlabelled instances, never the set of known positive examples.
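The padding step described in the note above can be sketched as follows; the function name and its split of the unlabelled pool into its positive and negative parts are our own illustration, not the paper's code:

```python
import random

def complete_unlabelled(unlab_pos, unlab_neg, needed_pos):
    """Pad the unlabelled pool by resampling its positive part with
    replacement when fewer real positives are available than the
    experimental design needs.  Only the unlabelled pool is padded;
    the labelled positive set is never resampled."""
    extra = needed_pos - len(unlab_pos)
    padded = list(unlab_pos)
    if extra > 0:
        padded += random.choices(unlab_pos, k=extra)
    return padded + list(unlab_neg)
```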
The complete set of results can be consulted at the supplementary data web page at http://www.sc.ehu.es/ccwbayes/members/borxa/wPBC/.
In a given comparison, we have two paired samples (the evaluations of two classifiers on fifty datasets). The signed-rank test is run on the differences between the performances of the two classifiers in each dataset, in order to exploit the fact that the samples are paired.
Both the tests and the correction were carried out using R. For the multiple testing correction, the package multtest was used.
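The paper runs the tests and the Benjamini-Yekutieli correction in R (package multtest). As a point of reference, an equivalent sketch in Python, with the BY step written out explicitly (the function name and the alpha level are ours):

```python
from scipy.stats import wilcoxon

def paired_tests_with_by(sample_pairs, alpha=0.05):
    """Wilcoxon signed-rank tests on paired performance samples, with
    Benjamini-Yekutieli control of the false discovery rate."""
    pvals = [wilcoxon(a, b).pvalue for a, b in sample_pairs]
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))    # BY harmonic factor
    order = sorted(range(m), key=lambda i: pvals[i])
    # largest rank k with p_(k) <= k * alpha / (m * c_m); reject ranks <= k
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return pvals, rejected
```

The BY factor \(c(m)=\sum_{i=1}^{m} 1/i\) makes the procedure valid under arbitrary dependency between the tests, which matters here since the comparisons share datasets.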
The algorithm does not assume this, but it builds a classifier to differentiate labelled and unlabelled examples which, in practical terms, is the same as building a classifier to distinguish positive and negative examples assuming that all the unlabelled instances are negative.
References
Asuncion A, Newman DJ (2007) UCI machine learning repository
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29(4):1165–1188
Calvo B (2008) Positive unlabelled learning with applications in computational biology. Ph.D. thesis, Facultad de Informatica–UPV/EHU
Calvo B, Larrañaga P, Lozano JA (2007a) Learning Bayesian classifiers from positive and unlabeled examples. Pattern Recognit Lett 28:2375–2384
Calvo B, Larrañaga P, Lozano JA (2009) Feature subset selection from positive and unlabelled examples. Pattern Recognit Lett 30(11):1027–1036
Calvo B, López-Bigas N, Furney SJ, Larrañaga P, Lozano JA (2007b) A partially supervised classification approach to dominant and recessive human disease gene prediction. Comput Methods Programs Biomed 85:229–237
Castelo R, Guigó R (2004) Splice site identification by idlBNs. Bioinformatics 20(Suppl 1):i69–i76
Cerulo L, Elkan C, Ceccarelli M (2010) Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinform 11:228
Chapelle O, Zien A, Schölkopf B (2006) Semi-supervised learning. MIT Press, Cambridge
Chen Y, Li Z, Wang X, Feng J, Hu X (2010) Predicting gene function using few positive examples and unlabeled ones. BMC Genomics 11(Suppl 2):S11
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Denis F (1998) PAC learning from positive statistical queries. In: Proceedings of the 9th international conference on algorithmic learning theory. Springer, New York, pp 112–126
Denis F, Gilleron R, Tommasi M (2002) Text classification from positive and unlabeled examples. In: Proceedings of the 9th international conference on information processing and management of uncertainty in knowledge-based Systems. IPMU 2002, pp 1927–1934
Duan Q, Miao D, Jin K (2007) A rough set approach to classifying web page without negative examples. In: Proceedings of the 11th Pacific-Asia conference on advances in knowledge discovery and data mining. Springer, New York, pp 481–488
Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, USA, pp 213–220
Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29:131–163
Furney SJ, Calvo B, Larrañaga P, Lozano JA, Lopez-Bigas N (2008) Prioritization of candidate cancer genes-an aid to oncogenomic studies. Nucleic Acids Res 36(18):e115
Ide JS, Cozman FG (2002) Random generation of Bayesian networks. In: Proceedings of the 16th Brazilian symposium on artificial intelligence: advances in artificial intelligence. Springer, New York, pp 366–375
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Kononenko I (1991) Semi-naive Bayes classifiers. In: Proceedings of the 6th European working session on learning, pp 206–219
Lee WS, Liu B (2003) Learning with positive and unlabeled examples using weighted logistic regression. In: Proceedings of the twentieth international conference on machine learning, pp 448–455
Li X, Liu B (2003) Learning to classify texts using positive and unlabeled data. In: Proceedings of the eighteenth international joint conference on artificial intelligence 2003, pp 587–594
Li X-L, Liu B, Ng S-K (2010) Negative training data can be harmful to text classification. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp 218–228
Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), pp 179–186
Liu B, Lee WS, Yu PS, Li X (2002) Partially supervised classification of text documents. In: Proceedings of the nineteenth international conference on machine learning, pp 387–394
Minsky M (1961) Steps toward artificial intelligence. Proc Inst Radio Eng 49:8–30
Pan S, Zhang Y, Li X (2012) Dynamic classifier ensemble for positive unlabeled text stream classification. Knowl Inf Syst 1–21. doi:10.1007/s10115-011-0469-2
Pazzani M (1997) Searching for dependencies in Bayesian classifiers. Springer, New York
Rodríguez JD, Pérez A, Lozano JA (2010) Sensitivity analysis of kappa-fold cross validation in prediction error estimation. IEEE Trans Pattern Anal Mach Intell 32(3):569–575
Sriphaew K, Takamura H, Okumura M (2009) Cool blog classification from positive and unlabeled examples. In: Proceedings of the 13th Pacific-Asia conference on advances in knowledge discovery and data mining, pp 62–73
Tsai RT-H, Hung H-C, Dai H-J, Lin Y-W, Hsu W-L (2008) Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles. BMC Bioinform 9(Suppl 1):S3
Wang C, Ding C, Meraz RF, Holbrook SR (2006) PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics 22(21):2590–2596
Xiao Y, Segal MR (2008) Biological sequence classification utilizing positive and unlabeled data. Bioinformatics 24(9):1198–1205
Yousef M, Jung S, Showe LC, Showe MK (2008) Learning from positive examples when the negative class is undetermined-micro RNA gene identification. Algorithms Mol Biol 3:2
Yu H, Han J, Chang K (2004) PEBL: Web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1):70–81
Yu H, Han J, Chang KC-C (2002) PEBL: positive example based learning for web page classification using SVM. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 239–248
Zhang D, Lee WS (2005) A simple probabilistic approach to learning from positive and unlabeled examples. In: Proceedings of Fifth annual UK conference on computational intelligence (UKCI), pp 83–87
Zhao X-M, Wang Y, Chen L, Aihara K (2008) Gene function prediction using labeled and unlabeled data. BMC Bioinform 9:57
Acknowledgments
This work has been partially supported by the Saiotek and Research Groups 2007–2012 (IT-242-07) programs (Basque Government), TIN2010-14931, TIN2010-20900-C04-04 Consolider Ingenio 2010-CSD2007-00018 and Cajal Blue Brain Project (C080020-09) projects (Spanish Ministry of Science and Innovation), the COMBIOMED network in computational bio-medicine (Carlos III Health Institute) and the Diputación Foral de Gipuzkoa (2011-CIEN-000060-01).
Cite this article
Calvo, B., Inza, I., Larrañaga, P. et al. Wrapper positive Bayesian network classifiers. Knowl Inf Syst 33, 631–654 (2012). https://doi.org/10.1007/s10115-012-0553-2