Abstract
Bayesian network classifiers are an interesting class of models because they can be learnt out-of-core, i.e., without needing to hold the whole training data in main memory. The selective K-dependence Bayesian network classifier (SKDB) is the state of the art in this class of models and has been shown to rival random forest (RF) on problems with categorical data. In this paper, we introduce an ensembling technique for SKDB, called ensemble of SKDB (ESKDB). We show that ESKDB significantly outperforms RF on both categorical and numerical data and rivals XGBoost. ESKDB combines three main components: (1) an effective strategy to vary the networks built by the individual classifiers (so as to form an ensemble), (2) a stochastic discretization method, which both allows ESKDB to handle numerical data and further increases the variance between the components of the ensemble, and (3) a superior smoothing technique that ensures proper calibration of ESKDB's probabilities. We conduct a large set of experiments on 72 datasets to study the properties of ESKDB (through a sensitivity analysis) and show its competitiveness with the state of the art.
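To make the ensemble structure described in the abstract concrete, the sketch below shows how the three components could fit together. It is not the authors' ESKDB implementation: the base learner is a plain categorical naive Bayes standing in for SKDB, the random-cut-point binning stands in for the paper's stochastic discretization, and simple Laplace smoothing stands in for the hierarchical Dirichlet smoothing used by ESKDB. All class and method names (CategoricalNB, EnsembleSketch, _random_cuts, _discretize) are illustrative, not from the paper.

```python
import numpy as np


class CategoricalNB:
    """Stand-in base learner (a full ESKDB would train an SKDB member here)."""

    def __init__(self, n_values, alpha=1.0):
        self.n_values = n_values  # number of discrete values per attribute
        self.alpha = alpha        # Laplace constant (stand-in for HDP smoothing)

    def fit(self, X, y, classes):
        self.classes_ = classes
        counts = np.array([(y == c).sum() for c in classes], dtype=float)
        self.log_prior_ = np.log((counts + self.alpha) /
                                 (counts.sum() + self.alpha * len(classes)))
        # one smoothed conditional probability table per (class, attribute) pair
        self.log_cpt_ = []
        for c in classes:
            Xc = X[y == c]
            tables = []
            for j in range(X.shape[1]):
                cnt = np.bincount(Xc[:, j], minlength=self.n_values) + self.alpha
                tables.append(np.log(cnt / cnt.sum()))
            self.log_cpt_.append(tables)
        return self

    def predict_proba(self, X):
        log_post = np.tile(self.log_prior_, (len(X), 1))
        for ci in range(len(self.classes_)):
            for j in range(X.shape[1]):
                log_post[:, ci] += self.log_cpt_[ci][j][X[:, j]]
        p = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)


class EnsembleSketch:
    """Ensemble skeleton: each member gets a bootstrap sample and its own
    random discretization; class probabilities are averaged at prediction time."""

    def __init__(self, n_members=10, n_bins=5, seed=0):
        self.n_members = n_members
        self.n_bins = n_bins
        self.rng = np.random.default_rng(seed)

    def _random_cuts(self, X):
        # stochastic discretization stand-in: random cut points per attribute
        return [np.sort(self.rng.uniform(X[:, j].min(), X[:, j].max(), self.n_bins - 1))
                for j in range(X.shape[1])]

    def _discretize(self, X, cuts):
        return np.column_stack([np.searchsorted(cuts[j], X[:, j])
                                for j in range(X.shape[1])])

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.members_ = []
        for _ in range(self.n_members):
            idx = self.rng.integers(0, len(X), len(X))   # bootstrap resample
            cuts = self._random_cuts(X[idx])             # member-specific discretization
            clf = CategoricalNB(self.n_bins).fit(self._discretize(X[idx], cuts),
                                                 y[idx], self.classes_)
            self.members_.append((cuts, clf))
        return self

    def predict_proba(self, X):
        probs = [clf.predict_proba(self._discretize(X, cuts))
                 for cuts, clf in self.members_]
        return np.mean(probs, axis=0)
```

The key design point the sketch mirrors is that diversity between members comes from two sources at once: the resampled training data and the member-specific discretization, while the final prediction is an average of smoothed class-probability estimates rather than a majority vote.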
Notes
The more common representation \(\mathrm{Dir}(\alpha _1,\ldots , \alpha _C)\) is not used here.
Additional information
This research was partially supported by the China Scholarship Council under Award 201506300081 and by the Australian Government through the Australian Research Council's Discovery Projects funding scheme (Projects DP190100017 and DE170100037).
Cite this article
Zhang, H., Petitjean, F. & Buntine, W. Bayesian network classifiers using ensembles and smoothing. Knowl Inf Syst 62, 3457–3480 (2020). https://doi.org/10.1007/s10115-020-01458-z