Abstract
Double bagging is a parallel ensemble method in which an additional classifier is trained on the out-of-bag samples of each replicate; the posterior class probabilities produced by this additional classifier are then appended to the in-bag samples as extra features, and a decision tree is trained on the augmented data. The subsampled version of double bagging depends on two hyperparameters: the subsample ratio (SSR) and the choice of additional classifier. In this paper we propose an embedded cross-validation based selection technique that chooses one of these parameters automatically. During the training phase, the technique builds an ensemble for each candidate value of one parameter (keeping the other fixed) and finally selects the value yielding the highest accuracy. We consider four additional classifiers, the radial basis function support vector machine (RSVM), the linear support vector machine (LSVM), and the nearest neighbor classifiers 5-NN and 10-NN, together with five subsample ratios: 0.1, 0.2, 0.3, 0.4 and 0.5. We report the performance of the subsampled double bagging ensemble for each combination of SSR and additional classifier on UCI benchmark datasets. The results indicate that LSVM is the most effective additional classifier for enhancing the predictive power of double bagging, and that SSRs of 0.4 and 0.5 yield better performance than the smaller ratios. We also compare the resulting ensembles with bagging, AdaBoost, the original double bagging and rotation forest; the experimental results show that the subsampled double bagging ensemble outperforms these methods.
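To make the procedure concrete, the following is a minimal Python sketch of subsampled double bagging with the embedded cross-validation selection described above. It assumes scikit-learn-style estimators and NumPy arrays; the class and function names (DoubleBagging, select_by_embedded_cv), the fold count, and the ensemble size are illustrative choices, not taken from the paper's implementation.

import numpy as np
from sklearn.base import clone
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score


class DoubleBagging:
    """Subsampled double bagging: each member fits the additional
    classifier on the out-of-bag samples and a decision tree on the
    in-bag samples augmented with that classifier's class probabilities."""

    def __init__(self, extra_clf, ssr=0.4, n_members=50, seed=0):
        self.extra_clf, self.ssr = extra_clf, ssr
        self.n_members, self.seed = n_members, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.members_, self.classes_ = [], np.unique(y)
        n = len(X)
        for _ in range(self.n_members):
            # Subsample an in-bag set of size SSR * n without replacement;
            # the remaining samples form the out-of-bag set.
            inbag = rng.choice(n, size=max(2, int(self.ssr * n)), replace=False)
            oob = np.setdiff1d(np.arange(n), inbag)
            extra = clone(self.extra_clf).fit(X[oob], y[oob])
            # Augment the in-bag features with the posterior class
            # probabilities of the additional classifier.
            aug = np.hstack([X[inbag], extra.predict_proba(X[inbag])])
            tree = DecisionTreeClassifier().fit(aug, y[inbag])
            self.members_.append((extra, tree))
        return self

    def predict(self, X):
        # Majority vote over the member trees, each seeing the test
        # features augmented by its own additional classifier.
        votes = np.zeros((len(X), len(self.classes_)))
        for extra, tree in self.members_:
            pred = tree.predict(np.hstack([X, extra.predict_proba(X)]))
            for c, cls in enumerate(self.classes_):
                votes[:, c] += pred == cls
        return self.classes_[votes.argmax(axis=1)]


def select_by_embedded_cv(candidates, X, y, n_folds=5, **db_kwargs):
    """Build one ensemble per candidate additional classifier, estimate
    its accuracy by cross-validation on the training data, and refit the
    best configuration on the full training set. Selecting among SSRs
    with a fixed classifier works the same way."""
    best, best_acc = None, -1.0
    for extra in candidates:
        accs = []
        for tr, te in StratifiedKFold(n_folds, shuffle=True, random_state=0).split(X, y):
            model = DoubleBagging(extra, **db_kwargs).fit(X[tr], y[tr])
            accs.append(accuracy_score(y[te], model.predict(X[te])))
        if np.mean(accs) > best_acc:
            best, best_acc = extra, np.mean(accs)
    return DoubleBagging(best, **db_kwargs).fit(X, y)


# The four additional classifiers considered in the paper.
candidates = [SVC(kernel="rbf", probability=True),     # RSVM
              SVC(kernel="linear", probability=True),  # LSVM
              KNeighborsClassifier(5),                 # 5-NN
              KNeighborsClassifier(10)]                # 10-NN

# Example (X_train, y_train are the user's NumPy arrays):
# model = select_by_embedded_cv(candidates, X_train, y_train, ssr=0.4, n_members=25)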
References
Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/mlearn/MLRepository.html
Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996a)
Breiman, L.: Out-of-bag estimation. Technical Report, Statistics Department, University of California, Berkeley (1996b)
Breiman, L.: Random Forests. Machine Learning 45(1), 5–32 (2001)
Caruana, R., Niculescu-Mizil, A., Crew, G., Ksikes, A.: Ensemble selection from libraries of models. In: Proceedings of the 21st Int’l Conf. on Machine Learning (2004)
Caruana, R., Niculescu-Mizil, A.: Getting the most out of ensemble selection. In: Proceedings of the Int’l Conf. on Data Mining, ICDM (2006)
Demšar, J.: Statistical comparisons of classifiers over multiple datasets. J. Mach. Learn. Research 7, 1–30 (2006)
Dietterich, T.G.: Machine-learning research: Four current directions. AI Magazine 18(4), 97–136 (1997)
Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148–156. Morgan Kaufmann, San Francisco (1996)
Freund, Y., Schapire, R.: A decision-theoretic generalization of online learning and an application to boosting. J. Comput. System Sci. 55, 119–139 (1997)
Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning: data mining, inference and prediction. Springer, New York (2001)
Hothorn, T., Lausen, B.: Double-bagging: combining classifiers by bootstrap aggregation. Pattern Recognition 36(6), 1303–1309 (2003)
Hothorn, T., Lausen, B.: Bundling classifiers by bagging trees. Comput. Statist. Data Anal. 49, 1068–1078 (2005)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)
Rodríguez, J., Kuncheva, L.: Rotation forest: A new classifier ensemble method. IEEE Trans. Patt. Analys. Mach. Intell. 28(10), 1619–1630 (2006)
Zaman, F., Hirose, H.: Double SVMbagging: A subsampling approach to SVM ensemble. To appear in: Intelligent Automation and Computer Engineering. Springer, Heidelberg (2009)
Zaman, F., Hirose, H.: Effect of Subsampling Rate on Subbagging and Related Ensembles of Stable Classifiers. In: Chaudhury, S., et al. (eds.) PReMI 2009. LNCS, vol. 5909, pp. 44–49. Springer, Heidelberg (2009)
Zaman, F., Hirose, H.: A Comparative Study on the Performance of Several Ensemble Methods with Low Subsampling Ratio. In: 2nd Asian Conference on Intelligent Information and Database Systems, ACIIDS 2010 (2010)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Faisal, Z., Uddin, M.M., Hirose, H. (2010). On Selecting Additional Predictive Models in Double Bagging Type Ensemble Method. In: Taniar, D., Gervasi, O., Murgante, B., Pardede, E., Apduhan, B.O. (eds) Computational Science and Its Applications – ICCSA 2010. ICCSA 2010. Lecture Notes in Computer Science, vol 6019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12189-0_18
DOI: https://doi.org/10.1007/978-3-642-12189-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12188-3
Online ISBN: 978-3-642-12189-0