Abstract
Random Forest draws much interest from the research community because of its simplicity and excellent performance. The splitting attribute at each node of a decision tree in Random Forest is determined from a randomly selected subset of the attribute set, whose size is predefined. The size of this subset is one of the most debated design choices of Random Forest and has motivated many contributions. However, little attention has been given to improving Random Forest specifically for records that are hard to classify. In this paper, we propose a novel technique that detects hard-to-classify records and increases their weights in the training data set. We then build a Random Forest from the weighted training data set. The experimental results presented in this paper indicate that the ensemble accuracy of Random Forest can be improved when it is applied to weighted training data sets that place more emphasis on hard-to-classify records.
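The abstract describes the approach only at a high level. The sketch below is a rough illustration of the general idea, not the authors' actual procedure: it assumes scikit-learn, flags hard-to-classify training records via a preliminary forest's out-of-bag errors, doubles their weights (an arbitrary placeholder factor), and trains the final forest on the weighted data.

```python
# Hedged sketch of the idea in the abstract; the hard-record criterion
# (out-of-bag misclassification) and the weight factor 2.0 are assumptions,
# not details taken from the paper.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: preliminary forest; records misclassified by their out-of-bag votes
# serve as a proxy for "hard-to-classify" records.
prelim = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
prelim.fit(X_train, y_train)
oob_pred = prelim.classes_[np.argmax(prelim.oob_decision_function_, axis=1)]
hard = oob_pred != y_train

# Step 2: increase the weights of the hard records.
weights = np.ones(len(y_train))
weights[hard] *= 2.0

# Step 3: build the final Random Forest from the weighted training data.
final = RandomForestClassifier(n_estimators=100, random_state=0)
final.fit(X_train, y_train, sample_weight=weights)
print("Test accuracy:", final.score(X_test, y_test))
```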
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Adnan, M.N., Islam, M.Z. (2016). On Improving Random Forest for Hard-to-Classify Records. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_39
DOI: https://doi.org/10.1007/978-3-319-49586-6_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49585-9
Online ISBN: 978-3-319-49586-6
eBook Packages: Computer Science; Computer Science (R0)