Abstract
Random Forest is one of the most popular decision forest algorithms and uses decision trees as its base classifiers. The decision trees in a Random Forest are built from the records of a single training data set, which makes them almost equally biased towards that data set. In practice, the testing data set can differ significantly from the training data set. To reduce the bias of the decision trees, and hence of the Random Forest, we introduce a random weight for each decision tree. We present experimental results on four widely used data sets from the UCI Machine Learning Repository. The results indicate that the proposed technique can reduce the bias of Random Forest, making it less sensitive to noisy data.
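The core idea, an ensemble vote in which each tree's contribution is scaled by a randomly drawn weight, can be sketched as follows. This is a minimal illustration only: it assumes the weights are drawn uniformly at random and applied in a weighted majority vote over an ordinary scikit-learn Random Forest, which may differ from the exact weighting scheme used in the paper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# A small benchmark data set is used purely for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train an ordinary Random Forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Hypothetical weighting scheme: one random weight per tree, drawn uniformly
# from (0, 1]; the distribution used in the paper may differ.
weights = rng.uniform(low=0.01, high=1.0, size=len(forest.estimators_))

# Weighted majority vote: each tree adds its weight to the class it predicts.
n_classes = len(forest.classes_)
votes = np.zeros((X_test.shape[0], n_classes))
for tree, w in zip(forest.estimators_, weights):
    # Trees inside a fitted forest predict encoded class indices (0..n_classes-1).
    pred = tree.predict(X_test).astype(int)
    votes[np.arange(X_test.shape[0]), pred] += w

y_pred = forest.classes_[votes.argmax(axis=1)]
print("Unweighted forest accuracy:       ", accuracy_score(y_test, forest.predict(X_test)))
print("Randomly weighted forest accuracy:", accuracy_score(y_test, y_pred))
```

In this sketch the random weights simply perturb the influence of individual trees; the paper evaluates whether such perturbation reduces the forest's bias towards the training data.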
Cite this paper
Adnan, M.N. (2022). On Reducing the Bias of Random Forest. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds.) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol. 13726. Springer, Cham. https://doi.org/10.1007/978-3-031-22137-8_14