
On Reducing the Bias of Random Forest

  • Conference paper
  • In: Advanced Data Mining and Applications (ADMA 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13726)

Abstract

Random Forest is one of the most popular decision forest algorithms; it uses decision trees as the base classifier. Because every decision tree in a Random Forest is built from the records of the same training data set, the trees are almost equally biased towards that data set. In practice, however, the test data set can differ significantly from the training data set. To reduce the bias of the decision trees, and hence of the Random Forest, we introduce a random weight for each decision tree. We present experimental results on four widely used data sets from the UCI Machine Learning Repository. The results indicate that the proposed technique can reduce the bias of Random Forest, making it less sensitive to noisy data.
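The abstract does not spell out how the per-tree weights are drawn or applied, so the following is a minimal sketch rather than the paper's method: it assumes one weight per tree drawn uniformly at random and an aggregation by weighted majority vote over the trees of a scikit-learn RandomForestClassifier.

```python
# Minimal sketch of per-tree random weighting in a Random Forest.
# ASSUMPTIONS (not from the paper): weights are drawn uniformly from [0, 1)
# and class votes are combined by weighted majority; the paper's actual
# weighting scheme may differ.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=42)

# Train a standard Random Forest; only the vote aggregation changes below.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Hypothetical scheme: one random weight per decision tree.
weights = rng.uniform(low=0.0, high=1.0, size=len(forest.estimators_))

# Weighted majority vote: each tree adds its weight to the class it predicts.
# Sub-estimators in scikit-learn predict encoded class indices (0..n_classes-1).
votes = np.zeros((len(X_test), len(forest.classes_)))
for tree, w in zip(forest.estimators_, weights):
    pred = tree.predict(X_test).astype(int)
    votes[np.arange(len(X_test)), pred] += w

y_pred = forest.classes_[votes.argmax(axis=1)]
print(f"Weighted-vote accuracy: {(y_pred == y_test).mean():.3f}")
```

Note that the trees themselves are trained exactly as in a standard Random Forest; compared with the unweighted `forest.predict`, only the aggregation step changes.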



Author information

Correspondence to Md. Nasim Adnan.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Adnan, M.N. (2022). On Reducing the Bias of Random Forest. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science (LNAI), vol. 13726. Springer, Cham. https://doi.org/10.1007/978-3-031-22137-8_14


  • DOI: https://doi.org/10.1007/978-3-031-22137-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22136-1

  • Online ISBN: 978-3-031-22137-8

  • eBook Packages: Computer Science (R0)
