Abstract
Random Forest draws much interest from the research community because of its simplicity and excellent performance. The splitting attribute at each node of a decision tree in Random Forest is determined from a randomly selected subset of the attribute set, whose size is predefined. The size of this subset is one of the most debated design choices of Random Forest and has motivated many contributions. However, little attention has been given to improving Random Forest specifically for records that are hard to classify. In this paper, we propose a novel technique that detects hard-to-classify records and increases their weights in the training data set. We then build a Random Forest from the weighted training data set. The experimental results presented in this paper indicate that the ensemble accuracy of Random Forest can be improved when it is applied to weighted training data sets that place more emphasis on hard-to-classify records.
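The abstract describes the approach only at a high level. The sketch below is a rough illustration of the general idea, not the authors' actual procedure: it assumes scikit-learn, flags hard-to-classify training records via a preliminary forest's out-of-bag errors, doubles their weights (an arbitrary placeholder factor), and trains the final forest on the weighted data.

```python
# Hedged sketch of the idea in the abstract; the hard-record criterion
# (out-of-bag misclassification) and the weight factor 2.0 are assumptions,
# not details taken from the paper.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: preliminary forest; records misclassified by their out-of-bag votes
# serve as a proxy for "hard-to-classify" records.
prelim = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
prelim.fit(X_train, y_train)
oob_pred = prelim.classes_[np.argmax(prelim.oob_decision_function_, axis=1)]
hard = oob_pred != y_train

# Step 2: increase the weights of the hard records.
weights = np.ones(len(y_train))
weights[hard] *= 2.0

# Step 3: build the final Random Forest from the weighted training data.
final = RandomForestClassifier(n_estimators=100, random_state=0)
final.fit(X_train, y_train, sample_weight=weights)
print("Test accuracy:", final.score(X_test, y_test))
```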
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Adnan, M.N., Islam, M.Z. (2016). On Improving Random Forest for Hard-to-Classify Records. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_39
DOI: https://doi.org/10.1007/978-3-319-49586-6_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49585-9
Online ISBN: 978-3-319-49586-6
eBook Packages: Computer Science; Computer Science (R0)