Abstract
Data volumes are currently growing at an exponential rate, which makes accurate classification of statistical data a major challenge. The need to extract data, insights, and information from datasets quickly and efficiently has drawn attention to the choice of classification strategy. This paper presents a Ranger Random Forest (RRF) algorithm for high-dimensional data classification. Random Forest (RF) is among the most popular ensemble techniques for classification because it provides variable importance measures, out-of-bag error estimates, proximities, and more. We evaluate the approach on three different datasets, aiming to reduce runtime and memory utilization while retaining the efficiency of the traditional Random Forest. We also describe the improvements over Random Forest in terms of computational time and memory, achieved without affecting the efficiency of the traditional Random Forest. Experimental results show that the proposed RRF outperforms the other methods in terms of memory utilization and computation time.
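The comparison described above rests on two measurements per run: computation time and memory utilization. The following is a minimal sketch of how such measurements could be collected in Python. It is not the authors' benchmarking setup. The names `measure` and `toy_workload` are hypothetical, and the workload is only a stand-in for model training. Note that `tracemalloc` tracks only allocations made through Python's allocator, so memory used inside native code (such as a C++ forest implementation) would need an operating-system-level measurement instead.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_bytes).

    peak_bytes covers Python-level allocations only (tracemalloc),
    not memory allocated by native extensions.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def toy_workload(n_rows, n_cols):
    """Hypothetical stand-in for training: builds an n_rows x n_cols table
    and returns the sum of its last column."""
    table = [[float(r * c) for c in range(n_cols)] for r in range(n_rows)]
    return sum(row[-1] for row in table)


if __name__ == "__main__":
    # Compare a small and a large configuration, as one would compare
    # two classifiers or two dataset sizes.
    for name, shape in [("small", (100, 10)), ("large", (1000, 100))]:
        _, seconds, peak = measure(toy_workload, *shape)
        print(f"{name}: {seconds:.4f} s, peak {peak / 1024:.1f} KiB")
```

In practice, `toy_workload` would be replaced by the training call of each classifier under comparison, and each configuration would be repeated several times to average out timing noise.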
Acknowledgements
This research work is supported by the Indian Institute of Technology (ISM), Government of India. The authors express their gratitude to the Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for its research support.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rao, G.M., Ramesh, D., Kumar, A. (2020). RRF-BD: Ranger Random Forest Algorithm for Big Data Classification. In: Behera, H., Nayak, J., Naik, B., Pelusi, D. (eds) Computational Intelligence in Data Mining. Advances in Intelligent Systems and Computing, vol 990. Springer, Singapore. https://doi.org/10.1007/978-981-13-8676-3_2
DOI: https://doi.org/10.1007/978-981-13-8676-3_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8675-6
Online ISBN: 978-981-13-8676-3
eBook Packages: Intelligent Technologies and Robotics (R0)