Abstract
In recent years, learning from imbalanced data sets has become a challenging issue in the machine learning and data mining communities. The problem arises when some classes have fewer instances than others. Multi-class imbalanced data sets are pervasive in real-world applications, and many typical machine learning algorithms have difficulty handling them. In this paper, we propose an ensemble pruning approach based on the Reinforcement Learning framework. Inspired by the Markov Decision Process, we treat ensemble pruning as a one-player game and select the best classifiers from the initial state space. The selected classifiers, which together form a good ensemble model, are then used to learn from multi-class imbalanced data sets. Experimental results on several UCI and KEEL benchmark data sets show promising improvements in minority class recall, G-mean, and MAUC.
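The abstract names its evaluation measures without defining them. As a rough illustration only, the sketch below computes per-class recall, minority class recall, and G-mean under their commonly used multi-class definitions (G-mean as the geometric mean of the per-class recalls). The function names and the toy labels are hypothetical, the paper's exact formulations may differ, and MAUC is left out because it needs class probability estimates.

```python
from collections import Counter
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed multi-class definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return prod(recalls.values()) ** (1.0 / len(recalls))


def minority_recall(y_true, y_pred):
    """Recall of the class with the fewest instances in y_true."""
    totals = Counter(y_true)
    minority = min(totals, key=totals.get)
    return per_class_recall(y_true, y_pred)[minority]


# Toy imbalanced labels (illustrative, not from the paper).
y_true = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
y_pred = ["a"] * 6 + ["b", "b", "a"] + ["c"]

print(per_class_recall(y_true, y_pred))
print(g_mean(y_true, y_pred))
print(minority_recall(y_true, y_pred))
```

Because G-mean multiplies the recalls, a single class that is never predicted correctly drives the score to zero, which is why it is a popular measure when minority classes matter.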
Copyright information
© 2014 Springer India
About this paper
Cite this paper
Abdi, L., Hashemi, S. (2014). An Ensemble Pruning Approach Based on Reinforcement Learning in Presence of Multi-class Imbalanced Data. In: Pant, M., Deep, K., Nagar, A., Bansal, J. (eds) Proceedings of the Third International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 258. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1771-8_52
DOI: https://doi.org/10.1007/978-81-322-1771-8_52
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1770-1
Online ISBN: 978-81-322-1771-8
eBook Packages: Engineering, Engineering (R0)