Authors:
Amaal R. Al Shorman 1; Hossam Faris 1; Pedro A. Castillo 2; J. J. Merelo 2 and Nailah Al-Madi 3
Affiliations:
1 Business Information Technology Department, King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
2 Department of Computer Architecture and Computer Technology, ETSIIT and CITIC, University of Granada, Granada, Spain
3 Computer Science Department, Princess Sumaya University for Technology, Amman, Jordan
Keyword(s):
Classification, Genetic Programming, Preprocessing, Standardization Methods.
Abstract:
Genetic programming (GP) is a powerful classification technique. It is interpretable, and it can dynamically build very complex expressions that maximize or minimize some fitness function. It has the capacity to model very complex problems in the areas of Machine Learning, Data Mining and Pattern Recognition. Nevertheless, GP has a high computational time complexity. On the other hand, data standardization is one of the most important pre-processing steps in machine learning. The purpose of this step is to unify the scale of all input features so that they contribute equally to the model. The objective of this paper is to investigate the influence of input data standardization methods on GP and how they affect its prediction accuracy. Six different methods of input data standardization were examined in order to determine which one achieves the most accurate results at the lowest computational cost. The simulations were run on ten benchmark datasets under three different scenarios (varying the population size and number of generations). The results showed that the computational efficiency of GP is greatly enhanced when it is coupled with some standardization methods, specifically the Min-Max method for scenario I and the Vector method for scenarios II and III, whereas the Manhattan and Z-Score methods gave the worst results for all three scenarios.
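As a rough illustration of the standardization methods compared in the abstract, the sketch below implements common textbook definitions of the Min-Max, Z-Score, Vector (L2) and Manhattan (L1) scalings in Python. The exact formulations used in the study, and the remaining two methods, are not reported in this abstract, so the code is an assumption rather than the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of four standardization
# methods named in the abstract, using their common textbook definitions.
# Each function scales the columns (features) of a 2-D NumPy array.
import numpy as np

def min_max(X):
    """Rescale each feature to the [0, 1] range."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

def z_score(X):
    """Center each feature to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def vector_norm(X):
    """Divide each feature by its Euclidean (L2) norm."""
    return X / np.linalg.norm(X, axis=0)

def manhattan_norm(X):
    """Divide each feature by its Manhattan (L1) norm."""
    return X / np.abs(X).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(loc=10.0, scale=3.0, size=(5, 3))  # toy input matrix
    print(min_max(X))
```

Constant-valued features would cause a division by zero in these sketches; a production pipeline would guard against that before feeding the scaled data to the GP classifier.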