Abstract
This research proposes a new hybrid prediction algorithm for breast cancer, developed on a breast cancer data set. Many approaches are available for diagnosing medical diseases, including the genetic algorithm, ant colony optimization, particle swarm optimization, and the cuckoo search algorithm. The proposed algorithm combines ReliefF attribute reduction with an entropy-based genetic algorithm for breast cancer detection. This hybrid combination is used to handle high-dimensional datasets with uncertainties. The data are obtained from the Wisconsin breast cancer dataset and have been categorized based on different properties. The performance of the proposed method is evaluated, and the results are compared with those of other well-known feature selection methods. The results show that the proposed method has a remarkable ability to generate a reduced-size subset of salient features while yielding significant classification accuracy for large datasets.
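The two-stage idea described in the abstract (a ReliefF-style filter followed by an entropy-driven genetic search) can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the filter is a simplified two-class Relief with a single nearest hit and miss rather than full ReliefF, the fitness is the information gain of the class given median-binarized features minus a size penalty, and the data are synthetic rather than the Wisconsin breast cancer dataset. All function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def relief_weights(X, y):
    """Simplified two-class Relief: one nearest hit and one nearest miss per sample."""
    n, d = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(d)]
    span = [(max(r[j] for r in X) - lo[j]) or 1.0 for j in range(d)]
    Z = [[(r[j] - lo[j]) / span[j] for j in range(d)] for r in X]  # scale to [0, 1]
    dist = lambda a, b: sum(abs(p - q) for p, q in zip(a, b))
    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(Z[i], Z[k]))
        m = min(misses, key=lambda k: dist(Z[i], Z[k]))
        for j in range(d):
            # Reward features that differ on the miss and agree on the hit.
            w[j] += (abs(Z[i][j] - Z[m][j]) - abs(Z[i][j] - Z[h][j])) / n
    return w

def binarize(X, cols):
    """Threshold each kept column at its median (an assumed discretization)."""
    med = {j: sorted(r[j] for r in X)[len(X) // 2] for j in cols}
    return [[int(r[j] >= med[j]) for j in cols] for r in X]

def entropy_fitness(mask, B, y, penalty=0.02):
    """Assumed fitness: information gain of the selected features minus a size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    groups = defaultdict(list)
    for row, label in zip(B, y):
        groups[tuple(row[j] for j in cols)].append(label)
    cond = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return entropy(y) - cond - penalty * len(cols)

def genetic_search(B, y, generations=30, pop_size=20, p_mut=0.1, seed=0):
    """Small GA over feature bitmasks with truncation selection and bit-flip mutation."""
    rng = random.Random(seed)
    d = len(B[0])
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    fit = lambda m: entropy_fitness(m, B, y)
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, d) if d > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    return max(pop, key=fit)

# Synthetic toy data: feature 0 tracks the class, features 1-3 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[label + data_rng.gauss(0, 0.1), data_rng.random(),
      data_rng.random(), data_rng.random()] for label in y]

w = relief_weights(X, y)                                          # stage 1: filter
keep = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:2]
best = genetic_search(binarize(X, keep), y)                       # stage 2: GA search
selected = [keep[j] for j, bit in enumerate(best) if bit]
print("Relief weights:", [round(v, 3) for v in w])
print("Selected feature indices:", selected)
```

In this sketch the filter stage only narrows the candidate set, and the genetic search then trades information gain against subset size; a real evaluation would replace the toy data and the assumed fitness with the dataset and criterion used in the paper.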
Cite this article
Sangaiah, I., Vincent Antony Kumar, A. Improving medical diagnosis performance using hybrid feature selection via relieff and entropy based genetic search (RF-EGA) approach: application to breast cancer prediction. Cluster Comput 22 (Suppl 3), 6899–6906 (2019). https://doi.org/10.1007/s10586-018-1702-5
DOI: https://doi.org/10.1007/s10586-018-1702-5