Abstract
Breast cancer is the deadliest cancer among women and has the highest mortality rate in women worldwide. Early prediction of breast cancer can improve patient survival, and high prediction accuracy is therefore essential to avoid misdiagnosis. Machine learning algorithms can contribute to the early prediction and diagnosis of breast cancer. In this study, we used a rough-set-based feature selector to extract relevant features from the breast cancer feature set and classified them using machine learning algorithms such as Decision Tree, Naive Bayes, Support Vector Machine, K-Nearest Neighbor, Logistic Regression, Random Forest, and AdaBoost. The main aim is to predict cancerous breast nodules using rough-set-driven feature selection and machine learning classification algorithms. The results were evaluated in terms of accuracy, sensitivity, specificity, and positive predictive value. Random forest outperformed all other classifiers, achieving the highest accuracy (95.23%) with the proposed approach.
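The abstract does not state which rough-set reduction algorithm was used, so the following is only a minimal sketch, assuming the classic dependency-degree QuickReduct on discretized (integer-coded) features. It also shows how the four reported metrics follow from a binary confusion matrix, treating the malignant class as positive (also an assumption). The function names `quick_reduct`, `dependency`, and `binary_metrics` and the toy data are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]  # one discretized sample: a tuple of categorical feature values


def partition(rows: Sequence[Row], features: Sequence[int]) -> List[List[int]]:
    """Group sample indices into indiscernibility classes w.r.t. `features`."""
    blocks: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[f] for f in features)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], labels: Sequence[int], features: Sequence[int]) -> float:
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, features):
        # A block belongs to the positive region if all of its samples share one label.
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], labels: Sequence[int]) -> List[int]:
    """Greedy QuickReduct: repeatedly add the feature that most increases gamma
    until the subset reaches the dependency degree of the full feature set."""
    if not rows:
        return []
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    reduct: List[int] = []
    current = dependency(rows, labels, reduct)
    while current < target:
        best_f, best_gamma = None, current
        for f in all_features:
            if f in reduct:
                continue
            gamma = dependency(rows, labels, reduct + [f])
            if gamma > best_gamma:
                best_f, best_gamma = f, gamma
        if best_f is None:  # no single feature improves gamma; stop instead of looping forever
            break
        reduct.append(best_f)
        current = best_gamma
    return reduct


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and PPV from a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(a: int, b: int) -> float:
        return a / b if b else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value (precision)
    }


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 alone determines the label.
    rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (0, 0, 1)]
    labels = [0, 0, 1, 1, 1, 0]
    print(quick_reduct(rows, labels))  # [0]
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

The selected column indices would then be used to train and compare the classifiers listed above; a greedy search is shown because it keeps the sketch short, although it is not guaranteed to find a minimal reduct.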
Data Availability
The datasets generated during the current study are available from the corresponding author on request.
Code Availability
The Python code used during the current study is available from the corresponding author on request.
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bhukya, H., Sadanandam, M. RoughSet based Feature Selection for Prediction of Breast Cancer. Wireless Pers Commun 130, 2197–2214 (2023). https://doi.org/10.1007/s11277-023-10378-4
DOI: https://doi.org/10.1007/s11277-023-10378-4