Abstract
The escalating volume of data with numerous variables poses significant challenges for machine learning tasks, necessitating effective feature selection methods. These methods play a crucial role in reducing computational burdens, improving prediction accuracy, and facilitating data understanding. This paper introduces an innovative feature selection approach that combines Markov Chain, Quadratic Mutual Information (QMI), and Random Forest techniques to handle high-dimensional datasets robustly.
Our methodology uses a Markov Chain-based algorithm for systematic feature identification and subsequent dimensionality reduction. To handle non-linear dependencies and improve interpretability, we apply K-Means binning. QMI captures intricate non-linear relationships, contributing to a refined quadratic term. Concurrently, Random Forest regression evaluates feature importance, and the SelectFromModel technique retains a crucial subset for prediction. The integration of these techniques provides a comprehensive and robust strategy, as demonstrated through comparisons with alternative selection methods, Recursive Feature Elimination (RFE) and Recursive Feature Addition (RFA), and with several models (LSTM, Random Forest, XGBoost); a minimal sketch of the pipeline follows the abstract.
Compared with existing methods, our approach demonstrates superior model accuracy, interpretability, and efficiency in handling high-dimensional data. This versatile framework serves as a powerful tool for practitioners seeking to optimize predictive models across various domains, making a significant contribution to advanced analytics research.
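Since the paper's implementation is not reproduced on this page, the following is a minimal Python sketch, assuming scikit-learn, of three of the stages the abstract names: K-Means binning, a QMI screen, and Random Forest importance with SelectFromModel. The Markov Chain stage is omitted because the abstract gives no detail on it. The Euclidean-distance form of QMI used below, QMI(X; Y) = Σ_{x,y} (p(x, y) − p(x)p(y))², follows Principe's information-theoretic learning framework; the bin count, the 50% screening cutoff, and the median importance threshold are illustrative assumptions rather than the authors' settings.

# Hypothetical sketch: K-Means binning, a QMI screen, and Random Forest
# selection. The paper's Markov Chain stage is omitted (no detail in the
# abstract); bin count and cutoffs are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import KBinsDiscretizer

def quadratic_mi(x_bins, y_bins):
    # Euclidean-distance QMI for two discrete (binned) variables:
    # sum over cells of (p(x, y) - p(x) * p(y)) ** 2.
    joint = np.zeros((x_bins.max() + 1, y_bins.max() + 1))
    np.add.at(joint, (x_bins, y_bins), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return float(((joint - px * py) ** 2).sum())

# Synthetic high-dimensional regression data stands in for a real dataset.
X, y = make_regression(n_samples=500, n_features=40, n_informative=8,
                       random_state=0)

# K-Means binning discretizes continuous features (and the target) so the
# QMI estimate can be computed from empirical cell probabilities.
binner = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="kmeans")
Xb = binner.fit_transform(X).astype(int)
yb = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="kmeans") \
    .fit_transform(y.reshape(-1, 1)).astype(int).ravel()

# QMI screen: keep the half of the features with the strongest (possibly
# non-linear) dependence on the target.
qmi = np.array([quadratic_mi(Xb[:, j], yb) for j in range(X.shape[1])])
keep = np.argsort(qmi)[-X.shape[1] // 2:]

# Random Forest regression scores the surviving features; SelectFromModel
# retains those above the median importance.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=0),
    threshold="median",
).fit(X[:, keep], y)
selected = np.sort(keep[selector.get_support()])
print("Selected feature indices:", selected.tolist())

Replacing the QMI screen with scikit-learn's mutual_info_regression would give a Shannon-style variant of the same filter stage, at the cost of the quadratic term the abstract emphasizes.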
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhih, M., Elamrani Abou Elassad, Z., El Boustani, A., El Meslouhi, O. (2024). Smart Data Simplification: A Comprehensive Feature Selection Framework for High-Dimensional Datasets. In: Mejdoub, Y., Elamri, A. (eds) Proceeding of the International Conference on Connected Objects and Artificial Intelligence (COCIA2024). COCIA 2024. Lecture Notes in Networks and Systems, vol 1123. Springer, Cham. https://doi.org/10.1007/978-3-031-70411-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70410-9
Online ISBN: 978-3-031-70411-6