Abstract
The escalating volume of data with numerous variables poses significant challenges for machine learning tasks, necessitating effective feature selection methods. These methods play a crucial role in reducing computational burdens, improving prediction accuracy, and facilitating data understanding. This paper introduces an innovative feature selection approach that combines Markov Chain, Quadratic Mutual Information (QMI), and Random Forest techniques to handle high-dimensional datasets robustly.
Our methodology uses a Markov Chain-based algorithm for systematic feature identification and subsequent dimensionality reduction. To handle non-linear dependencies and improve interpretability, we apply K-Means binning. QMI captures intricate non-linear relationships, contributing to a refined quadratic term. Concurrently, Random Forest regression evaluates feature importance, and the SelectFromModel technique retains a crucial subset for prediction. The integration of these techniques provides a comprehensive and robust strategy, as demonstrated through comparisons with alternative selection methods, Recursive Feature Elimination (RFE) and Recursive Feature Addition (RFA), and with several models (LSTM, Random Forest, XGBoost); a minimal sketch of the pipeline follows the abstract.
Compared with existing methods, our approach demonstrates superior model accuracy, interpretability, and efficiency in handling high-dimensional data. This versatile framework serves as a powerful tool for practitioners seeking to optimize predictive models across various domains, making a significant contribution to advanced analytics research.
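Since the paper's implementation is not reproduced on this page, the following is a minimal Python sketch, assuming scikit-learn, of three of the stages the abstract names: K-Means binning, a QMI screen, and Random Forest importance with SelectFromModel. The Markov Chain stage is omitted because the abstract gives no detail on it. The Euclidean-distance form of QMI used below, QMI(X; Y) = Σ_{x,y} (p(x, y) − p(x)p(y))², follows Principe's information-theoretic learning framework; the bin count, the 50% screening cutoff, and the median importance threshold are illustrative assumptions rather than the authors' settings.

# Hypothetical sketch: K-Means binning, a QMI screen, and Random Forest
# selection. The paper's Markov Chain stage is omitted (no detail in the
# abstract); bin count and cutoffs are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import KBinsDiscretizer

def quadratic_mi(x_bins, y_bins):
    # Euclidean-distance QMI for two discrete (binned) variables:
    # sum over cells of (p(x, y) - p(x) * p(y)) ** 2.
    joint = np.zeros((x_bins.max() + 1, y_bins.max() + 1))
    np.add.at(joint, (x_bins, y_bins), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return float(((joint - px * py) ** 2).sum())

# Synthetic high-dimensional regression data stands in for a real dataset.
X, y = make_regression(n_samples=500, n_features=40, n_informative=8,
                       random_state=0)

# K-Means binning discretizes continuous features (and the target) so the
# QMI estimate can be computed from empirical cell probabilities.
binner = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="kmeans")
Xb = binner.fit_transform(X).astype(int)
yb = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="kmeans") \
    .fit_transform(y.reshape(-1, 1)).astype(int).ravel()

# QMI screen: keep the half of the features with the strongest (possibly
# non-linear) dependence on the target.
qmi = np.array([quadratic_mi(Xb[:, j], yb) for j in range(X.shape[1])])
keep = np.argsort(qmi)[-X.shape[1] // 2:]

# Random Forest regression scores the surviving features; SelectFromModel
# retains those above the median importance.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=0),
    threshold="median",
).fit(X[:, keep], y)
selected = np.sort(keep[selector.get_support()])
print("Selected feature indices:", selected.tolist())

Replacing the QMI screen with scikit-learn's mutual_info_regression would give a Shannon-style variant of the same filter stage, at the cost of the quadratic term the abstract emphasizes.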
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhih, M., Elamrani Abou Elassad, Z., El Boustani, A., El Meslouhi, O. (2024). Smart Data Simplification: A Comprehensive Feature Selection Framework for High-Dimensional Datasets. In: Mejdoub, Y., Elamri, A. (eds) Proceeding of the International Conference on Connected Objects and Artificial Intelligence (COCIA2024). COCIA 2024. Lecture Notes in Networks and Systems, vol 1123. Springer, Cham. https://doi.org/10.1007/978-3-031-70411-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70410-9
Online ISBN: 978-3-031-70411-6