
Smart Data Simplification: A Comprehensive Feature Selection Framework for High-Dimensional Datasets

  • Conference paper
  • First Online:
Proceeding of the International Conference on Connected Objects and Artificial Intelligence (COCIA2024) (COCIA 2024)

Abstract

The escalating volume of data containing numerous variables poses significant challenges for machine learning tasks, necessitating effective feature selection methods. These methods play a crucial role in alleviating computational burdens, enhancing prediction accuracy, and facilitating better data understanding. This paper introduces an innovative feature selection approach that combines Markov Chain, Quadratic Mutual Information (QMI), and Random Forest techniques to address high-dimensional datasets robustly.

Our methodology utilizes a Markov Chain-based algorithm for systematic feature identification and subsequent dimensionality reduction. To address non-linear dependencies and enhance interpretability, we employ K-Means binning. QMI captures intricate non-linear relationships, contributing a refined quadratic information term. Concurrently, Random Forest regression evaluates feature importance, and the SelectFromModel technique retains a crucial subset for prediction. The integration of these techniques provides a comprehensive and robust strategy, as demonstrated through comparisons with alternative selection methods, Recursive Feature Elimination (RFE) and Recursive Feature Addition (RFA), across several predictive models (LSTM, Random Forest, XGBoost).
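
To make the selection pipeline concrete, here is a minimal Python sketch of how its publicly documented ingredients could be wired together: K-Means binning via scikit-learn's KBinsDiscretizer (strategy="kmeans"), a discrete Euclidean-distance QMI score in the spirit of information-theoretic learning, and Random Forest importances filtered through SelectFromModel. This is an illustrative sketch under our own assumptions, not the authors' implementation: the function names, bin counts, QMI fraction, and importance threshold are invented for illustration, and the Markov Chain screening stage is omitted because the abstract does not specify it.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

def qmi_score(x_binned, y_binned):
    """Euclidean-distance QMI between two discrete variables:
    the sum over the joint histogram of (p_xy - p_x * p_y)^2."""
    nx = int(x_binned.max()) + 1
    ny = int(y_binned.max()) + 1
    joint, _, _ = np.histogram2d(x_binned, y_binned, bins=[nx, ny])
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y
    return float(((p_xy - p_x * p_y) ** 2).sum())

def select_features(X, y, n_bins=8, qmi_fraction=0.5, seed=0):
    """Hypothetical two-stage filter: QMI ranking, then RF importances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: K-Means binning discretises features and target so the
    # QMI score can pick up non-linear dependencies.
    Xb = KBinsDiscretizer(n_bins=n_bins, encode="ordinal",
                          strategy="kmeans").fit_transform(X)
    yb = KBinsDiscretizer(n_bins=n_bins, encode="ordinal",
                          strategy="kmeans").fit_transform(y.reshape(-1, 1)).ravel()

    # Step 2: rank features by QMI with the target; keep the top fraction.
    scores = np.array([qmi_score(Xb[:, j], yb) for j in range(X.shape[1])])
    n_keep = max(1, int(qmi_fraction * X.shape[1]))
    survivors = np.argsort(scores)[::-1][:n_keep]

    # Step 3: fit a Random Forest on the survivors and let
    # SelectFromModel retain features above the median importance.
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    selector = SelectFromModel(rf, threshold="median").fit(X[:, survivors], y)
    return survivors[selector.get_support()]
```

Calling select_features(X, y) on a numeric feature matrix returns the column indices that survive both filters; the 50% QMI cut and the median-importance threshold are tuning knobs chosen arbitrarily for the sketch.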

In comparison to existing methods, our approach showcases superior model accuracy, interpretability, and efficiency in handling high-dimensional data. This versatile framework serves as a powerful tool for practitioners seeking to optimize predictive models across various domains, thereby making significant contributions to advanced analytics research.

Author information

Correspondence to Mouad Bhih.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bhih, M., Elamrani Abou Elassad, Z., El Boustani, A., El Meslouhi, O. (2024). Smart Data Simplification: A Comprehensive Feature Selection Framework for High-Dimensional Datasets. In: Mejdoub, Y., Elamri, A. (eds) Proceeding of the International Conference on Connected Objects and Artificial Intelligence (COCIA2024). COCIA 2024. Lecture Notes in Networks and Systems, vol 1123. Springer, Cham. https://doi.org/10.1007/978-3-031-70411-6_28

  • DOI: https://doi.org/10.1007/978-3-031-70411-6_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70410-9

  • Online ISBN: 978-3-031-70411-6

  • eBook Packages: Engineering, Engineering (R0)
