
Feature selection through quantum annealing

The Journal of Supercomputing

Abstract

Feature selection is a technique in statistical prediction modeling that identifies the features in a record with a strong statistical connection to the target variable. Excluding features with a weak statistical connection to the target variable during training not only reduces the dimension of the data, which lowers the time complexity of the algorithm, but also reduces noise in the data, which helps avoid overfitting. In all, feature selection assists in training a robust statistical model that performs well and is stable. A recent advancement in feature selection that leverages quantum annealing (QA) gives a scalable technique that aims to maximize the predictive power of the features while minimizing their redundancy. Consequently, this algorithm is expected to help with the bias/variance trade-off, yielding better features for training a statistical model. This paper tests this intuition against classical methods by utilizing open-source data sets and evaluating the efficacy of the selected features with well-known prediction algorithms. The numerical results display an advantage for the features selected by the algorithm that leveraged QA.
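To make the construction concrete, the sketch below illustrates one common way such a problem is posed as a QUBO (quadratic unconstrained binary optimization): the diagonal of a matrix Q rewards each feature's relevance to the target (here, mutual information) and the off-diagonal entries penalize pairwise redundancy (here, absolute correlation), so that minimizing x^T Q x over binary x trades the two objectives off. The mutual-information and correlation estimators, the weight alpha, and the brute-force solver standing in for the quantum annealer are all illustrative assumptions, not the paper's exact formulation.

    # Minimal classical sketch of QUBO-based feature selection (illustrative only).
    from itertools import product

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import mutual_info_classif

    # Toy data: 8 features keeps the 2^8 binary search space enumerable.
    X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                               n_redundant=3, random_state=0)
    n = X.shape[1]

    # Relevance of each feature: mutual information with the target.
    relevance = mutual_info_classif(X, y, random_state=0)

    # Redundancy between features: absolute pairwise correlation.
    redundancy = np.abs(np.corrcoef(X, rowvar=False))

    # QUBO matrix: off-diagonal terms penalize redundant pairs, the diagonal
    # rewards relevance. alpha balances the two objectives (assumed value).
    alpha = 0.5
    Q = alpha * redundancy
    np.fill_diagonal(Q, -relevance)

    # Exhaustive minimization of x^T Q x over binary x stands in for the
    # quantum annealer, which would sample low-energy states of the same Q.
    best_energy, best_x = np.inf, None
    for bits in product([0, 1], repeat=n):
        x = np.array(bits)
        energy = x @ Q @ x
        if energy < best_energy:
            best_energy, best_x = energy, x

    print("selected features:", np.flatnonzero(best_x).tolist())

On annealing hardware, the same Q would be submitted to a QUBO sampler rather than enumerated; the enumeration here simply keeps the sketch self-contained and exact for small n.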




Data availability

No datasets were generated or analysed during the current study.

Notes

  1. https://www.kaggle.com/datasets/arbazkhan971/anomaly-detection.


Author information


Contributions

A.V., H.G., and S.C. all contributed equally to the experiments; A.V. and S.C. wrote the main manuscript text; and A.V. prepared figures 1–3. All authors reviewed the manuscript.

Corresponding author

Correspondence to Andrew Vlasic.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Vlasic, A., Grant, H. & Certo, S. Feature selection through quantum annealing. J Supercomput 81, 147 (2025). https://doi.org/10.1007/s11227-024-06673-x

