Abstract
Feature selection is a technique in statistical prediction modeling that identifies the features in a record with a strong statistical connection to the target variable. Excluding features with a weak connection to the target variable during training not only reduces the dimensionality of the data, which lowers the time complexity of the training algorithm, but also reduces noise in the data, which helps to avoid overfitting. In all, feature selection aids in training a statistical model that is robust, performant, and stable. A recent advancement in feature selection leverages quantum annealing (QA) to provide a scalable technique that aims to maximize the predictive power of the selected features while minimizing their redundancy. Consequently, the algorithm is expected to improve the bias/variance trade-off, yielding better features for training a statistical model. This paper tests this intuition against classical feature-selection methods on open-source data sets, evaluating the efficacy of each method by training well-known prediction algorithms on the selected features. The numerical results show an advantage for the features selected by the QA-based algorithm.
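The relevance-versus-redundancy objective described above is typically cast as a quadratic unconstrained binary optimization (QUBO) problem, the native input format of a quantum annealer. The sketch below is a minimal illustration of that mapping, not this paper's exact formulation: the choice of mutual information for relevance, absolute Pearson correlation for redundancy, and the trade-off weight alpha are illustrative assumptions, and exhaustive enumeration stands in for the annealer at this toy dimension.

```python
import itertools

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

# Open-source data set, truncated to a handful of features so the
# QUBO below can be minimized exactly by enumeration.
X, y = load_breast_cancer(return_X_y=True)
X = X[:, :8]
d = X.shape[1]

# Relevance: mutual information between each feature and the target.
relevance = mutual_info_classif(X, y, random_state=0)

# Redundancy: absolute pairwise Pearson correlation between features.
redundancy = np.abs(np.corrcoef(X, rowvar=False))

# QUBO matrix Q: selecting feature i contributes -relevance[i] via the
# diagonal; selecting a correlated pair (i, j) incurs a redundancy
# penalty via the off-diagonal terms. alpha is an illustrative weight.
alpha = 0.5
Q = alpha * redundancy
np.fill_diagonal(Q, -relevance)

# A quantum annealer would sample low-energy bitstrings of x^T Q x;
# here we enumerate all 2^d selections, feasible only for small d.
best_x, best_energy = None, np.inf
for bits in itertools.product([0, 1], repeat=d):
    x = np.asarray(bits)
    energy = x @ Q @ x
    if energy < best_energy:
        best_x, best_energy = x, energy

print("selected feature indices:", np.flatnonzero(best_x))
```

The scalability claimed above comes from replacing the exhaustive loop with an annealer or hybrid solver, which can sample low-energy selections for feature counts far beyond what enumeration allows.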
Data availability
No datasets were generated or analysed during the current study.
Author information
Contributions
A.V., H.G., and S.C. all contributed equally to the experiments; A.V. and S.C. wrote the main manuscript text; and A.V. prepared Figures 1–3. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Vlasic, A., Grant, H. & Certo, S. Feature selection through quantum annealing. J Supercomput 81, 147 (2025). https://doi.org/10.1007/s11227-024-06673-x