Abstract
We propose an unsupervised, model-agnostic, wrapper method for feature selection. We assume that if a feature can be predicted from the others, it adds little information to the problem and can therefore be removed without impairing the performance of whatever model is eventually built. The proposed method iteratively identifies and removes predictable, or nearly predictable, redundant features, allowing one to trade off complexity against expected quality. The approach relies neither on target labels nor on target values, and the model used to identify predictable features is unrelated to the final use of the feature set. It can therefore be applied to supervised, unsupervised, or semi-supervised problems, or even used as a safe pre-processing step to improve the results of other feature-selection techniques. Experimental comparisons against state-of-the-art feature-selection algorithms show satisfying performance on several non-trivial benchmarks.
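The elimination loop described in the abstract can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical rendering of the idea, not the authors' implementation: it assumes numeric features, uses scikit-learn's RandomForestRegressor with cross-validated R² as the predictability score, and introduces an illustrative stopping parameter `r2_threshold`; the paper's actual model, score, and stopping criterion may differ.

```python
# Minimal sketch of the predictable-features-elimination idea -- NOT the
# authors' implementation. Assumptions: features are numeric, predictability
# is measured as cross-validated R^2 of a random forest regressor, and
# `r2_threshold` is an illustrative stopping point.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def predictable_features_elimination(X, r2_threshold=0.9):
    """Return the column indices kept after iterative elimination."""
    kept = list(range(X.shape[1]))
    while len(kept) > 1:
        scores = []
        for j in kept:
            others = [k for k in kept if k != j]
            # How well do the remaining features predict feature j?
            r2 = cross_val_score(
                RandomForestRegressor(n_estimators=50, random_state=0),
                X[:, others], X[:, j], cv=3, scoring="r2",
            ).mean()
            scores.append((r2, j))
        best_r2, best_j = max(scores)
        if best_r2 < r2_threshold:
            break  # no remaining feature is predictable enough: stop
        kept.remove(best_j)  # drop the most predictable (redundant) feature
    return kept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 4))
    # Append a noisy near-copy of column 0: it should be eliminated.
    X = np.hstack([base, base[:, :1] + 0.01 * rng.normal(size=(200, 1))])
    print(predictable_features_elimination(X))
```

Because the loop never looks at labels, the same sketch applies unchanged to supervised and unsupervised pipelines; only the predictability model and the threshold are tuning choices.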
Cite this paper
Barbiero, P., Squillero, G., Tonda, A. (2022). Predictable Features Elimination: An Unsupervised Approach to Feature Selection. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol. 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_29
Print ISBN: 978-3-030-95466-6. Online ISBN: 978-3-030-95467-3.