Abstract
Feature selection is a central technique in data mining whenever dimensionality reduction is needed. Its performance is measured by both predictive accuracy and the comprehensibility of the result. Consistency-based feature selection is a significant category of feature selection research that substantially improves the comprehensibility of the result by applying the parsimony principle. In this work, the feature selection algorithm LAID (Logical Analysis of Inconsistent Data) is applied to large volumes of data. To deal with hundreds of thousands of attributes, a problem decomposition strategy is combined with a set covering problem formulation. The algorithm is applied to artificial datasets with genome-like characteristics of patients with rare diseases.
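To make the set covering view of consistency-based feature selection concrete, the following is a minimal sketch, not the authors' LAID implementation. It assumes that each pair of observations with different class labels forms one constraint, that a feature "covers" a pair when the two observations differ on it, and that a simple greedy heuristic is used to pick features. The function name `greedy_feature_cover` and the toy data are hypothetical, and the decomposition strategy mentioned in the abstract is not modelled here.

```python
from itertools import combinations


def greedy_feature_cover(rows, labels):
    """Greedy set cover: choose features so that every pair of rows with
    different labels differs on at least one chosen feature.

    Returns (selected feature indices, pairs no feature can separate).
    Illustrative sketch only; names and structure are assumptions.
    """
    n_features = len(rows[0])
    # One constraint per pair of observations from different classes.
    pairs = [
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if labels[i] != labels[j]
    ]
    # covers[f] = indices of the pairs that feature f can tell apart.
    covers = [
        {k for k, (i, j) in enumerate(pairs) if rows[i][f] != rows[j][f]}
        for f in range(n_features)
    ]
    uncovered = set(range(len(pairs)))
    selected = []
    while uncovered:
        # Greedy step: take the feature separating the most uncovered pairs.
        best = max(range(n_features), key=lambda f: len(covers[f] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            # The remaining pairs agree on every feature: inconsistent data.
            break
        selected.append(best)
        uncovered -= gained
    return selected, [pairs[k] for k in sorted(uncovered)]


# Toy data: only the second feature (index 1) determines the class.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 0, 1]
selected, unresolved = greedy_feature_cover(rows, labels)
print(selected, unresolved)
```

Pairs returned in the second element are observations that share all feature values but carry different labels; this sketch only reports them and leaves the handling of such inconsistencies open.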
Acknowledgements
The first author would like to thank FCT for its support (UID/Multi/04046/2013). This work used the EGI infrastructure with the support of NCG-INGRID-PT (Portugal) and BIFI (Spain).
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cavique, L., Mendes, A.B., Martiniano, H.F.M.C. (2017). A Feature Selection Algorithm Based on Heuristic Decomposition. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science, vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_43
DOI: https://doi.org/10.1007/978-3-319-65340-2_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65339-6
Online ISBN: 978-3-319-65340-2
eBook Packages: Computer Science, Computer Science (R0)