DOI: 10.1145/3071178.3071183
research-article

Genetic programming based feature construction for classification with incomplete data

Published: 01 July 2017

Abstract

Missing values are an unavoidable problem in many real-world datasets. Dealing with incomplete data is a crucial requirement for classification because inadequate treatment of missing values often causes large classification errors. Feature construction has been successfully applied to improve classification with complete data, but it has seldom been applied to incomplete data. Genetic programming-based multiple feature construction (GPMFC) is a promising recent feature construction method that uses genetic programming to evolve multiple new features from the original features for classification tasks. GPMFC can improve the accuracy and reduce the complexity of many decision tree and rule-based classifiers; however, it cannot work directly with incomplete data. This paper proposes IGPMFC, an extension of GPMFC that tackles incomplete data. IGPMFC uses genetic programming with interval functions to directly evolve multiple features for classification with incomplete data. Experimental results show that IGPMFC can not only substantially improve the accuracy but also reduce the complexity of classifiers learnt from incomplete data.
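
The abstract does not spell out how the interval functions operate, so the following is only a minimal sketch of one plausible reading: each known value becomes a degenerate interval [v, v], each missing value becomes the interval [min, max] of its feature, and an evolved expression is evaluated with interval arithmetic. The names Interval, to_interval, constructed_feature and evaluate, the feature bounds, and the expression itself are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with basic interval arithmetic."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The product range is bounded by the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


def to_interval(value, feature_min, feature_max):
    """Assumption: a missing value (None or NaN) is replaced by the
    feature's observed range; a known value becomes a point interval."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return Interval(feature_min, feature_max)
    return Interval(value, value)


def constructed_feature(x0, x1, x2):
    """Hypothetical evolved expression standing in for a GP tree."""
    return (x0 + x1) * x2 - x0


def evaluate(instance, bounds):
    """Map an instance (possibly with missing values) to an interval output."""
    args = [to_interval(v, lo, hi) for v, (lo, hi) in zip(instance, bounds)]
    return constructed_feature(*args)


# Hypothetical per-feature (min, max) ranges taken from training data.
bounds = [(0.0, 10.0), (-1.0, 1.0), (2.0, 4.0)]

complete = evaluate([3.0, 0.5, 2.0], bounds)     # no missing values
incomplete = evaluate([3.0, None, 2.0], bounds)  # second feature missing
print(complete, incomplete)
```

With a complete instance the output collapses to a single point, while a missing input widens the output to the range of values the constructed feature could take. How such an interval is then passed to a classifier is a separate design choice that this sketch does not address.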


Published In

GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference
July 2017
1427 pages
ISBN:9781450349208
DOI:10.1145/3071178

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classification
  2. feature construction
  3. genetic programming
  4. incomplete data

Qualifiers

  • Research-article

Conference

GECCO '17

Acceptance Rates

GECCO '17 Paper Acceptance Rate 178 of 462 submissions, 39%;
Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Cited By

  • (2022) A Robust Feature Construction for Fish Classification Using Grey Wolf Optimizer. Cybernetics and Information Technologies, 22:4 (152-166). DOI: 10.2478/cait-2022-0045. Online publication date: 10-Nov-2022
  • (2022) Comparative study of classifier performance using automatic feature construction by M3GP. 2022 IEEE Congress on Evolutionary Computation (CEC) (1-8). DOI: 10.1109/CEC55065.2022.9870343. Online publication date: 18-Jul-2022
  • (2021) Improving Land Cover Classification Using Genetic Programming for Feature Construction. Remote Sensing, 13:9 (1623). DOI: 10.3390/rs13091623. Online publication date: 21-Apr-2021
  • (2019) Recent Developments on Evolutionary Computation Techniques to Feature Construction. Intelligent Information and Database Systems: Recent Developments (109-122). DOI: 10.1007/978-3-030-14132-5_9. Online publication date: 6-Mar-2019
  • (2019) Untapped Potential of Genetic Programming: Transfer Learning and Outlier Removal. Genetic Programming Theory and Practice XVI (193-207). DOI: 10.1007/978-3-030-04735-1_10. Online publication date: 24-Jan-2019
  • (2018) Filtering Outliers in One Step with Genetic Programming. Parallel Problem Solving from Nature – PPSN XV (209-222). DOI: 10.1007/978-3-319-99253-2_17. Online publication date: 22-Aug-2018
