Abstract
Exploratory data analysis is a fundamental aspect of knowledge discovery that aims to find the main characteristics of a dataset. Dimensionality reduction, such as manifold learning, is often used to reduce the number of features in a dataset to a level that is manageable for human interpretation. However, most manifold learning techniques explain nothing about the original features or the true characteristics of a dataset. In this paper, we propose a genetic programming approach to manifold learning called GP-MaL, which evolves functional mappings from a high-dimensional space to a lower-dimensional space through the use of interpretable trees. We show that GP-MaL is competitive with existing manifold learning algorithms, while producing models that can be interpreted and re-used on unseen data. We also identify a number of promising directions for future research.
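To make the idea of a tree-based functional mapping concrete, the following is a minimal sketch and not the GP-MaL implementation. The tuple encoding of trees, the function set in `OPS`, and the helper names `evaluate`, `embed`, and `to_string` are all assumptions chosen for illustration. Each tree produces one output dimension, so a set of trees maps a feature vector to a lower-dimensional point, can be printed as a readable expression, and can be applied unchanged to unseen instances.

```python
# Hypothetical function set; the function set used by GP-MaL is not given here.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": lambda a, b: max(a, b),
}


def evaluate(tree, x):
    """Evaluate one expression tree on a feature vector x.

    A tree is either an int (the index of an original feature)
    or a tuple (op_name, left_subtree, right_subtree).
    """
    if isinstance(tree, int):
        return x[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def embed(trees, x):
    """Map x to a low-dimensional point, one coordinate per tree."""
    return [evaluate(tree, x) for tree in trees]


def to_string(tree):
    """Render a tree as a readable expression over the original features."""
    if isinstance(tree, int):
        return f"f{tree}"
    op, left, right = tree
    return f"{op}({to_string(left)}, {to_string(right)})"


# Two illustrative (hand-written, not evolved) trees: a 4-D to 2-D mapping.
tree_a = ("add", 0, ("mul", 1, 2))
tree_b = ("sub", 3, 0)

point = embed([tree_a, tree_b], [1.0, 2.0, 3.0, 4.0])
print(to_string(tree_a), to_string(tree_b), point)
```

Because the mapping is an explicit function of the original features, the same `embed` call works on any new instance with the same feature layout, which is the sense in which such a model can be re-used on unseen data.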
Notes
1. An embedding here refers to the low-dimensional representation of the structure present in a dataset.
2. Here, neighbours refer to the closest instances to a point by (Euclidean) distance.
3. Five inputs were found to be a good balance between encouraging wider trees and minimising computing resources required.
4. Information gain (mutual information) is often used in feature selection for classification to measure the dependency between a feature and the class label.
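The notion of neighbours by Euclidean distance used in the notes above can be sketched as follows. This is an illustrative brute-force version under our own assumptions; the function name `k_nearest` and the toy data are hypothetical and not taken from the paper.

```python
import math


def k_nearest(points, i, k):
    """Return the indices of the k instances closest to points[i].

    Closeness is measured by Euclidean distance; the point itself is excluded.
    """
    distances = [
        (math.dist(points[i], p), j)
        for j, p in enumerate(points)
        if j != i
    ]
    distances.sort()  # ascending by distance, ties broken by index
    return [j for _, j in distances[:k]]


# Toy 2-D dataset (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
neighbours_of_first = k_nearest(points, 0, 2)
print(neighbours_of_first)
```

The brute-force scan costs one distance computation per pair of instances, which is adequate for a sketch but would typically be replaced by a spatial index on larger datasets.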
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lensen, A., Xue, B., Zhang, M. (2019). Can Genetic Programming Do Manifold Learning Too? In: Sekanina, L., Hu, T., Lourenço, N., Richter, H., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2019. Lecture Notes in Computer Science, vol 11451. Springer, Cham. https://doi.org/10.1007/978-3-030-16670-0_8
DOI: https://doi.org/10.1007/978-3-030-16670-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16669-4
Online ISBN: 978-3-030-16670-0
eBook Packages: Computer Science, Computer Science (R0)