Abstract
Understanding the genetic background of complex diseases and disorders plays an essential role in precision medicine. Identifying the genes associated with a specific disease or disorder helps diagnose and treat it better, and may even help prevent it if those genes are predicted accurately and acted on effectively at an early stage. Evaluating candidate disease-associated genes, however, requires time-consuming and expensive experiments given the large number of possibilities. Because of these challenges, computational methods are increasingly applied to predicting gene-disease associations. Given the intertwined relationships of molecules in human cells, genes and their products can be considered to form a complex molecular interaction network. Such a network can be used to find candidate genes that share similar network properties with known disease-associated genes. In this research, we investigate autism spectrum disorders and propose a linear genetic programming (LGP) algorithm for autism gene prediction that uses a human molecular interaction network and known autism genes for training. We select an initial set of network properties as features, and our LGP algorithm is able to find the most relevant features while evolving accurate predictive models. Our research demonstrates the powerful and flexible learning abilities of genetic programming (GP) in tackling a significant biomedical problem, and is expected to inspire further exploration of a wide range of GP applications.
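To make the idea of an LGP classifier over network features more concrete, the following minimal Python sketch shows how a linear program could be executed on one gene's feature vector, and how the features that actually influence the output could be read off the program. It is an illustration only, not the implementation used in this work: the instruction format `(dest, op, src_a, src_b)`, the register layout, the three-operator set, the sign-based decision rule, and the name `example_program` are all assumptions introduced here.

```python
import operator

# Assumed operator set; real LGP systems typically use a larger one.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_program(program, features, n_calc=2):
    """Execute a linear program and return the value of register 0.

    Registers 0..n_calc-1 are calculation registers (initialised to 0.0).
    Registers n_calc.. hold the input features and are read-only by convention.
    """
    regs = [0.0] * n_calc + list(features)
    for dest, op, a, b in program:
        regs[dest] = OPS[op](regs[a], regs[b])
    return regs[0]


def classify(program, features, n_calc=2):
    """Assumed decision rule: predict class 1 when the output is positive."""
    return 1 if run_program(program, features, n_calc) > 0 else 0


def effective_features(program, n_calc=2):
    """Return the feature indices that can influence register 0.

    Walks the program backwards and keeps only instructions whose
    destination is still needed for the final output.
    """
    needed = {0}   # calculation registers the output depends on
    used = set()   # feature indices read by effective instructions
    for dest, _op, a, b in reversed(program):
        if dest in needed:
            needed.discard(dest)
            for src in (a, b):
                if src >= n_calc:
                    used.add(src - n_calc)
                else:
                    needed.add(src)
    return sorted(used)


# Hypothetical program over three features f0, f1, f2 (registers 2, 3, 4).
example_program = [
    (1, "mul", 2, 2),  # r1 = f0 * f0
    (0, "sub", 1, 3),  # r0 = r1 - f1
    (1, "add", 4, 4),  # r1 = f2 + f2, never read again, so non-effective
]

print(run_program(example_program, [3.0, 4.0, 10.0]))
print(classify(example_program, [3.0, 4.0, 10.0]))
print(effective_features(example_program))
```

In this toy program the last instruction does not affect register 0, so feature `f2` is reported as unused. Counting which features appear in the effective code of well-performing evolved programs is one plausible way to rank feature relevance under these assumptions.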
Acknowledgments
This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada Discovery Grant RGPIN-2016-04699 to TH.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Chen, Y., Hu, T. (2020). Classification of Autism Genes Using Network Science and Linear Genetic Programming. In: Hu, T., Lourenço, N., Medvet, E., Divina, F. (eds) Genetic Programming. EuroGP 2020. Lecture Notes in Computer Science, vol 12101. Springer, Cham. https://doi.org/10.1007/978-3-030-44094-7_18
DOI: https://doi.org/10.1007/978-3-030-44094-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44093-0
Online ISBN: 978-3-030-44094-7
eBook Packages: Computer Science, Computer Science (R0)