Abstract
Since the completion of the Human Genome Project in 2003, it has become possible to associate genetic variations in the human genome with common and complex diseases. The current challenge is to use genomic data efficiently and to develop tools that improve our understanding of the etiology of complex diseases. Many of the algorithms needed for this task were originally developed in management science and operations research (OR). One application is to select, from the whole set of Single Nucleotide Polymorphism (SNP) biomarkers, a subset that is informative yet small enough for subsequent association studies. In this paper, we present an OR application for representative SNP selection that implements our novel Simulated Annealing (SA) based feature-selection algorithm. We hope that our work will facilitate the reliable identification of SNPs involved in the etiology of complex diseases and will ultimately support the timely identification of genomic disease biomarkers, the development of personalized-medicine approaches, and targeted drug discovery.
Additional information
For the Alzheimer’s Disease Neuroimaging Initiative: Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Cite this article
Üstünkar, G., Özöğür-Akyüz, S., Weber, G.W. et al. Selection of representative SNP sets for genome-wide association studies: a metaheuristic approach. Optim Lett 6, 1207–1218 (2012). https://doi.org/10.1007/s11590-011-0419-7
DOI: https://doi.org/10.1007/s11590-011-0419-7