Abstract
Because of their abundance, density, and ease of practical use, single-nucleotide polymorphisms (SNPs) have become the major source of information for association gene mapping in humans. Sensible strategies for selecting practically useful SNPs are therefore required. Among the factors influencing the mapping utility of a given set of SNPs are (1) their individual diversity, (2) their haplotype structure in the population of interest, and (3) their physical distribution. We propose a strategy integrating these aspects into a single mapping utility measure, which is based upon Shannon entropy, and which maximizes the amount of information extracted from a genomic region under a Malecot model of linkage disequilibrium (LD) decay. The same utility measure has also been used to define a criterion guiding SNP discovery and rational decision-making about the continuation or termination of a mapping study. The proposed strategy performs consistently well in a data set comprising 549 German control individuals, genotyped for 136 SNPs from four genomic regions of different LD structure. Adoption of the method in practice is estimated to save up to 30% of genotyping load when compared with equidistant SNP localization or pair-wise LD minimization alone.
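As a rough illustration of the entropy idea, the sketch below computes the Shannon entropy of the haplotype distribution spanned by a subset of SNPs and greedily adds the SNP that raises that entropy most. This is not the authors' utility measure: it ignores physical distance and the Malecot model of LD decay entirely. The function names and the toy haplotypes are hypothetical, chosen only for this example.

```python
from collections import Counter
from math import log2


def haplotype_entropy(haplotypes, snp_indices):
    """Shannon entropy (in bits) of the haplotypes restricted to snp_indices."""
    counts = Counter(tuple(h[i] for i in snp_indices) for h in haplotypes)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def greedy_select(haplotypes, k):
    """Pick k SNPs one at a time, each time adding the SNP that yields
    the highest entropy together with the SNPs already chosen."""
    n_snps = len(haplotypes[0])
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(n_snps) if i not in chosen]
        best = max(
            remaining,
            key=lambda i: haplotype_entropy(haplotypes, chosen + [i]),
        )
        chosen.append(best)
    return chosen


# Hypothetical phased haplotypes over three biallelic SNPs (alleles coded 0/1).
# SNP 1 is a perfect copy of SNP 0 (complete LD); SNP 2 is independent of both.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]

selected = greedy_select(toy, 2)
```

In this toy set, adding the redundant SNP 1 to SNP 0 leaves the entropy unchanged, whereas adding SNP 2 doubles it, so the greedy step skips the SNP in complete LD. A measure in the spirit of the abstract would additionally weight markers by their physical distribution under the assumed LD decay model.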
Acknowledgements
This study was supported by the German National Genome Research Network (NGFN), the German Human Genome Project (DHGP), a GEM ("Center of Expertise in Genetic Epidemiology") grant from the German Federal Ministry of Education and Research, and a "DFG Forschergruppe" on complex disorders. The authors wish to thank Annette Stenzel and Peter Croucher for providing chromosome 6 and 16 SNP genotype data, and Timothy Lu for preparing the haplotype heat map of region 6p21.31.
About this article
Cite this article
Hampe, J., Schreiber, S. & Krawczak, M. Entropy-based SNP selection for genetic association studies. Hum Genet 114, 36–43 (2003). https://doi.org/10.1007/s00439-003-1017-2
DOI: https://doi.org/10.1007/s00439-003-1017-2