Abstract
We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.
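For readers unfamiliar with the pairwise approach that the abstract contrasts with multimarker methods, the following is a minimal sketch of greedy pairwise tagging. It is not the authors' implementation. It assumes phased haplotypes coded 0/1 at biallelic SNPs and an r² cutoff of 0.8, a common convention that the abstract itself does not state. The function names (`r_squared`, `greedy_pairwise_tags`) and the toy data are hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs.

    a and b are equal-length lists of 0/1 alleles, one entry per phased
    haplotype (an assumption of this sketch).
    """
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, treat as uncorrelated
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_pairwise_tags(haplotypes, threshold=0.8):
    """Pick tags so every SNP has r^2 >= threshold with at least one tag.

    haplotypes maps a SNP name to its list of 0/1 alleles. The 0.8 default
    is an assumed convention, not a value taken from the abstract.
    """
    snps = list(haplotypes)
    # Each SNP captures itself plus every SNP it is a good pairwise proxy for.
    captures = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)

    untagged = set(snps)
    tags = []
    while untagged:
        # Greedy step: take the SNP that captures the most still-untagged SNPs.
        best = max(snps, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Toy example (made-up data): snp1, snp2 and snp3 are perfectly correlated,
# while snp4 is only weakly correlated with the others.
example = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp3": [1, 1, 0, 0, 1, 0, 1, 0],
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
}
tags = greedy_pairwise_tags(example)
```

In this toy panel two tags suffice for four SNPs. A multimarker method would additionally allow a SNP to be captured by a haplotype of several tags, which is the source of the efficiency gain the abstract describes; that step is not shown here.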
Acknowledgements
We thank N. Patterson, E. Lander, J. Hirschhorn and S. Schaffner for discussions; J. Barrett and J. Maller for their implementation of Tagger in Haploview; the Broad Systems Group for technical assistance; and members of the Analysis group of the International HapMap Project for many useful interactions. D.A. is a Charles E. Culpeper Scholar of the Rockefeller Brothers Fund and a Burroughs Wellcome Fund Clinical Scholar in Translational Research. This work was supported by grants from the US National Institutes of Health.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
Genotype relative risk as a function of the frequency of the causal variant. (PDF 3 kb)
Supplementary Fig. 2
Absolute power to detect association for all common causal variants as a function of the number of proxies in the complete data. (PDF 3 kb)
Supplementary Fig. 3
Exhaustive haplotype testing on tags picked from incomplete reference panels. (PDF 7 kb)
Supplementary Note
Empirical comparison of null simulations to explicit permutation testing. (PDF 100 kb)
About this article
Cite this article
de Bakker, P., Yelensky, R., Pe'er, I. et al. Efficiency and power in genetic association studies. Nat Genet 37, 1217–1223 (2005). https://doi.org/10.1038/ng1669
DOI: https://doi.org/10.1038/ng1669