

Efficiency and power in genetic association studies

Abstract

We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.
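As a rough illustration of what "pairwise" tagging means here, the sketch below greedily picks tag SNPs until every SNP is correlated with at least one tag. It is not the authors' implementation or the Tagger algorithm in Haploview. The r² threshold of 0.8, the 0/1 haplotype encoding, and the function names `r_squared` and `greedy_pairwise_tags` are assumptions made for this example only.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared allelic correlation between two SNPs.

    Each argument is a list of 0/1 alleles, one entry per phased haplotype.
    """
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # a monomorphic SNP is treated as uncorrelated with everything
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def greedy_pairwise_tags(haplotypes, threshold=0.8):
    """Return SNP indices chosen as tags (hypothetical helper).

    haplotypes: list of haplotypes, each a list of 0/1 alleles.
    threshold: assumed r^2 cutoff; a SNP counts as captured when it
    reaches this r^2 with at least one chosen tag.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]

    # captures[i] holds every SNP that SNP i would capture, itself included.
    captures = {i: {i} for i in range(n_snps)}
    for i, j in combinations(range(n_snps), 2):
        if r_squared(columns[i], columns[j]) >= threshold:
            captures[i].add(j)
            captures[j].add(i)

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs;
        # ties go to the lowest index.
        best = max(range(n_snps), key=lambda i: len(captures[i] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


if __name__ == "__main__":
    toy_haplotypes = [
        [0, 0, 1, 0],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [1, 1, 0, 1],
    ]
    print(greedy_pairwise_tags(toy_haplotypes))
```

In the toy data the first three SNPs are perfectly correlated and the fourth is independent of them, so two tags suffice for four SNPs. Multimarker methods go further by letting a haplotype of several tags act as the proxy for an untyped SNP, which this pairwise sketch does not attempt.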


Figure 1: Distributions of the test statistic in a typical ENCODE region.
Figure 2: Efficiency afforded by a tagging approach.
Figure 3: Efficiency and power for various tagging strategies.
Figure 4: Effect of tagging from an incomplete reference panel on testing burden and power.
Figure 5: Effect of exhaustive haplotype tests on statistical power.



Acknowledgements

We thank N. Patterson, E. Lander, J. Hirschhorn and S. Schaffner for discussions; J. Barrett and J. Maller for their implementation of Tagger in Haploview; the Broad Systems Group for technical assistance; and members of the Analysis group of the International HapMap Project for many useful interactions. D.A. is a Charles E. Culpeper Scholar of the Rockefeller Brothers Fund and a Burroughs Wellcome Fund Clinical Scholar in Translational Research. This work was supported by grants from the US National Institutes of Health.

Author information


Corresponding authors

Correspondence to Mark J Daly or David Altshuler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Genotype relative risk as a function of the frequency of the causal variant. (PDF 3 kb)

Supplementary Fig. 2

Absolute power to detect association for all common causal variants as a function of the number of proxies in the complete data. (PDF 3 kb)

Supplementary Fig. 3

Exhaustive haplotype testing on tags picked from incomplete reference panels. (PDF 7 kb)

Supplementary Note

Empirical comparison of null simulations to explicit permutation testing. (PDF 100 kb)


About this article

Cite this article

de Bakker, P., Yelensky, R., Pe'er, I. et al. Efficiency and power in genetic association studies. Nat Genet 37, 1217–1223 (2005). https://doi.org/10.1038/ng1669


  • DOI: https://doi.org/10.1038/ng1669
