  • Technical Report

A new multipoint method for genome-wide association studies by imputation of genotypes

Abstract

Genome-wide association studies are set to become the method of choice for uncovering the genetic basis of human diseases. A central challenge in this area is the development of powerful multipoint methods that can detect causal variants that have not been directly genotyped. We propose a coherent analysis framework that treats the problem as one involving missing or uncertain genotypes. Central to our approach is a model-based imputation method for inferring genotypes at observed or unobserved SNPs, leading to improved power over existing methods for multipoint association mapping. Using real genome-wide association study data, we show that our approach (i) is accurate and well calibrated, (ii) provides detailed views of associated regions that facilitate follow-up studies and (iii) can be used to validate and correct data at genotyped markers. A notable future use of our method will be to boost power by combining data from genome-wide scans that use different SNP sets.
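As a rough illustration of what treating genotypes as "missing or uncertain" can mean in practice, the sketch below assumes that an imputation step has already produced, for each individual at one SNP, a probability for each of the three possible genotypes. It is not the authors' model, which this preview does not describe; the function names, the tuple layout and the 0.9 threshold are hypothetical choices made only for this example.

```python
from typing import Optional, Sequence, Tuple

# Assumed layout: (P(0 copies), P(1 copy), P(2 copies)) of the coded allele.
GenotypeProbs = Tuple[float, float, float]


def expected_dosage(probs: GenotypeProbs) -> float:
    """Expected number of copies of the coded allele under the distribution."""
    p0, p1, p2 = probs
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2


def best_guess(probs: GenotypeProbs, threshold: float = 0.9) -> Optional[int]:
    """Most probable genotype, or None if no genotype reaches the threshold."""
    top = max(range(3), key=lambda g: probs[g])
    return top if probs[top] >= threshold else None


def mean_dosage(samples: Sequence[GenotypeProbs]) -> float:
    """Average expected dosage across a group of individuals."""
    return sum(expected_dosage(p) for p in samples) / len(samples)


if __name__ == "__main__":
    # Made-up probabilities, for illustration only.
    cases = [(0.02, 0.03, 0.95), (0.10, 0.80, 0.10)]
    controls = [(0.90, 0.08, 0.02), (0.10, 0.80, 0.10)]
    print(mean_dosage(cases) - mean_dosage(controls))
    print([best_guess(p) for p in cases])
```

Keeping the full distribution, rather than only a thresholded call, lets a downstream test weight each individual by how certain the genotype is; a hard call discards that information and may drop uncertain individuals altogether.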


Figure 1: Accuracy and calibration of imputed genotypes.
Figure 2: Power versus region-wide type I error for the mapping methods described in the main text, based on simulating case-control data sets conditional upon the haplotype data in the ten ENCODE regions.
Figure 3: Power versus region-wide type I error for the mapping methods described in the main text, based on simulating case-control data sets conditional upon the haplotype data in the ten ENCODE regions.
Figure 4: Results of imputing SNPs in the region of the TCF7L2 gene from the WTCCC data.
Figure 5: Imputing missing data at genotyped SNPs.


Acknowledgements

We thank M. McCarthy, E. Zeggini, A. Hattersley and the WTCCC for allowing us to use the TCF7L2 data. We thank M. Stephens, B. Servin, P. De Bakker, P. Fearnhead, J. Barrett, Z. Su, C. Spencer, D. Vukcevic and N. Cardin for discussions. We acknowledge support from The Wellcome Trust, the US National Institutes of Health, the SNP Consortium, the Wolfson Foundation, the Nuffield Trust and the Engineering and Physical Sciences Research Council. B.H. is supported by a National Science Foundation Graduate Research Fellowship and the Overseas Research Students Award Scheme.

Author information

Corresponding author

Correspondence to Peter Donnelly.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Overall power for methods using maximum Bayes factors. (PDF 4 kb)

Supplementary Fig. 2

Overall power for methods using region Bayes factors. (PDF 4 kb)

Supplementary Fig. 3

Bayes factor plot for TCF7L2. (PDF 177 kb)

Supplementary Methods (PDF 148 kb)

About this article

Cite this article

Marchini, J., Howie, B., Myers, S. et al. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39, 906–913 (2007). https://doi.org/10.1038/ng2088


  • DOI: https://doi.org/10.1038/ng2088
