Abstract
Genome-wide association studies are set to become the method of choice for uncovering the genetic basis of human diseases. A central challenge in this area is the development of powerful multipoint methods that can detect causal variants that have not been directly genotyped. We propose a coherent analysis framework that treats the problem as one involving missing or uncertain genotypes. Central to our approach is a model-based imputation method for inferring genotypes at observed or unobserved SNPs, leading to improved power over existing methods for multipoint association mapping. Using real genome-wide association study data, we show that our approach (i) is accurate and well calibrated, (ii) provides detailed views of associated regions that facilitate follow-up studies and (iii) can be used to validate and correct data at genotyped markers. A notable future use of our method will be to boost power by combining data from genome-wide scans that use different SNP sets.
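The central idea above is to infer genotypes at SNPs that were not directly typed, using a population-genetic model of how haplotypes resemble one another. The sketch below illustrates that idea with a simplified haploid haplotype-copying hidden Markov model in the style of Li and Stephens. The target haplotype is modelled as a mosaic of reference haplotypes, and forward-backward gives posterior allele probabilities at every site, including untyped ones.

This is not the authors' implementation. It assumes a phased target haplotype and a phased 0/1 reference panel. It also assumes fixed, hypothetical switch and error rates (`switch`, `err`); the function name `impute_untyped` is likewise illustrative. The method described here works with unphased genotypes, so it would track pairs of copied haplotypes rather than one.

```python
import numpy as np

def impute_untyped(ref, typed_idx, obs, switch=0.01, err=0.01):
    """Posterior P(allele 1) at every site for one target haplotype (sketch).

    ref       : (K, L) array of 0/1 alleles for K reference haplotypes at L sites
    typed_idx : indices of the sites observed in the target
    obs       : observed 0/1 alleles at those sites
    switch    : hypothetical probability of re-drawing the copied haplotype per interval
    err       : hypothetical probability that an observed allele differs from the copied one
    """
    K, L = ref.shape
    # Emission weights: P(observation | copying haplotype k); untyped sites are uninformative.
    emit = np.ones((L, K))
    for l, a in zip(typed_idx, obs):
        emit[l] = np.where(ref[:, l] == a, 1.0 - err, err)

    # Forward pass, normalised at each site to avoid numerical underflow.
    fwd = np.zeros((L, K))
    f = emit[0] / K
    fwd[0] = f / f.sum()
    for l in range(1, L):
        # Transition: keep the same haplotype w.p. (1 - switch), else draw uniformly.
        prior = (1.0 - switch) * fwd[l - 1] + switch / K
        f = prior * emit[l]
        fwd[l] = f / f.sum()

    # Backward pass with the same transition structure.
    bwd = np.ones((L, K))
    for l in range(L - 2, -1, -1):
        b = emit[l + 1] * bwd[l + 1]
        b = (1.0 - switch) * b + switch * b.sum() / K
        bwd[l] = b / b.sum()

    # Posterior over which reference haplotype is copied at each site.
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    # P(allele 1 at site l) = sum_k P(copy k at l) * ref[k, l]
    return (post * ref.T).sum(axis=1)


ref = np.array([[0, 0, 0, 0, 0],
                [1, 1, 1, 1, 1],
                [0, 1, 0, 1, 0]])
# Site 2 is untyped; the observed alleles match reference haplotype 1.
print(impute_untyped(ref, [0, 1, 3, 4], [1, 1, 1, 1]).round(3))
```

The result is a probability rather than a hard call. Keeping these probabilities, rather than rounding to a best-guess genotype, is what lets a downstream association test treat imputed genotypes as uncertain.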
Acknowledgements
We thank M. McCarthy, E. Zeggini, A. Hattersley and the WTCCC for allowing us to use the TCF7L2 data. We thank M. Stephens, B. Servin, P. De Bakker, P. Fearnhead, J. Barrett, Z. Su, C. Spencer, D. Vukcevic and N. Cardin for discussions. We acknowledge support from The Wellcome Trust, the US National Institutes of Health, the SNP Consortium, the Wolfson Foundation, the Nuffield Trust and the Engineering and Physical Sciences Research Council. B.H. is supported by a National Science Foundation Graduate Research Fellowship and the Overseas Research Students Award Scheme.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
Overall power for methods using maximum Bayes factors.
Supplementary Fig. 2
Overall power for methods using region Bayes factors.
Supplementary Fig. 3
Bayes factor plot for TCF7L2.
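Supplementary Figs. 1 and 2 compare power under two ways of summarising per-SNP Bayes factors over a region. The sketch below shows one plausible form of each.

- The maximum statistic takes the largest single-SNP Bayes factor.
- The region statistic averages the Bayes factors across SNPs.

The averaging form assumes, hypothetically, that exactly one SNP in the region is causal and that each is equally likely a priori. The exact definitions used for the figures are not given here. Both the function names and the uniform weighting are assumptions.

```python
import math

def max_log10_bf(log10_bfs):
    """Largest single-SNP log10 Bayes factor in a region."""
    return max(log10_bfs)

def region_log10_bf(log10_bfs):
    """log10 of the unweighted mean Bayes factor over SNPs in a region.

    Assumes (hypothetically) one causal SNP with a uniform prior over its
    location. Works in log space (log-sum-exp) so very large Bayes
    factors do not overflow.
    """
    m = max(log10_bfs)
    total = sum(10 ** (b - m) for b in log10_bfs)
    return m + math.log10(total / len(log10_bfs))


scores = [0.0, 2.0, 1.0]  # illustrative log10 Bayes factors, not real data
print(max_log10_bf(scores), round(region_log10_bf(scores), 4))
```

The average can never exceed the maximum. It rewards regions where several SNPs carry moderate evidence, whereas the maximum responds only to the single strongest SNP.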
Cite this article
Marchini, J., Howie, B., Myers, S. et al. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39, 906–913 (2007). https://doi.org/10.1038/ng2088
DOI: https://doi.org/10.1038/ng2088