Abstract
Genotype imputation is a key component of genetic association studies, where it increases power, facilitates meta-analysis, and aids interpretation of signals. Genotype imputation is computationally demanding and, with current tools, typically requires access to a high-performance computing cluster and to a reference panel of sequenced genomes. Here we describe improvements to imputation machinery that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools. We also describe a new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity.

Acknowledgements
The authors gratefully acknowledge D. Hinds for assistance with minimac3 code optimizations and A.L. Williams for providing HAPI-UR. We acknowledge support from National Institutes of Health grants HG007022 and HL117626 (G.R.A.), HG000376 (M.B.), and R01DA037904 (S.I.V.), Austrian Science Fund (FWF) grant J-3401 (C.F.), and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 602133 (L.F. and S.S.). This work was also supported in part by the Intramural Research Program of the National Institute on Aging, National Institutes of Health (D. Schlessinger).
Author information
Contributions
S.D., L.F., S.S., G.R.A., and C.F. designed the methods and experiments. C.S., A.E.L., A.K., S.I.V., E.Y.C., S.L., M.M., D. Schlessinger, P.-R.L., D. Stambolian, W.G.I., A.S., L.J.S., F.C., F.K., and M.B. provided data or tools. S.D., G.R.A., and C.F. wrote the first draft. All authors contributed critical reviews of the manuscript during its preparation.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Imputation server overview.
The imputation workflow uses two MapReduce jobs: one parallelizes the quality control step and the other parallelizes the phasing/imputation step.
Supplementary Figure 3 Parameter estimation study.
The figure compares the imputation accuracy across three parameter estimation methods on six different populations from the Human Genome Diversity Project (HGDP) on chromosomes 20–22.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3, Supplementary Tables 1–4 and Supplementary Note. (PDF 2575 kb)
About this article
Cite this article
Das, S., Forer, L., Schönherr, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016). https://doi.org/10.1038/ng.3656
DOI: https://doi.org/10.1038/ng.3656