Next-generation genotype imputation service and methods


Genotype imputation is a key component of genetic association studies, where it increases power, facilitates meta-analysis, and aids interpretation of signals. Genotype imputation is computationally demanding and, with current tools, typically requires access to a high-performance computing cluster and to a reference panel of sequenced genomes. Here we describe improvements to imputation machinery that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools. We also describe a new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity.

Figure 1: Overview of state space reduction.


The authors gratefully acknowledge D. Hinds for assistance with minimac3 code optimizations and A.L. Williams for providing HAPI-UR. We acknowledge support from National Institutes of Health grants HG007022 and HL117626 (G.R.A.), HG000376 (M.B.), and R01DA037904 (S.I.V.), Austrian Science Fund (FWF) grant J-3401 (C.F.), and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 602133 (L.F. and S.S.). This work was also supported in part by the Intramural Research Program of the National Institute on Aging, National Institutes of Health (D. Schlessinger).

S.D., L.F., S.S., G.R.A., and C.F. designed the methods and experiments. C.S., A.E.L., A.K., S.I.V., E.Y.C., S.L., M.M., D. Schlessinger, P.-R.L., D. Stambolian, W.G.I., A.S., L.J.S., F.C., F.K., and M.B. provided data or tools. S.D., G.R.A., and C.F. wrote the first draft. All authors contributed critical reviews of the manuscript during its preparation.

Correspondence to Gonçalo R Abecasis or Christian Fuchsberger.

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Imputation server overview.

The imputation workflow uses two MapReduce jobs to parallelize the quality-control step and the phasing/imputation step.

Supplementary Figure 2 Quality control workflow for each variant site.

Supplementary Figure 3 Parameter estimation study.

The figure compares the imputation accuracy of three parameter estimation methods across six populations from the Human Genome Diversity Project (HGDP), using chromosomes 20–22.

Supplementary Figures 1–3, Supplementary Tables 1–4 and Supplementary Note (PDF, 2,575 kB).

Das, S., Forer, L., Schönherr, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016).

