Abstract
Inexpensive genotyping methods are essential for genetic studies requiring large sample sizes. In human studies, array-based microarrays and high-density haplotype reference panels allow efficient genotype imputation for this purpose. However, these resources are typically unavailable in non-human settings. Here we describe a method (STITCH) for imputation based only on sequencing read data, without requiring additional reference panels or array data. We demonstrate its applicability even in settings of extremely low sequencing coverage, by accurately imputing 5.7 million SNPs at a mean r2 value of 0.98 in 2,073 outbred laboratory mice (0.15× sequencing coverage). In a sample of 11,670 Han Chinese (1.7× coverage), we achieve accuracy similar to that of alternative approaches that require a reference panel, demonstrating that our approach can work for genetically diverse populations. Our method enables straightforward progression from low-coverage sequence to imputed genotypes, overcoming barriers that at present restrict the application of genome-wide association study technology outside humans.
This is a preview of subscription content, access via your institution
Relevant articles
Open Access articles citing this article.
-
Ultra-low-coverage genome-wide association study—insights into gestational age using 17,844 embryo samples with preimplantation genetic testing
Genome Medicine Open Access 14 February 2023
-
Epigenomic charting and functional annotation of risk loci in renal cell carcinoma
Nature Communications Open Access 21 January 2023
-
Cost-effectively dissecting the genetic architecture of complex wool traits in rabbits by low-coverage sequencing
Genetics Selection Evolution Open Access 18 November 2022
Access options
Subscribe to this journal
Receive 12 print issues and online access
$189.00 per year
only $15.75 per issue
Rent or buy this article
Get just this article for as long as you need it
$39.95
Prices may be subject to local taxes which are calculated during checkout




References
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP–trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Delaneau, O., Zagury, J.-F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
Howie, B.N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Li, Y., Willer, C.J., Ding, J., Scheet, P. & Abecasis, G.R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010).
Browning, S.R. & Browning, B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G.R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Swarts, K. et al. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. Plant Genome http://dx.doi.org/10.3835/plantgenome2014.05.0023 (2014).
Huang, B.E. & George, A.W. R/mpMap: a computational platform for the genetic analysis of multiparent recombinant inbred lines. Bioinformatics 27, 727–729 (2011).
Sargolzaei, M., Chesnais, J.P. & Schenkel, F.S. A new approach for efficient genotype imputation using information from relatives. BMC Genomics 15, 478 (2014).
VanRaden, P.M., Sun, C. & O'Connell, J.R. Fast imputation using medium or low-coverage sequence data. BMC Genet. 16, 82 (2015).
Didion, J.P. et al. Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias. BMC Genomics 13, 34 (2012).
Pasaniuc, B. et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nat. Genet. 44, 631–635 (2012).
CONVERGE Consortium. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 523, 588–591 (2015).
Scheet, P. & Stephens, M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644 (2006).
Nicod, J. et al. Genome-wide association of multiple complex traits in outbred mice by ultra-low-coverage sequencing. Nat. Genet. http://dx.doi.org/10.1038/ng.3595 (2016).
Yalcin, B. et al. Commercially available outbred mice for genome-wide association studies. PLoS Genet. 6, e1001085 (2010).
Keane, T.M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Freedman, A.H. et al. Genome sequencing highlights the dynamic early history of dogs. PLoS Genet. 10, e1004016 (2014).
Bovine HapMap Consortium. Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science 324, 528–532 (2009).
Daetwyler, H.D. et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat. Genet. 46, 858–865 (2014).
VanBuren, R. et al. Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum. Nature 527, 508–511 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936–939 (2011).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Acknowledgements
R.W.D. is supported by a grant from the Wellcome Trust (097308/Z/11/Z). S.M. is supported by Investigator Award 098387/Z/12/Z. This work was funded by the Wellcome Trust (WT090532/Z/09/Z, WT083573/Z/07/Z, WT089269/Z/09/Z and WT098387/Z/12/Z).
Author information
Authors and Affiliations
Contributions
R.W.D., S.M., and R.M. developed the method. R.W.D. wrote the algorithm and performed analyses. J.F. and R.M. conceived and managed the CFW and CONVERGE projects. All authors contributed to study design, drafted the paper, and reviewed and contributed to the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Tables 1–8 and Supplementary Note. (PDF 1385 kb)
Rights and permissions
About this article
Cite this article
Davies, R., Flint, J., Myers, S. et al. Rapid genotype imputation from sequence without reference panels. Nat Genet 48, 965–969 (2016). https://doi.org/10.1038/ng.3594
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/ng.3594
This article is cited by
-
Ultra-low-coverage genome-wide association study—insights into gestational age using 17,844 embryo samples with preimplantation genetic testing
Genome Medicine (2023)
-
Epigenomic charting and functional annotation of risk loci in renal cell carcinoma
Nature Communications (2023)
-
Variation in targetable genomic alterations in non-small cell lung cancer by genetic ancestry, sex, smoking history, and histology
Genome Medicine (2022)
-
Cost-effectively dissecting the genetic architecture of complex wool traits in rabbits by low-coverage sequencing
Genetics Selection Evolution (2022)
-
Chromosome-level genome and population genomics reveal evolutionary characteristics and conservation status of Chinese indigenous geese
Communications Biology (2022)