Abstract
Inexpensive genotyping methods are essential to modern genomics. Here we present QUILT, which performs diploid genotype imputation using low-coverage whole-genome sequence data. QUILT employs Gibbs sampling to partition reads into maternal and paternal sets, facilitating rapid haploid imputation using large reference panels. We show this partitioning to be accurate over many megabases, enabling highly accurate imputation close to theoretical limits and outperforming existing methods. Moreover, QUILT can impute accurately using diverse technologies, including long reads from Oxford Nanopore Technologies and a new form of low-cost barcoded Illumina sequencing called haplotagging, with the latter showing improved accuracy at low coverages. Relative to DNA genotyping microarrays, QUILT offers improved accuracy at reduced cost, particularly for diverse populations that are traditionally underserved in modern genomic analyses, with accuracy nearly doubling at rare SNPs. Finally, QUILT can accurately impute human leukocyte antigen (HLA) types at four-digit resolution, making it the first such method to work from low-coverage sequence data.
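The core idea in the abstract, sampling which parental haplotype each read came from and then imputing each haplotype separately, can be illustrated with a toy Gibbs sampler. This is a hypothetical sketch, not QUILT's implementation: it assumes known biallelic sites, reads given as (site, allele) pairs, a fixed per-base error rate `eps`, and a flat prior on haplotype alleles in place of the reference-panel haploid imputation step described above. All function names (`gibbs_partition`, `prob_second_hap`, `read_loglik`, `logistic`) are invented for this example.

```python
import math
import random
from typing import List, Tuple

# A read is a list of (site index, observed allele) pairs; alleles are 0 (ref) or 1 (alt).
Read = List[Tuple[int, int]]


def logistic(x: float) -> float:
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def read_loglik(read: Read, hap: List[int], eps: float) -> float:
    """Log-likelihood of a read given one haplotype, with per-base error rate eps."""
    return sum(math.log(1 - eps) if hap[s] == a else math.log(eps) for s, a in read)


def prob_second_hap(read: Read, haps: List[List[int]], eps: float) -> float:
    """Posterior probability that a read came from haps[1], assuming equal priors."""
    return logistic(read_loglik(read, haps[1], eps) - read_loglik(read, haps[0], eps))


def gibbs_partition(reads: List[Read], n_sites: int, n_iter: int = 100,
                    eps: float = 0.01, seed: int = 0
                    ) -> Tuple[List[int], List[List[int]]]:
    """Toy Gibbs sampler alternating between read labels and haplotype alleles.

    Hypothetical illustration only: no reference panel is used here.
    """
    rng = random.Random(seed)
    haps = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(2)]
    labels = [rng.randint(0, 1) for _ in reads]
    match, mismatch = math.log(1 - eps), math.log(eps)
    for _ in range(n_iter):
        # (a) Resample each read's label (0 or 1) given the current pair of haplotypes.
        for i, read in enumerate(reads):
            labels[i] = int(rng.random() < prob_second_hap(read, haps, eps))
        # (b) Resample each haplotype's alleles from the reads currently assigned to it,
        #     using a flat prior on alleles (no reference panel in this sketch).
        for h in (0, 1):
            score = [0.0] * n_sites  # log-lik(allele 1) - log-lik(allele 0) per site
            for read, lab in zip(reads, labels):
                if lab == h:
                    for s, a in read:
                        score[s] += (match - mismatch) if a == 1 else (mismatch - match)
            haps[h] = [int(rng.random() < logistic(x)) for x in score]
    return labels, haps


if __name__ == "__main__":
    hap_a = [0, 1, 0, 1, 1, 0]
    hap_b = [1 - x for x in hap_a]
    # Five error-free, full-length reads from each haplotype.
    demo_reads = [[(s, h[s]) for s in range(6)] for h in (hap_a, hap_b) for _ in range(5)]
    read_labels, sampled_haps = gibbs_partition(demo_reads, n_sites=6)
    print("read labels:", read_labels)
    print("haplotypes:", sampled_haps)
```

Because the two labels are exchangeable, a sampler like this can recover the partition only up to swapping labels, and with short reads it can settle on switch errors that single-read updates escape slowly. This sketch has no reference panel, so it does not model the linkage-disequilibrium information that a panel-based haploid imputation step would supply.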
Data availability
The HRC haplotypes are available at the European Genome-phenome Archive under accession no. EGAD00001002729, under controlled access through the Sanger Institute. The high-coverage whole-genome sequence from the 1000 Genomes NYGC collection is available at https://www.internationalgenome.org/data-portal/data-collection/30x-grch38. Specifically, we used the file http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/1000G_2504_high_coverage.sequence.index. High-coverage ONT data from Bowden et al. (ref. 25) are available through the ENA under accession no. PRJEB30620. High-coverage ONT and Illumina (10×) samples from Shafin et al. (ref. 27) are available through https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=. gnomAD SNP frequencies from the version 3.0 release were downloaded as detailed at https://gnomad.broadinstitute.org/downloads, from URLs such as https://storage.googleapis.com/gnomad-public/release/3.0/vcf/genomes/gnomad.genomes.r3.0.sites.chr1.vcf.bgz. IPD-IMGT/HLA data were downloaded from their GitHub repository (https://github.com/ANHIG/IMGTHLA), specifically v.3.39 through https://github.com/ANHIG/IMGTHLA/blob/032815608e6312b595b4aaf9904d5b4c189dd6dc/Alignments_Rel_3390.zip?raw=true. Previously inferred HLA types for 1000 Genomes Project participants (v.20181129) were downloaded from the 1000 Genomes FTP (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HLA_types/20181129_HLA_types_full_1000_Genomes_Project_panel.txt). Recombination rates for the CEU 1000 Genomes Project samples were downloaded from ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/working/20130507_omni_recombination_rates/CEU_omni_recombination_20130507.tar. All new high- and low-coverage sequencing data generated for this study are available from the Sequence Read Archive under BioProject accession no. PRJNA669554.
Code availability
QUILT is available from https://github.com/rwdavies/QUILT under the GNU General Public License. The specific versions of QUILT used in this manuscript are available from Figshare (ref. 40).
References
1. Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017).
2. Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).
3. Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).
4. Burton, P. R. et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
5. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
6. Li, N. & Stephens, M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003).
7. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
8. Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).
9. O’Connell, J. et al. Haplotype estimation for biobank-scale data sets. Nat. Genet. 48, 817–820 (2016).
10. Loh, P.-R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. 48, 811–816 (2016).
11. Delaneau, O., Zagury, J.-F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).
12. Pasaniuc, B. et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nat. Genet. 44, 631–635 (2012).
13. Cai, N. et al. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 523, 588–591 (2015).
14. Nicod, J. et al. Genome-wide association of multiple complex traits in outbred mice by ultra-low-coverage sequencing. Nat. Genet. 48, 912–918 (2016).
15. Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939–946 (2012).
16. Meier, J. I. et al. Haplotype tagging reveals parallel formation of hybrid races in two butterfly species. Proc. Natl. Acad. Sci. USA https://doi.org/10.1073/pnas.2015005118 (2021).
17. Davies, R. W., Flint, J., Myers, S. & Mott, R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 48, 965–969 (2016).
18. Browning, B. L. & Browning, S. R. Genotype imputation with millions of reference samples. Am. J. Hum. Genet. 98, 116–126 (2016).
19. Spiliopoulou, A., Colombo, M., Orchard, P., Agakov, F. & McKeigue, P. GeneImp: fast imputation to large reference panels using genotype likelihoods from ultralow coverage sequencing. Genetics 206, 91–104 (2017).
20. Rubinacci, S., Ribeiro, D. M., Hofmeister, R. J. & Delaneau, O. Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat. Genet. 53, 120–126 (2021).
21. VanRaden, P. M., Sun, C. & O’Connell, J. R. Fast imputation using medium or low-coverage sequence data. BMC Genet. 16, 82 (2015).
22. Ros-Freixedes, R. et al. Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations. Genet. Sel. Evol. 52, 17 (2020).
23. Zheng, C., Boer, M. P. & van Eeuwijk, F. A. Accurate genotype imputation in multiparental populations from low-coverage sequence. Genetics 210, 71–82 (2018).
24. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
25. Bowden, R. et al. Sequencing of human genomes with nanopore technology. Nat. Commun. 10, 1869 (2019).
26. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
27. Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 38, 1044–1053 (2020).
28. Jia, X. et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS ONE 8, e64683 (2013).
29. Karnes, J. H. et al. Comparison of HLA allelic imputation programs. PLoS ONE 12, e0172444 (2017).
30. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
31. Robinson, J. et al. IPD-IMGT/HLA Database. Nucleic Acids Res. 48, D948–D955 (2020).
32. Luo, Y. et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ethnic fine-mapping in HIV host response. Preprint at medRxiv https://doi.org/10.1101/2020.07.16.20155606 (2020).
33. Durvasula, A. & Lohmueller, K. E. Negative selection on complex traits limits phenotype prediction accuracy between populations. Am. J. Hum. Genet. 108, 620–631 (2021).
34. Wainschtein, P. et al. Recovery of trait heritability from whole genome sequence data. Preprint at bioRxiv https://doi.org/10.1101/588020 (2019).
35. Snyder, M. W. et al. Copy-number variation and false positive prenatal aneuploidy screening results. N. Engl. J. Med. 372, 1639–1645 (2015).
36. Liu, S. et al. Genomic analyses from non-invasive prenatal testing reveal genetic associations, patterns of viral infections, and Chinese population history. Cell 175, 347–359.e14 (2018).
37. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
38. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
39. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
40. Davies, R. QUILT source code from manuscript. figshare https://doi.org/10.6084/m9.figshare.14401904.v1 (2021).
41. Abi-Rached, L. et al. Immune diversity sheds light on missing variation in worldwide genetic diversity panels. PLoS ONE 13, e0206512 (2018).
Acknowledgements
We thank C. Lanz, R. Schwab, O. Weichenrieder and I. Bezrukov at the MPI Developmental Biology for assistance with high-throughput sequencing and associated data processing and A. Noll and the MPI Tübingen IT team for computational support. We used high-coverage resequencing of 1000 Genomes Project data performed by the NYGC. These data were generated at the NYGC with funds provided by National Human Genome Research Institute grant no. 3UM1HG008901-03S1. The research was supported by the Wellcome Trust Core Award Grant no. 203141/Z/16/Z with additional support from the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre and by Wellcome Trust grant nos. 200186/Z/15/Z and 212284/Z/18/Z (to S.M.). The views expressed are those of the author(s) and not necessarily those of the NHS, NIHR or the Department of Health. We acknowledge the contribution and support from affected persons and their families who contributed to the Bloom Syndrome Repository. We thank the New York Community Trust and Weill Cornell Medicine’s Clinical and Translational Science Center for providing funding. M.K., D.S. and Y.F.C. are supported by the Max Planck Society and a European Research Council Starting Grant (no. 639096 HybridMiX).
Author information
Contributions
R.W.D. developed and implemented QUILT. M.K., D.S. and Y.F.C. developed haplotagging. R.W.D., M.K. and S.S. performed the analyses. M.F. and C.M.C. developed the 5-Family dataset. S.M. developed and implemented the QUILT-HLA typer. R.W.D., Y.F.C. and S.M. wrote the paper. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
M.K. and Y.F.C. declare competing interests in the form of patent and employment by the Max Planck Society. The remaining authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks Sayantan Das and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Note, Tables 1–10 and Figs. 1–6
About this article
Cite this article
Davies, R.W., Kucka, M., Su, D. et al. Rapid genotype imputation from sequence with reference panels. Nat Genet 53, 1104–1111 (2021). https://doi.org/10.1038/s41588-021-00877-0
DOI: https://doi.org/10.1038/s41588-021-00877-0