Whole-genome haplotyping using long reads and statistical methods

Abstract

The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2–1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.
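To illustrate what "partially phased information contained in long genomic fragments" means in practice, the following is a minimal sketch of how overlapping fragments can be stitched into haplotype blocks. It is not the SLRH or Prism algorithm: it is a toy greedy merge that assumes error-free allele calls and omits the statistical modelling the abstract describes. The names `Fragment` and `phase_fragments` and the example positions are hypothetical.

```python
from typing import Dict, List

# A fragment records the allele (0 = reference, 1 = alternate) observed at each
# heterozygous variant position covered by one long DNA molecule.
# Hypothetical representation, chosen for illustration only.
Fragment = Dict[int, int]


def phase_fragments(fragments: List[Fragment]) -> List[Dict[int, int]]:
    """Greedily merge fragments that share variants into haplotype blocks.

    Each returned block maps variant position -> allele on one haplotype;
    the second haplotype carries the opposite allele at every position.
    """
    blocks: List[Dict[int, int]] = []
    for frag in sorted((f for f in fragments if f), key=min):
        merged = dict(frag)
        untouched = []
        for block in blocks:
            shared = [pos for pos in block if pos in frag]
            if not shared:
                untouched.append(block)
                continue
            # Decide whether the block and the fragment describe the same
            # haplotype (alleles agree) or opposite ones (alleles disagree).
            agree = sum(block[pos] == frag[pos] for pos in shared)
            flip = 2 * agree < len(shared)
            for pos, allele in block.items():
                merged.setdefault(pos, 1 - allele if flip else allele)
        blocks = untouched + [merged]

    # Report each block in a canonical orientation: allele 0 at its first variant.
    result = []
    for block in blocks:
        flip = block[min(block)] == 1
        result.append(
            {pos: (1 - a if flip else a) for pos, a in sorted(block.items())}
        )
    return sorted(result, key=min)


if __name__ == "__main__":
    example = [
        {100: 0, 250: 1, 400: 1},   # molecule from one haplotype
        {250: 0, 400: 0, 900: 1},   # molecule from the other haplotype
        {900: 0, 1500: 0},          # extends the block through position 900
        {5000: 1, 5200: 0},         # shares no variant, so it starts a new block
    ]
    print(phase_fragments(example))
    # [{100: 0, 250: 1, 400: 1, 900: 0, 1500: 0}, {5000: 0, 5200: 1}]
```

In this toy version, a block ends wherever no fragment bridges two neighbouring variants, which is why block length depends on fragment length and coverage. A realistic method would also have to weigh conflicting observations rather than trust every allele call.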

Figure 1: SLRH.
Figure 2: Haplotyping results at several accuracy thresholds.
Figure 3: Haplotyping performance from 30 Gbp of sequencing.
Figure 4: Genome browser view of differentially methylated regions at the promoter of the H19 gene.

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

We thank C. Pan for assistance in coordinating contacts and discussions. This work is funded by US National Institutes of Health grants HL107393-02, HG004558-05, and the Genetics Department of Stanford University.

Author information

Contributions

D.P. and M.K. developed the laboratory preparation protocol. V.K. developed the Prism phasing algorithm. Z.M. and R.C. performed the Methyl-seq experiments. T.B. prepared the phasing libraries. V.K., D.X. and D.P. performed computational analysis. V.K., D.X. and M.S. wrote the manuscript. R.C. and M.K. reviewed and revised the manuscript. M.K. and M.S. supervised the research.

Corresponding author

Correspondence to Michael Snyder.

Ethics declarations

Competing interests

V.K., D.P., T.B. and M.K. performed the research at Moleculo Inc. (acquired by Illumina Inc.). D.P., T.B. and M.K. are employed by Illumina Inc. and V.K. is a consultant to Illumina Inc. M.K., D.P., T.B. and V.K. are listed as inventors on a patent filed for the SLRH technology. The library preparation protocol is covered by US and international patents with numbers 61/532,882 and 13/608,778 on which D.P. and M.K. are listed as inventors. The SLRH technology is offered commercially by Illumina Inc.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Tables 1–15 and Supplementary Note (PDF 1431 kb)

List of DMRs

List of DMRs (XLSX 106 kb)

Prism source code

Prism Source Code (ZIP 181 kb)

About this article

Cite this article

Kuleshov, V., Xie, D., Chen, R. et al. Whole-genome haplotyping using long reads and statistical methods. Nat Biotechnol 32, 261–266 (2014). https://doi.org/10.1038/nbt.2833

  • DOI: https://doi.org/10.1038/nbt.2833
