Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing

Abstract

Rapid advances in high-throughput sequencing facilitate variant discovery and genotyping, but linking variants into a single haplotype remains challenging. Here we demonstrate HaploSeq, an approach for assembling chromosome-scale haplotypes by exploiting the existence of 'chromosome territories'. We use proximity ligation and sequencing to show that alleles on homologous chromosomes occupy distinct territories, and therefore this experimental protocol preferentially recovers physically linked DNA variants on a homolog. Computational analysis of such data sets allows for accurate (99.5%) reconstruction of chromosome-spanning haplotypes for 95% of alleles in hybrid mouse cells with 30× sequencing coverage. To resolve haplotypes for a human genome, which has a low density of variants, we coupled HaploSeq with local conditional phasing to obtain haplotypes for 81% of alleles with 98% accuracy from just 17× sequencing. Whereas methods based on proximity ligation were originally designed to investigate spatial organization of genomes, our results lend support for their use as a general tool for haplotyping.
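The core idea, that a read pair covering two heterozygous variants usually reports alleles from the same homolog, can be illustrated with a toy example. The sketch below is not the authors' pipeline (the contributions note that a modified HapCUT program was used); it is a simple majority-vote, breadth-first propagation written only to show how pairwise linkage evidence can be chained into one haplotype. The function name `phase_variants`, the fragment encoding (a dict from variant index to observed allele, 0 or 1), and the example data are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def phase_variants(fragments):
    """Toy phasing of heterozygous variants from linked-allele fragments.

    Each fragment is a dict {variant_id: allele} with allele in {0, 1},
    listing the alleles observed together on one read pair.
    Returns {variant_id: allele carried by haplotype A} for every variant
    reachable from the lowest-numbered variant.
    """
    # Signed evidence per variant pair: positive means the same allele codes
    # were seen together, negative means opposite allele codes were.
    weight = defaultdict(int)
    neighbours = defaultdict(set)
    for frag in fragments:
        for (u, a), (v, b) in combinations(sorted(frag.items()), 2):
            weight[(u, v)] += 1 if a == b else -1
            neighbours[u].add(v)
            neighbours[v].add(u)

    if not neighbours:
        return {}

    # Propagate phase outward from a seed variant, following the sign of the
    # net evidence on each edge and skipping edges with tied evidence.
    seed = min(neighbours)
    hap = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in sorted(neighbours[u]):
            w = weight[(min(u, v), max(u, v))]
            if v in hap or w == 0:
                continue
            hap[v] = hap[u] if w > 0 else 1 - hap[u]
            queue.append(v)
    return hap


# Hypothetical data: haplotype A carries alleles 0,1,1,0,1 at variants 0..4,
# and haplotype B carries the complement.
fragments = [
    {0: 0, 1: 1},  # from A, short range
    {1: 0, 2: 0},  # from B
    {0: 1, 4: 0},  # from B, long-range contact
    {2: 1, 3: 0},  # from A
    {3: 0, 4: 1},  # from A
    {3: 1, 4: 0},  # from B
    {3: 0, 4: 0},  # interhaplotype noise, outvoted by the two pairs above
]
hap = phase_variants(fragments)
print([hap[i] for i in range(5)])
```

In this toy version the first edge that reaches a variant fixes its phase, so a single noisy link can still propagate an error; likelihood-based assemblers such as HapCUT weigh all fragments jointly instead.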

Figure 1: HaploSeq method for reconstructing haplotypes.
Figure 2: Proximity-ligation products are predominantly intrahaplotype.
Figure 3: HaploSeq allows for accurate, high-resolution, and chromosome-spanning reconstruction of haplotypes.
Figure 4: Haplotype reconstruction in human GM12878 cells using HaploSeq.
Figure 5: HaploSeq analysis coupled with local conditional phasing permits high-resolution haplotype reconstruction in humans.

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Sequence Read Archive

Acknowledgements

We thank E. Heard for providing the CAST×J129 hybrid mouse ES cell line for this study. We thank V. Bafna for providing valuable suggestions in the course of the work. We are also grateful for the comments on this manuscript by K. Zhang. Funding for this study is provided by the Ludwig Institute for Cancer Research and the Roadmap Epigenome Project (U01 ES017166).

Author information

Contributions

B.R., S.S. and J.R.D. conceived the HaploSeq strategy. J.R.D. performed experiments and carried out the initial data processing. S.S. conducted haplotyping data analysis. V.B. and S.S. modified the HapCUT program for HaploSeq. S.S. and J.D. prepared the manuscript.

Corresponding author

Correspondence to Bing Ren.

Ethics declarations

Competing interests

S.S., J.D. and B.R. are named inventors on a patent application on the technology described in this manuscript.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 (PDF 2848 kb)

Supplementary Data 1

HapCUT Source Code (ZIP 52 kb)

About this article

Cite this article

Selvaraj, S., Dixon, J.R., Bansal, V. et al. Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing. Nat Biotechnol 31, 1111–1118 (2013). https://doi.org/10.1038/nbt.2728

  • DOI: https://doi.org/10.1038/nbt.2728
