Article

Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing

Nature Biotechnology volume 31, pages 1111–1118 (2013)

Abstract

Rapid advances in high-throughput sequencing facilitate variant discovery and genotyping, but linking variants into a single haplotype remains challenging. Here we demonstrate HaploSeq, an approach for assembling chromosome-scale haplotypes by exploiting the existence of 'chromosome territories'. We use proximity ligation and sequencing to show that alleles on homologous chromosomes occupy distinct territories, and therefore this experimental protocol preferentially recovers physically linked DNA variants on a homolog. Computational analysis of such data sets allows for accurate (99.5%) reconstruction of chromosome-spanning haplotypes for 95% of alleles in hybrid mouse cells with 30× sequencing coverage. To resolve haplotypes for a human genome, which has a low density of variants, we coupled HaploSeq with local conditional phasing to obtain haplotypes for 81% of alleles with 98% accuracy from just 17× sequencing. Whereas methods based on proximity ligation were originally designed to investigate spatial organization of genomes, our results lend support for their use as a general tool for haplotyping.

Accessions

Primary accessions

Gene Expression Omnibus

Referenced accessions

Sequence Read Archive

Acknowledgements

We thank E. Heard for providing the CAST×J129 hybrid mouse ES cell line for this study. We thank V. Bafna for providing valuable suggestions in the course of the work. We are also grateful for the comments on this manuscript by K. Zhang. Funding for this study is provided by the Ludwig Institute for Cancer Research and the Roadmap Epigenome Project (U01 ES017166).

Author information

Author notes

    • Siddarth Selvaraj & Jesse R Dixon

    These authors contributed equally to this work.

Affiliations

  1. Ludwig Institute for Cancer Research, La Jolla, California, USA.

    • Siddarth Selvaraj, Jesse R Dixon & Bing Ren

  2. Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California, USA.

    • Siddarth Selvaraj

  3. Medical Scientist Training Program, University of California, San Diego, La Jolla, California, USA.

    • Jesse R Dixon

  4. Scripps Translational Science Institute and Scripps Health, La Jolla, California, USA.

    • Vikas Bansal

  5. Department of Cellular and Molecular Medicine, University of California, San Diego, School of Medicine, La Jolla, California, USA.

    • Bing Ren

  6. Institute of Genomic Medicine, University of California, San Diego, La Jolla, California, USA.

    • Bing Ren

Authors

  1. Siddarth Selvaraj

  2. Jesse R Dixon

  3. Vikas Bansal

  4. Bing Ren

Contributions

B.R., S.S. and J.R.D. conceived the HaploSeq strategy. J.R.D. performed experiments and carried out the initial data processing. S.S. conducted haplotyping data analysis. V.B. and S.S. modified the HapCUT program for HaploSeq. S.S. and J.R.D. prepared the manuscript.

Competing interests

S.S., J.R.D. and B.R. are named inventors on a patent application on the technology described in this manuscript.

Corresponding author

Correspondence to Bing Ren.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–9

Zip files

  1. Supplementary Data 1

    HapCUT Source Code

About this article

DOI

https://doi.org/10.1038/nbt.2728
