Letter

High-throughput genome scaffolding from in vivo DNA interaction frequency

Nature Biotechnology volume 31, pages 1143–1147 (2013)

This article has been updated

Abstract

Despite advances in DNA sequencing technology, assembly of complex genomes remains a major challenge, particularly for genomes sequenced using short reads, which yield highly fragmented assemblies (refs. 1–3). Here we show that genome-wide in vivo chromatin interaction frequency data, which are measurable with chromosome conformation capture–based experiments, can be used as genomic distance proxies to accurately position individual contigs without requiring any sequence overlap. We also use these data to construct approximate genome scaffolds de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods in 26/31 cases attempted in common. Our approach can theoretically bridge any gap size and should be applicable to any species for which global chromatin interaction data can be generated.
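
The idea of using interaction frequency as a proxy for genomic distance can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical power-law decay of contact frequency with distance (exponent 1, arbitrary scale), Poisson-distributed counts, and a single chromosome divided into equal bins. The names `expected_contacts`, `log_likelihood` and `place_contig` are invented for illustration.

```python
import math


def expected_contacts(distance, scale=1000.0, exponent=1.0):
    """Assumed decay model: contacts fall off as a power law of distance (in bins)."""
    return scale / (1.0 + distance) ** exponent


def log_likelihood(counts, position):
    """Poisson log-likelihood (up to a constant) of the observed contact counts,
    given that the unplaced contig sits at bin `position`."""
    total = 0.0
    for bin_index, count in enumerate(counts):
        mu = expected_contacts(abs(bin_index - position))
        total += count * math.log(mu) - mu
    return total


def place_contig(counts):
    """Return the candidate bin that maximizes the likelihood of the contact profile."""
    return max(range(len(counts)), key=lambda p: log_likelihood(counts, p))


# Noise-free synthetic profile: contacts between one unplaced contig and
# 100 bins of a chromosome, generated with the contig at bin 40.
true_position = 40
counts = [expected_contacts(abs(i - true_position)) for i in range(100)]
estimated_position = place_contig(counts)
print(estimated_position)
```

On this noise-free profile the estimate coincides with the bin used to generate the data. Real Hi-C counts are noisy and biased, so a practical method would need normalization and a decay model fitted to the data rather than the fixed parameters assumed here.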

Change history

  • 04 December 2013

    In the version of this supplementary file originally posted online, a section entitled Supplementary Methods was inadvertently included in the file and the legends for tables and figures were omitted. The errors have been corrected in this file as of 4 December 2013.

References

  1. Sequence assembly demystified. Nat. Rev. Genet. 14, 157–167 (2013).

  2. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).

  3. Assemblies: the good, the bad, the ugly. Nat. Methods 8, 59–60 (2011).

  4. De novo genome assembly: what every biologist should know. Nat. Methods 9, 333–337 (2012).

  5. Assembly of large genomes using second-generation sequencing. Genome Res. 20, 1165–1173 (2010).

  6. Estimating genomic distance from DNA sequence location in cell nuclei by a random walk model. Science 257, 1410–1412 (1992).

  7. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).

  8. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953–958 (2001).

  9. et al. Using population admixture to help complete maps of the human genome. Nat. Genet. 45, 406–414 (2013).

  10. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).

  11. et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2, 10 (2013).

  12. et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22, 557–567 (2012).

  13. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

  14. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).

  15. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).

  16. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).

  17. et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012).

  18. et al. MORC family ATPases required for heterochromatin condensation and gene silencing. Science 336, 1448–1451 (2012).

  19. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403 (2013).

  20. Chromosome territories. Cold Spring Harb. Perspect. Biol. 2, a003889 (2010).

  21. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).

  22. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).

  23. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).

  24. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513–1518 (2011).

  25. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006).

  26. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).

  27. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).

  28. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).

  29. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  30. A functional hierarchical organization of the protein sequence space. BMC Bioinformatics 5, 196 (2004).

  31. Updating Quasi-Newton Matrices with Limited Storage. Math. Comput. 35, 773–782 (1980).

Acknowledgements

We thank B.R. Lajoie for help with processing of the Hi-C data. We thank the members of the Dekker Lab and G. Fudenberg for helpful discussions. This study is supported by the National Human Genome Research Institute (HG003143 to J.D.). N.K. is supported by a Long-Term Fellowship from the Human Frontier Science Program.

Author information

Affiliations

  1. Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA.

    • Noam Kaplan & Job Dekker

Contributions

N.K. and J.D. conceived the strategy for genome assembly. N.K. performed all analyses and developed all computational approaches. N.K. and J.D. wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Noam Kaplan or Job Dekker.

Supplementary information

PDF files

  1. Supplementary Text and Figures: Supplementary Discussion, Supplementary Figure 1 and Supplementary Tables 1–3

About this article

DOI: https://doi.org/10.1038/nbt.2768
