  • Letter

High-throughput genome scaffolding from in vivo DNA interaction frequency

This article has been updated

Abstract

Despite advances in DNA sequencing technology, assembly of complex genomes remains a major challenge, particularly for genomes sequenced using short reads, which yield highly fragmented assemblies [1,2,3]. Here we show that genome-wide in vivo chromatin interaction frequency data, which are measurable with chromosome conformation capture–based experiments, can be used as genomic distance proxies to accurately position individual contigs without requiring any sequence overlap. We also use these data to construct approximate genome scaffolds de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods in 26/31 cases attempted in common. Our approach can theoretically bridge any gap size and should be applicable to any species for which global chromatin interaction data can be generated.
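The idea of using interaction frequency as a proxy for genomic distance can be illustrated with a small sketch. It is not the authors' implementation: it assumes, purely for illustration, that the expected interaction frequency decays as a power law of genomic distance and that observed counts are Poisson distributed. The function names, the `scale`, `alpha` and `floor_bp` parameters, and the toy data are all hypothetical.

```python
import math

def expected_freq(distance_bp, scale=1e8, alpha=1.0, floor_bp=1e4):
    """Assumed power-law decay of interaction frequency with genomic distance."""
    return scale / max(distance_bp, floor_bp) ** alpha

def poisson_loglik(observed, expected):
    """Poisson log-likelihood, dropping the term that does not depend on `expected`."""
    return sum(o * math.log(e) - e for o, e in zip(observed, expected))

def place_contig(bin_positions, observed_counts, candidates):
    """Return the candidate position that best explains the observed counts."""
    best_pos, best_ll = None, -math.inf
    for pos in candidates:
        expected = [expected_freq(abs(pos - b)) for b in bin_positions]
        ll = poisson_loglik(observed_counts, expected)
        if ll > best_ll:
            best_pos, best_ll = pos, ll
    return best_pos

# Toy data: anchored bins every 1 Mb; the unplaced contig truly sits at 3.5 Mb.
bin_positions = [i * 1_000_000 for i in range(8)]
true_pos = 3_500_000
observed = [round(expected_freq(abs(true_pos - b))) for b in bin_positions]

# Candidate positions are the gaps midway between neighbouring bins.
candidates = [b + 500_000 for b in bin_positions[:-1]]
predicted = place_contig(bin_positions, observed, candidates)
print(predicted)
```

No sequence overlap is used anywhere in the sketch: the contig is placed only from how often it contacts loci whose positions are already known, which is why such an approach is not limited by gap size.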

Figure 1: Interaction frequency accurately predicts chromosome and locus for scaffold augmentation.
Figure 2: Scaffold augmentation of the human genome.
Figure 3: De novo karyotyping (chromosome assignment).
Figure 4: Accurate de novo chromosome scaffolding with interaction frequencies.
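The chromosome assignment step named in Figure 3 can be sketched under one assumption: contigs on the same chromosome interact more often, on average, than contigs on different chromosomes. The example below uses plain average-linkage agglomerative clustering as a stand-in technique; the source does not state which clustering method the authors used, and the function names and the toy matrix are hypothetical.

```python
def average_linkage(matrix, a, b):
    """Mean interaction frequency between two groups of contig indices."""
    total = sum(matrix[i][j] for i in a for j in b)
    return total / (len(a) * len(b))

def cluster_contigs(matrix, n_groups):
    """Greedily merge the two groups with the highest mean interaction frequency."""
    groups = [[i] for i in range(len(matrix))]
    while len(groups) > n_groups:
        best = None
        for x in range(len(groups)):
            for y in range(x + 1, len(groups)):
                score = average_linkage(matrix, groups[x], groups[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        groups[x] = groups[x] + groups[y]
        del groups[y]
    return [sorted(g) for g in groups]

# Toy symmetric interaction matrix: contigs 0-2 behave like one chromosome,
# contigs 3-4 like another (values are invented for illustration).
matrix = [
    [0, 50, 30, 2, 1],
    [50, 0, 45, 3, 2],
    [30, 45, 0, 1, 2],
    [2, 3, 1, 0, 60],
    [1, 2, 2, 60, 0],
]
groups = cluster_contigs(matrix, n_groups=2)
print(sorted(groups))
```

The number of groups is passed in explicitly here; on real data it would have to be known in advance or estimated separately.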

Change history

  • 04 December 2013

    In the version of this supplementary file originally posted online, a section entitled Supplementary Methods was inadvertently included in the file and the legends for tables and figures were omitted. The errors have been corrected in this file as of 4 December 2013.

References

  1. Nagarajan, N. & Pop, M. Sequence assembly demystified. Nat. Rev. Genet. 14, 157–167 (2013).
  2. Alkan, C., Sajjadian, S. & Eichler, E.E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).
  3. Birney, E. Assemblies: the good, the bad, the ugly. Nat. Methods 8, 59–60 (2011).
  4. Baker, M. De novo genome assembly: what every biologist should know. Nat. Methods 9, 333–337 (2012).
  5. Schatz, M.C., Delcher, A.L. & Salzberg, S.L. Assembly of large genomes using second-generation sequencing. Genome Res. 20, 1165–1173 (2010).
  6. van den Engh, G., Sachs, R. & Trask, B.J. Estimating genomic distance from DNA sequence location in cell nuclei by a random walk model. Science 257, 1410–1412 (1992).
  7. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
  8. Cheung, V.G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953–958 (2001).
  9. Genovese, G. et al. Using population admixture to help complete maps of the human genome. Nat. Genet. 45, 406–414 (2013).
  10. Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).
  11. Bradnam, K.R. et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2, 10 (2013).
  12. Salzberg, S.L. et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22, 557–567 (2012).
  13. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
  14. Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).
  15. Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
  16. Dixon, J.R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
  17. Zhang, Y. et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012).
  18. Moissiard, G. et al. MORC family ATPases required for heterochromatin condensation and gene silencing. Science 336, 1448–1451 (2012).
  19. Dekker, J., Marti-Renom, M.A. & Mirny, L.A. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403 (2013).
  20. Cremer, T. & Cremer, M. Chromosome territories. Cold Spring Harb. Perspect. Biol. 2, a003889 (2010).
  21. Sanyal, A., Lajoie, B., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).
  22. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
  23. Nora, E.P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
  24. Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513–1518 (2011).
  25. Zhao, Z. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006).
  26. Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
  27. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
  28. Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
  29. Pedregosa, F., Weiss, R. & Brucher, M. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
  30. Kaplan, N., Friedlich, M., Fromer, M. & Linial, M. A functional hierarchical organization of the protein sequence space. BMC Bioinformatics 5, 196 (2004).
  31. Nocedal, J. Updating Quasi-Newton Matrices with Limited Storage. Math. Comput. 35, 773–782 (1980).

Acknowledgements

We thank B.R. Lajoie for help with processing of the Hi-C data. We thank the members of the Dekker Lab and G. Fudenberg for helpful discussions. This study is supported by the National Human Genome Research Institute (HG003143 to J.D.). N.K. is supported by a Long-Term Fellowship from the Human Frontier Science Program.

Author information

Contributions

N.K. and J.D. conceived the strategy for genome assembly. N.K. performed all analyses and developed all computational approaches. N.K. and J.D. wrote the paper.

Corresponding authors

Correspondence to Noam Kaplan or Job Dekker.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Discussion, Supplementary Figure 1 and Supplementary Tables 1–3 (PDF 1383 kb)

About this article

Cite this article

Kaplan, N., Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat Biotechnol 31, 1143–1147 (2013). https://doi.org/10.1038/nbt.2768

  • DOI: https://doi.org/10.1038/nbt.2768
