Abstract
Despite advances in DNA sequencing technology, assembly of complex genomes remains a major challenge, particularly for genomes sequenced using short reads, which yield highly fragmented assemblies (refs. 1–3). Here we show that genome-wide in vivo chromatin interaction frequency data, which are measurable with chromosome conformation capture–based experiments, can be used as genomic distance proxies to accurately position individual contigs without requiring any sequence overlap. We also use these data to construct approximate genome scaffolds de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods in 26/31 cases attempted in common. Our approach can theoretically bridge any gap size and should be applicable to any species for which global chromatin interaction data can be generated.
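The idea of using interaction frequency as a genomic distance proxy can be illustrated with a toy example. The sketch below is not the method of the paper, whose details are not given in this abstract. It assumes, purely for illustration, that expected contact counts decay as a power law of distance (the `scale` and `exponent` values are arbitrary), that observed counts are Poisson distributed, and that a contig is placed at the chromosome bin that maximizes the resulting likelihood. All function names are hypothetical.

```python
import math


def expected_contacts(distance_bins, scale=100.0, exponent=1.0):
    """Assumed model: contact frequency decays as a power law of distance.

    The +1 offset keeps the value finite at distance zero. Both parameters
    are illustrative placeholders, not fitted values.
    """
    return scale / (distance_bins + 1.0) ** exponent


def poisson_log_likelihood(counts, position, scale=100.0, exponent=1.0):
    """Log-likelihood of the observed counts if the contig sat at `position`.

    counts[i] is the number of contacts between the contig and chromosome
    bin i. The log(count!) term is dropped because it does not depend on
    the candidate position.
    """
    total = 0.0
    for bin_index, observed in enumerate(counts):
        mu = expected_contacts(abs(bin_index - position), scale, exponent)
        total += observed * math.log(mu) - mu
    return total


def place_contig(counts, scale=100.0, exponent=1.0):
    """Return the bin index with the highest likelihood under the toy model."""
    return max(
        range(len(counts)),
        key=lambda p: poisson_log_likelihood(counts, p, scale, exponent),
    )


# Synthetic data: a contig whose true location is bin 12 of a 30-bin region.
true_position = 12
counts = [round(expected_contacts(abs(i - true_position))) for i in range(30)]
estimated = place_contig(counts)
print(estimated)
```

No sequence overlap enters the calculation: the placement depends only on how contact counts fall off with distance, which is why such an approach is not limited by gap size in principle. Real data would require normalization and a decay model fitted to the organism and experiment.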
Change history
04 December 2013
In the version of this supplementary file originally posted online, a section entitled Supplementary Methods was inadvertently included in the file and the legends for tables and figures omitted. The errors have been corrected in this file as of 4 December 2013.
References
Nagarajan, N. & Pop, M. Sequence assembly demystified. Nat. Rev. Genet. 14, 157–167 (2013).
Alkan, C., Sajjadian, S. & Eichler, E.E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).
Birney, E. Assemblies: the good, the bad, the ugly. Nat. Methods 8, 59–60 (2011).
Baker, M. De novo genome assembly: what every biologist should know. Nat. Methods 9, 333–337 (2012).
Schatz, M.C., Delcher, A.L. & Salzberg, S.L. Assembly of large genomes using second-generation sequencing. Genome Res. 20, 1165–1173 (2010).
van den Engh, G., Sachs, R. & Trask, B.J. Estimating genomic distance from DNA sequence location in cell nuclei by a random walk model. Science 257, 1410–1412 (1992).
Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
Cheung, V.G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953–958 (2001).
Genovese, G. et al. Using population admixture to help complete maps of the human genome. Nat. Genet. 45, 406–414 (2013).
Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).
Bradnam, K.R. et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2, 10 (2013).
Salzberg, S.L. et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22, 557–567 (2012).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).
Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010).
Dixon, J.R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Zhang, Y. et al. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908–921 (2012).
Moissiard, G. et al. MORC family ATPases required for heterochromatin condensation and gene silencing. Science 336, 1448–1451 (2012).
Dekker, J., Marti-Renom, M.A. & Mirny, L.A. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat. Rev. Genet. 14, 390–403 (2013).
Cremer, T. & Cremer, M. Chromosome territories. Cold Spring Harb. Perspect. Biol. 2, a003889 (2010).
Sanyal, A., Lajoie, B., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
Nora, E.P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513–1518 (2011).
Zhao, Z. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006).
Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Pedregosa, F., Weiss, R. & Brucher, M. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Kaplan, N., Friedlich, M., Fromer, M. & Linial, M. A functional hierarchical organization of the protein sequence space. BMC Bioinformatics 5, 196 (2004).
Nocedal, J. Updating Quasi-Newton Matrices with Limited Storage. Math. Comput. 35, 773–782 (1980).
Acknowledgements
We thank B.R. Lajoie for help with processing of the Hi-C data. We thank the members of the Dekker Lab and G. Fudenberg for helpful discussions. This study is supported by the National Human Genome Research Institute (HG003143 to J.D.). N.K. is supported by a Long-Term Fellowship from the Human Frontier Science Program.
Author information
Contributions
N.K. and J.D. conceived the strategy for genome assembly. N.K. performed all analyses and developed all computational approaches. N.K. and J.D. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Discussion, Supplementary Figure 1 and Supplementary Tables 1–3 (PDF 1383 kb)
About this article
Cite this article
Kaplan, N., Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat Biotechnol 31, 1143–1147 (2013). https://doi.org/10.1038/nbt.2768