As genome sequence data sets continue to grow, there is a pressing need for accurate yet memory-efficient methods of assembling genomes de novo. Using new computational tools, the authors assembled a human genome using less than 64 gigabytes of memory. A compression algorithm stores the reads efficiently by exploiting the redundancy between them; the compressed reads are then error-corrected and assembled by String Graph Assembler, a new algorithm that is easily parallelizable.
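To illustrate how redundancy between reads can make them compressible, here is a minimal sketch of the Burrows-Wheeler transform, the transform underlying the FM-index used in the original paper. It is not the authors' implementation: the function names `bwt` and `run_lengths` are hypothetical, the rotation sort is a naive method suited only to toy inputs, and treating the reads as a single concatenated string is a simplifying assumption.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, toy inputs only)."""
    if "$" in text:
        raise ValueError("'$' is reserved as the end-of-text sentinel")
    s = text + "$"  # '$' sorts before A, C, G and T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def run_lengths(s: str) -> list[tuple[str, int]]:
    """Collapse consecutive identical characters into (char, count) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


if __name__ == "__main__":
    # Assumption: four identical toy reads, simply concatenated.
    reads = ["ACGT", "ACGT", "ACGT", "ACGT"]
    transformed = bwt("".join(reads))
    print(transformed)               # TTTT$AAAACCCCGGGG
    print(run_lengths(transformed))  # 5 runs for 17 characters
```

Repeated sequence groups identical characters into long runs after the transform, so the more the reads overlap, the fewer runs need to be stored.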
ORIGINAL RESEARCH PAPER
Simpson, J. T. & Durbin, R. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 7 Dec 2011 (doi:10.1101/gr.126953.111)
Cite this article
Casci, T. Data compression facilitates genome assembly. Nat Rev Genet 13, 73 (2012). https://doi.org/10.1038/nrg3166