When assembling a large number of reads in a genomic shotgun project, a serious limitation is the amount of random access memory (RAM) available on the computers used. The limitation arises because all assembly programs must hold all the overlaps between reads in RAM at the same time in order to construct contigs; memory can fill up during this step, causing the assembly process to abort.

Here we propose an algorithm that can overcome any memory limitation through redundant processing, at the cost of increased computing time.

The proposed algorithm consists of dividing the reads into a set of groups, each half the maximum size that fits in the memory of the computer used, and performing an assembly for every possible pair of such groups. After the redundancy in the resulting set of contigs is eliminated, the process is iterated until the set of contigs is small enough to be handled by the assembler in a final step.

Each step of the procedure increases the computing time from k to approximately k + k(k-1)/2, but in many practical cases only one step is needed to finish the assembly. The procedure is suitable for any kind of assembler and was successfully applied to the assembly of a very large set of reads from the maize genome.
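The pairwise scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, memory capacity is assumed to be counted in number of sequences rather than bytes, and `toy_assembler` is a deliberately naive stand-in (exact suffix-prefix merging) for whatever real assembler would be used. The toy "genome" uses unique letters so that every overlap is genuine.

```python
from itertools import combinations, permutations
from typing import Callable, List, Optional

Assembler = Callable[[List[str]], List[str]]


def merge_pair(a: str, b: str, min_overlap: int) -> Optional[str]:
    """Merge b into a if b lies inside a or a's suffix equals b's prefix."""
    if b in a:
        return a
    for n in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return a + b[n:]
    return None


def toy_assembler(seqs: List[str], min_overlap: int = 3) -> List[str]:
    """Hypothetical stand-in for a real assembler: greedy exact-overlap merging."""
    contigs = list(dict.fromkeys(seqs))  # drop exact duplicates, keep order
    merged_any = True
    while merged_any:
        merged_any = False
        for i, j in permutations(range(len(contigs)), 2):
            merged = merge_pair(contigs[i], contigs[j], min_overlap)
            if merged is not None:
                rest = [c for k, c in enumerate(contigs) if k not in (i, j)]
                contigs = rest + [merged]
                merged_any = True
                break
    return contigs


def remove_redundancy(contigs: List[str]) -> List[str]:
    """Drop duplicates and contigs fully contained in a longer contig."""
    kept: List[str] = []
    for c in sorted(set(contigs), key=lambda s: (-len(s), s)):
        if not any(c in longer for longer in kept):
            kept.append(c)
    return kept


def pairwise_round(seqs: List[str], capacity: int, assemble: Assembler) -> List[str]:
    """One step: assemble every pair of half-capacity groups, then deduplicate."""
    half = capacity // 2
    groups = [seqs[i:i + half] for i in range(0, len(seqs), half)]
    contigs: List[str] = []
    # With k groups there are k(k-1)/2 pair runs, each within `capacity`.
    for g1, g2 in combinations(groups, 2):
        contigs.extend(assemble(g1 + g2))
    return remove_redundancy(contigs)


def assemble_in_pairs(reads: List[str], capacity: int, assemble: Assembler) -> List[str]:
    """Iterate pairwise rounds until the set fits, then run a final assembly."""
    if capacity < 2:
        raise ValueError("capacity must allow at least two sequences")
    seqs = list(reads)
    while len(seqs) > capacity:
        reduced = pairwise_round(seqs, capacity, assemble)
        if len(reduced) >= len(seqs):
            raise RuntimeError("round did not shrink the set; cannot fit in memory")
        seqs = reduced
    return assemble(seqs)


# Toy usage: 7 overlapping reads, but at most 4 sequences "fit in memory".
genome = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
reads = [genome[i:i + 8] for i in range(0, 19, 3)]
result = assemble_in_pairs(reads, capacity=4, assemble=toy_assembler)
print(result)  # ['ABCDEFGHIJKLMNOPQRSTUVWXYZ']
```

In this toy run the seven reads form four groups, so one round performs six pair assemblies and leaves three non-redundant contigs, which then fit within the assumed capacity for the final step. The guard against a non-shrinking round is an addition of this sketch; the abstract does not say how such a case is handled.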
Cite this article
Martinez, O., Fernandez-Cortes, A. MUEGANO: A divide and conquer algorithm to overcome memory limitations when assembling shotgun projects. Nat Prec (2009). https://doi.org/10.1038/npre.2009.3712.1
- shotgun assembly maize genome algorithm