You might remember this problem from your childhood: when you lose the top to your puzzle box, you are confronted with lots of pieces and no idea what they are supposed to look like when assembled. Genome sequencers faced the same dilemma when beginning large-scale DNA sequencing. They did the same thing that you might: they started at known landmarks and systematically built up the larger picture.

To assemble the short stretches of DNA sequence in each read into a larger whole, particularly on a large scale, bioinformaticians developed algorithms that could take input directly from fluorescent sequencing machines. The earliest programs to achieve wide use were Phred, Phrap and Consed, developed by Phil Green and colleagues. Phred first worked through the chromatogram traces output by the machine, assigning a base call to each position along with a quality score reflecting the estimated probability that the call was wrong. Phrap then assembled the base-called reads into the most likely single path through the sequence. Users then viewed and edited the output with Consed to generate higher-quality sequences as required. These programs were developed for, and used on, the public Human Genome Project.
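
To make the base-calling step concrete: Phred-style quality scores encode the estimated error probability P of a call as Q = -10 log10(P), so Q20 means roughly a 1-in-100 chance the call is wrong and Q30 a 1-in-1,000 chance. The short Python sketch below illustrates only that conversion on made-up error probabilities; it is not Phred's actual trace-analysis algorithm.

```python
import math

def phred_quality(p_error: float) -> int:
    """Convert an estimated per-base error probability to a Phred quality score.

    Phred encodes error probability P as Q = -10 * log10(P).
    """
    return round(-10 * math.log10(p_error))

def error_probability(q: int) -> float:
    """Invert a quality score back to an error probability."""
    return 10 ** (-q / 10)

# Hypothetical base calls with estimated error probabilities (illustrative only).
calls = [("A", 0.0001), ("C", 0.001), ("G", 0.02), ("T", 0.25)]

for base, p in calls:
    q = phred_quality(p)
    flag = "low quality" if q < 20 else "ok"
    print(f"{base}: Q{q:>2} (P_error ~ {error_probability(q):.4f}) {flag}")
```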

Gene Myers and colleagues later developed an algorithm that used the end-pair information from sequencing subclones to assemble larger sequences. They postulated that a whole genome could be cut into pieces, sequenced at random and reconstructed given sufficient computational power. They demonstrated this approach on the genome of Drosophila melanogaster and famously went on to 'race' the publicly funded Human Genome Project, ultimately using a combination of their whole-genome assembly methods and data from the public project. So-called whole-genome shotgun assembly is now the method of choice for large genome projects, and the field has moved on to next-generation programs such as Arachne, Atlas and PCAP, each using different algorithms.
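
The core computational idea behind shotgun assembly is to find reads whose ends overlap and merge them into progressively longer contiguous sequences. The Python sketch below shows the textbook greedy overlap-and-merge heuristic on a toy example; real assemblers such as Phrap, Arachne or Celera's whole-genome assembler additionally handle sequencing errors, repeats and mate-pair constraints, none of which appear here.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`
    (at least `min_len` bases), or 0 if none exists."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str]) -> str:
    """Repeatedly merge the pair of reads with the largest suffix/prefix overlap.

    This is the greedy shortest-common-superstring heuristic, used only as an
    illustration, not the algorithm of any particular production assembler.
    """
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_pair = 0, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i == j:
                    continue
                olen = overlap(a, b)
                if olen > best_len:
                    best_len, best_pair = olen, (i, j)
        if best_pair is None:          # no overlaps left: keep remaining pieces
            return "".join(reads)
        i, j = best_pair
        merged = reads[i] + reads[j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads[0]

# Toy 'shotgun' reads covering the hypothetical sequence ATGGCGTGCAATG.
reads = ["ATGGCGTG", "GCGTGCAA", "TGCAATG"]
print(greedy_assemble(reads))   # -> ATGGCGTGCAATG
```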