Box 1. What makes a completely sequenced genome?
From the following article:
The draft sequences: Filling in the gaps
Peer Bork & Richard Copley
Nature 409, 818-820 (15 February 2001)
doi:10.1038/35057274
When is sequencing work on a genome complete? No genome for a eukaryotic organism (roughly, those organisms whose cells contain a nucleus) has been sequenced to 100%. There are regions, often highly repetitive, that are difficult or impossible to clone (one of the initial steps in a sequencing project) or sequence with current technology. Fortunately, such regions are expected to contain relatively few protein-coding genes (refs 4, 10).
The extent of these regions varies widely between species. So, rather than applying a universal gold standard, each sequencing project has made pragmatic decisions as to what constitutes a sufficient level of coverage for a particular genome. For example, as much as one-third of the sequence of the fruitfly Drosophila melanogaster was not stable in the cloning systems used, and so was not sequenced. But 97% of the so-called euchromatic portion, where most genes are thought to reside, was sequenced (ref. 11; Fig. 1).
For the human genome, one definition of 'finished' is that fewer than one base in 10,000 is incorrectly assigned (ref. 6); more than 95% of the euchromatic regions are sequenced; and each gap is smaller than 150 kilobases (ref. 12). Such standards represent realistic goals given current technology. By this standard, over a quarter of the public consortium's sequence (ref. 1) is considered finished at present, including the previously published long arms of chromosomes 21 and 22 (refs 3, 4; Fig. 1). The Celera sequences of chromosomes 21 and 22 are slightly more gappy than those from the public consortium, but the converse seems to be true for the other chromosomes (ref. 2). But again, as different protocols were used, it is not easy to compare the overall status of the two assemblies. In the longer term, as much of the heterochromatin (which is harder to sequence, and contains few genes) as possible must be sequenced, because we might otherwise miss important features.
P.B. & R.C.
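As an illustration only, the three 'finished' criteria quoted above for the human genome can be written as a simple check. This is a minimal sketch, not a procedure used by any sequencing project: the `AssemblyStats` class, its field names and the `is_finished` function are hypothetical, and a kilobase is assumed to mean 1,000 bases.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the definition quoted in the box.
MAX_ERROR_RATE = 1 / 10_000        # fewer than one base in 10,000 incorrectly assigned
MIN_EUCHROMATIN_COVERAGE = 0.95    # more than 95% of euchromatic regions sequenced
MAX_GAP_BP = 150_000               # each gap smaller than 150 kilobases (assumed 1 kb = 1,000 bases)


@dataclass
class AssemblyStats:
    """Hypothetical summary of a sequenced region; field names are illustrative."""
    error_rate: float            # fraction of bases incorrectly assigned
    euchromatin_covered: float   # fraction of the euchromatic sequence covered
    gap_sizes_bp: Sequence[int]  # estimated size of each remaining gap, in bases


def is_finished(stats: AssemblyStats) -> bool:
    """Return True only if all three criteria hold at once."""
    return (
        stats.error_rate < MAX_ERROR_RATE
        and stats.euchromatin_covered > MIN_EUCHROMATIN_COVERAGE
        and all(gap < MAX_GAP_BP for gap in stats.gap_sizes_bp)
    )
```

Under these assumptions, a region with one error per 20,000 bases, 97% euchromatic coverage and gaps of 20 and 120 kilobases would pass, whereas a single 200-kilobase gap would be enough to fail it.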
