How times change. Within the space of less than a decade, the development of high-throughput technologies has transformed the task of sequencing a mammalian genome from the years-long, multimillion-dollar endeavour it was originally, to a project that can be performed by an individual laboratory within a few months. Such is the breathtaking extent of progress that it has now been possible to sequence the nuclear genome of the extinct woolly mammoth (Fig. 1) almost to completion.

Figure 1: Ancient blueprint.

ANCIENT ART & ARCHITECTURE COLLECTION LTD

This wall painting of a mammoth and ibex is in Rouffignac Cave, Dordogne, France, and is dated to around 13,000 years ago.

This achievement is described by Miller et al. on page 387 of this issue1. Stretches of mammoth DNA from a cellular organelle, the mitochondrion, which has its own small genome, have been sequenced previously. But tackling the 4 billion to 5 billion base pairs of the much larger and biologically more informative nuclear genome posed a different order of challenge.

Ancient DNA, meaning DNA obtained from fossils up to about 100,000 years old, is highly fragmented, is present in only trace amounts, and is usually swamped by bacterial and fungal DNA. So the idea of sequencing the complete genome of an extinct species was long unthinkable, given that for 30 years the only large-scale method available, the Sanger sequencing method2, was not adequate for such a task. However, in 2005 a new way of sequencing DNA was published3. Known as the 454 method, it initially had a throughput of 20 million base pairs per run, an increase of two to three orders of magnitude compared with Sanger sequencing. Since then, this method's throughput has increased to about 100 million base pairs per run, and three other sequencing approaches have been launched: Solexa, SOLiD and HeliScope. These methods produce as much as 10 billion base pairs of sequence data in a single run. But all of them are 'shotgun' methods, which produce rather short sequencing reads; some also have quite high error rates. So, to obtain a reliable sequence for each nucleotide position, a genome needs to be sequenced to multiple coverage, which Miller and colleagues estimate would be around 10–20-fold for the mammoth. Given the rapid increase in sequencing throughput, it is possible that even this will soon be achieved.
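
To see why a single pass over the genome is not enough, consider a minimal sketch that assumes reads land uniformly at random, so that the number of reads covering any one position is roughly Poisson-distributed with a mean equal to the fold coverage. This idealization is an assumption made here for illustration and is not taken from Miller and colleagues' paper; the function names and the depth threshold of 5 are likewise hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a position is covered by exactly k reads."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def fraction_with_at_least(min_depth: int, coverage: float) -> float:
    """Expected fraction of positions covered by at least min_depth reads."""
    return 1.0 - sum(poisson_pmf(k, coverage) for k in range(min_depth))


# Compare single-pass sequencing with the 10-fold to 20-fold range
# quoted in the text.
for coverage in (1, 10, 20):
    uncovered = poisson_pmf(0, coverage)
    depth_5_plus = fraction_with_at_least(5, coverage)  # threshold is illustrative
    print(f"{coverage:>2}-fold: uncovered {uncovered:.2e}, "
          f"depth >= 5 {depth_5_plus:.4f}")
```

Under this assumption roughly 37% of positions receive no read at all at 1-fold coverage, whereas at 10-fold to 20-fold coverage the large majority of positions are read several times, which is what allows errors in individual reads to be outvoted.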

These developments have both changed the way we think about molecular genetics and shown how we might finally get a handle on the genomes of extinct species. Not least, some of the drawbacks of the new technologies for studying modern DNA are advantages when it comes to ancient DNA. Thus, whereas the Sanger method allows individual stretches of up to 800 base pairs to be determined in a single reaction, the new techniques yield much shorter sequences, sometimes as few as 30 base pairs4. This is a disadvantage when dealing with modern DNA. But it does not matter with ancient DNA, which is mostly fragmented into pieces shorter than 100 base pairs.

Researchers were quick to exploit the new possibilities. Only months after the 454 technology became available, it was applied to mammoth genomics in a paper5 that reported 13 million base pairs of sequence — about 1,000 times more than were covered in the first ancient-genomics study with Sanger sequencing6. In that paper5, published in January 2006, the authors also announced their plan to sequence the mammoth genome to completion. Miller and colleagues1 now describe about 70% of the mammoth genome, and so go a long way to achieving that goal.

Miller et al. were aided immensely in their task by the fact that, unusually for extinct organisms, some specimens of woolly mammoths have been frozen in permafrost. This is an ideal setting for preserving DNA, and, moreover, for preserving hair, which is an ideal source of DNA for sequencing ancient genomes. If hair still contains DNA, almost all of it will belong to the extinct species, and will not be of bacterial or fungal origin, as is often the case with bones. Thus, the authors needed to sequence a total of 'only' 4.1 billion base pairs to obtain about 3.3 billion base pairs of mammoth DNA. They calculate that the total mammoth genome, estimated at some 4.7 billion base pairs, would have been 1.4 times bigger than the human genome.
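
The arithmetic behind these figures can be checked with a short sketch. The base-pair counts and the per-run throughputs are the ones quoted in this article; the `runs_needed` helper is hypothetical and assumes, purely for illustration, that the share of mammoth DNA in each run stays the same as in the hair samples described here.

```python
import math

# Figures quoted in the text, in base pairs.
TOTAL_SEQUENCED_BP = 4.1e9      # everything that was sequenced
MAMMOTH_BP = 3.3e9              # portion identified as mammoth DNA
MAMMOTH_GENOME_BP = 4.7e9       # estimated size of the mammoth genome
MAMMOTH_TO_HUMAN_RATIO = 1.4    # mammoth genome relative to human

# Share of the sequenced DNA that belongs to the mammoth.
endogenous_fraction = MAMMOTH_BP / TOTAL_SEQUENCED_BP
# Mammoth sequence obtained, relative to the estimated genome size.
genome_fraction = MAMMOTH_BP / MAMMOTH_GENOME_BP
# Human genome size implied by the 1.4-fold ratio.
implied_human_genome_bp = MAMMOTH_GENOME_BP / MAMMOTH_TO_HUMAN_RATIO


def runs_needed(fold_coverage: float, bp_per_run: float) -> int:
    """Runs required for a target coverage (assumes a constant mammoth share)."""
    raw_bp = fold_coverage * MAMMOTH_GENOME_BP / endogenous_fraction
    return math.ceil(raw_bp / bp_per_run)


print(f"mammoth share of sequenced DNA: {endogenous_fraction:.0%}")
print(f"sequence relative to genome size: {genome_fraction:.0%}")
print(f"implied human genome: {implied_human_genome_bp / 1e9:.1f} billion bp")
for fold in (10, 20):
    print(f"{fold}-fold coverage: {runs_needed(fold, 1e8)} runs at 100 million bp, "
          f"{runs_needed(fold, 1e10)} runs at 10 billion bp")
```

About four-fifths of the sequenced DNA was mammoth, and the 3.3 billion base pairs correspond to roughly 70% of the estimated genome size. The run counts show why throughput matters: the same target that would take hundreds of runs at 100 million base pairs per run shrinks to a handful at 10 billion.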

Although the mammoth genome is larger than the human genome, its DNA substitution rate (the rate at which one nucleotide replaces another, and so a measure of evolutionary change) seems to be lower. The mammoth genome differs from that of its close relative the African elephant by as little as 0.6%. This is about half the difference between human and chimpanzee, although the two elephant species diverged at about the same time as human and chimpanzee, and probably even slightly earlier (see Fig. 3 of the paper1 on page 389). For some reason, the substitution rate in the nuclear genome of elephants is much lower than in humans and great apes, a result mirrored in the mitochondrial genomes7, where humans and great apes also show a substitution rate more than twice as high as that in the elephant species. As nuclear and mitochondrial genomes are replicated by different enzymes, it remains unclear why both genomes evolve more slowly in elephants than in humans and great apes.
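
The comparison of rates can be made explicit with a back-of-the-envelope sketch. It assumes the simplest possible model, in which the divergence d between two lineages that split T years ago accumulates along both branches (d = 2 × rate × T) and repeated substitutions at the same site are ignored. The human–chimpanzee divergence of 1.2% is inferred from the statement that 0.6% is about half of it; the function name and the 10% earlier split used in the second call are illustrative assumptions, not figures from the paper.

```python
def relative_rate(div_a: float, div_b: float,
                  time_a_over_time_b: float = 1.0) -> float:
    """Ratio of per-lineage substitution rates, pair A relative to pair B.

    With d = 2 * rate * T for each pair, rate_a / rate_b equals
    (div_a / div_b) / (T_a / T_b), so absolute divergence times cancel.
    """
    return (div_a / div_b) / time_a_over_time_b


ELEPHANT_DIVERGENCE = 0.006     # mammoth vs African elephant, from the text
HUMAN_CHIMP_DIVERGENCE = 0.012  # inferred: "about half the difference"

# Same divergence time for both pairs.
same_time = relative_rate(ELEPHANT_DIVERGENCE, HUMAN_CHIMP_DIVERGENCE)
# Elephant lineages split 10% earlier (hypothetical value).
earlier_split = relative_rate(ELEPHANT_DIVERGENCE, HUMAN_CHIMP_DIVERGENCE, 1.1)

print(f"relative rate, same divergence time: {same_time:.2f}")
print(f"relative rate, elephants split 10% earlier: {earlier_split:.2f}")
```

If the two pairs diverged at the same time, the elephant rate comes out at about half the human–chimpanzee rate; any earlier split for the elephants pushes the ratio lower still.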

The draft mammoth genome sequence is too fragmented and error-prone to allow standard gene prediction. Nonetheless, Miller and colleagues identified several protein-coding positions that are unique to the mammoth compared with 50 other vertebrate species. The presence of such mammoth-specific differences is not surprising: it is to be expected that each mammalian species contains unique amino-acid substitutions compared with a limited number of other species. For example, the 52-amino-acid fragment of the ATP2C1 protein contains not only an amino-acid substitution unique to the mammoth, but also two further substitutions that are unique to the tenrec and the two-toed sloth, respectively. Similarly, the position in the 30-amino-acid fragment of the protein C1orf190, at which the mammoth differs from other placental mammals, also has amino-acid substitutions in the ground squirrel and kangaroo rat. Although Miller and colleagues argue that the amino-acid differences they identify have a “significantly enhanced likelihood of causing ... phenotypic effects”, their analyses by no means prove that an amino-acid substitution has functional consequences or adaptive value. Such questions can be answered only by investigations of the proteins in question.

So what do we learn from the mammoth genome, except that sequencing of complete genomes from extinct species is indeed possible and that there are differences in their DNA sequences compared with those of living animals? As with many draft genome projects, not that much. But a draft genome is only the beginning of the story. The main purpose of genome projects is to provide a resource for further research, as vividly shown by the thousands of times the initial human-genome sequencing papers8,9 have been cited.

The next draft nuclear genome of an extinct species likely to become available is that of our closest relative, the Neanderthal, following on from publication of a complete Neanderthal mitochondrial genome sequence10. For some time yet, much work in genomics will consist of fully annotating and completing genome sequences, as indeed most published sequences of extant vertebrates, let alone that of the extinct mammoth, remain drafts. But when we look further into the future, the task will be to understand which differences at the sequence level underlie the phenotypic differences between a mammoth and an elephant, or a human and a Neanderthal, for which well-annotated genomes provide the essential basis.