There are good reasons to keep accumulating genome sequence data from the passengers of Noah's Ark. As illustrated on page 382 of this issue (ref. 1), genome sequences can be used to address basic evolutionary questions, and the power of this approach depends largely on the amount and quality of the data available.

The rationale for sequencing the genome of the chimpanzee (Pan troglodytes; Fig. 1) has been explained on numerous occasions (see ref. 2 for a review), and a publicly funded effort, involving some of the large US sequencing centres, has already produced a draft assembly of the whole sequence (ref. 3). But this initial assembly still contains many gaps and ambiguities that present difficulties for some types of analysis.

Figure 1: What makes us different, genetically speaking, from a chimp? The sequencing of chimp chromosome 22 is a step on the way to an answer.

In an independent effort (ref. 1), a consortium of Old World humans has now sequenced chimpanzee chromosome 22 to a degree of completion and accuracy equivalent to that of the human genome assembly in its present version. The quality of this chimp chromosome sequence is therefore good enough to allow reliable comparisons with its human counterpart (chromosome 21). A chimpanzee chromosome provides a unique angle from which to look at the human genome and to draw conclusions about its recent evolution, because the sequences of these evolutionary near-neighbours started drifting apart some six million years ago. The longer-term hope, of course, is to identify those sequence changes that could account for the present-day physical, physiological and behavioural differences between chimps and people.

By lining up chimp chromosome 22 and human chromosome 21 and comparing them nucleotide by nucleotide, the consortium found that one nucleotide was substituted for another at only about 1.44% of the aligned positions. The chimpanzee chromosome has been sequenced to an accuracy of less than one error in 10,000 bases, so sequencing mistakes account for less than 1% of the observed single-nucleotide mismatches. There is also an impressive number (68,000) of small to large stretches of DNA that have been either gained or lost (these are called “insertions or deletions”, “indels” for short) in one species or the other.
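The arithmetic behind that error estimate can be checked with a short sketch. The function names and the toy aligned sequences are illustrative assumptions and do not represent the consortium's pipeline. Only the 1.44% divergence and the bound of one error in 10,000 bases come from the text. The sketch also assumes that errors in the chimp sequence are the only source of spurious mismatches.

```python
def substitution_rate(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped columns at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are indels, not substitutions
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def error_share(error_rate: float, divergence: float) -> float:
    """Upper bound on the share of observed mismatches caused by sequencing errors."""
    return error_rate / divergence


# Toy alignment (hypothetical data): one substitution, one gap column.
human = "ACGTACGT-ACGT"
chimp = "ACGTACCTTACGT"
toy_rate = substitution_rate(human, chimp)

# Figures quoted in the text: < 1 error per 10,000 bases, 1.44% divergence.
share = error_share(1e-4, 0.0144)
print(f"toy substitution rate: {toy_rate:.3f}")
print(f"share of mismatches attributable to errors: {share:.2%}")
```

With the quoted figures the share comes out at roughly 0.7%, consistent with the "less than 1%" statement.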

The number of single-nucleotide substitutions is in the range found in earlier studies, but the frequency and size of the indels are more of a surprise. Although most of the indels are less than 30 nucleotides long, some attain sizes of up to 54,000 nucleotides. Those of about 300 nucleotides or more frequently involve transposable elements — DNA sequences that multiply and insert new copies of themselves throughout a genome. For a subset of these 300-nucleotide-plus indels, the authors were able to extend the comparison to other great apes: gorillas and orang-utans. They could thereby infer the lineage (chimp or human) in which the alterations occurred, and could distinguish between insertions and deletions — that is, whether a given sequence was added in one lineage or deleted in the other. These comparisons show that insertions of about 300 nucleotides, mainly of the type of transposable element known as an Alu repeat, have occurred preferentially in the human lineage. Deletions and other insertions seem to have occurred at similar frequencies in both lineages.
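The logic of using a third species to tell insertions from deletions can be sketched as a simple parsimony rule. This is a minimal illustration that assumes a single outgroup (for example, gorilla or orang-utan) and one presence/absence call per species. The function name and return labels are hypothetical, and the authors' actual procedure is not described here.

```python
def polarize_indel(in_human: bool, in_chimp: bool, in_outgroup: bool) -> str:
    """Assign an indel to a lineage using the outgroup state as the ancestral state."""
    if in_human == in_chimp:
        return "no human-chimp difference"
    if in_outgroup:
        # Segment is ancestral, so the species lacking it lost it.
        return "deletion in chimp lineage" if in_human else "deletion in human lineage"
    # Segment is absent from the outgroup, so the species carrying it gained it.
    return "insertion in human lineage" if in_human else "insertion in chimp lineage"


# Example: a segment found in human but in neither chimp nor the outgroup.
print(polarize_indel(in_human=True, in_chimp=False, in_outgroup=False))
```

A single outgroup cannot rule out independent events in the outgroup lineage itself, which is one reason to compare more than one additional ape where possible.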

One of the strongest arguments in support of the chimpanzee genome project was always that having the chimp sequence makes it possible to determine which variants of single nucleotide polymorphisms (SNPs; single nucleotide differences between individual humans) represent the “original” form (ref. 4). This information is important in, for instance, genetic association studies aimed at mapping the locations of gene variants associated with complex diseases such as diabetes or high blood pressure. On the basis of the chimp chromosome 22 sequence, the consortium determined the ancestral form of some 20,000 SNPs from human chromosome 21 (although, given that most of the chromosome 22 sequence has come from just one chimpanzee, it remains formally possible that some of the same polymorphisms also occur in chimp populations). The comparison shows that, as expected, transitions (mutation of one purine nucleotide, adenine or guanine, to the other, or of one pyrimidine nucleotide, cytosine or thymine, to the other) are more frequent among SNPs than transversions (mutation of a purine to a pyrimidine or vice versa). Moreover, mutation occurs at guanines and cytosines more frequently than at adenines and thymines.
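The two ideas in this paragraph, classifying a SNP as a transition or a transversion and reading the ancestral allele off the chimp base, can be sketched as follows. The function names are hypothetical. The ancestral call rests on the simplifying assumption that the chimp base reflects the ancestral state, which the caveat about chimp polymorphism above shows is not guaranteed.

```python
from typing import Iterable, Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(base_a: str, base_b: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'."""
    a, b = base_a.upper(), base_b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid or a == b:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ancestral_allele(human_alleles: Iterable[str], chimp_base: str) -> Optional[str]:
    """Return the human allele that matches the chimp base, or None if neither does."""
    alleles = {allele.upper() for allele in human_alleles}
    chimp = chimp_base.upper()
    return chimp if chimp in alleles else None


# Example: a human A/G SNP at a site where the chimp carries G.
print(classify_substitution("A", "G"))
print(ancestral_allele(("A", "G"), "G"))
```

Returning `None` when the chimp base matches neither human allele keeps ambiguous sites out of the ancestral set instead of forcing a call.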

In searching for the basis of the physical variation between chimps and humans, differences in genome sequences are just the first place to start: we then need to know what these differences mean. Many of the sequence variations might have no effect at all. Those that do might occur in non-protein-coding parts of the genome or in control genes, thereby influencing the level, location and timing of gene expression. Other changes might alter the sequences of encoded proteins, resulting in loss or gain of function. And entire genes might be deleted or acquired in one lineage or another.

Given the broad similarities between chimps and humans, many researchers thought that changes that alter amino-acid sequences would not be very frequent. Surprisingly, however, the consortium found that sequence differences in the protein-coding regions of genes are not a great deal less common than in non-coding genomic regions. But some of the affected genes might be pseudogenes — defective copies of functional genes — that have arisen recently. And, among 231 presumably functional genes that could be compared between chimps and humans, 179 have protein-coding regions of identical length; 140 of the predicted encoded proteins would differ by one amino acid or more, but probably with little or no functional impact. Of the other 52 genes, however, 47 show more significant structural changes.

The consortium could not resist making preliminary studies of the expression of the genes on human chromosome 21 and chimp chromosome 22 as well. Their analyses indicate that — looking at just two tissues — about 20% of these genes show significant variations in their expression. Extrapolation from these findings suggests that if this chromosome represents about 1% of mammalian genes, there may well be thousands of genes that either encode an altered protein or are expressed differentially in humans and chimpanzees. This will not simplify the search for the hypothetical key genetic changes that prevented us from remaining as apes.
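The extrapolation is a simple scaling, sketched below under stated assumptions. The text does not say how many genes the 20% figure refers to, so the sketch assumes it applies to the 231 comparable genes mentioned earlier. It also treats the chromosome as exactly 1% of all genes. The function name is hypothetical, and the results are order-of-magnitude illustrations, not estimates from the study.

```python
def extrapolate_genome_wide(count_on_chromosome: float, chromosome_share: float) -> float:
    """Scale a per-chromosome gene count up to the whole genome."""
    if not 0 < chromosome_share <= 1:
        raise ValueError("chromosome_share must be in (0, 1]")
    return count_on_chromosome / chromosome_share


CHROMOSOME_SHARE = 0.01      # "about 1% of mammalian genes" (from the text)
GENES_COMPARED = 231         # assumed base for the 20% figure
EXPRESSION_FRACTION = 0.20   # "about 20% of these genes" (from the text)
STRUCTURALLY_CHANGED = 47    # genes with more significant structural changes

genome_wide_expression = extrapolate_genome_wide(
    EXPRESSION_FRACTION * GENES_COMPARED, CHROMOSOME_SHARE
)
genome_wide_structural = extrapolate_genome_wide(STRUCTURALLY_CHANGED, CHROMOSOME_SHARE)

print(f"differentially expressed, genome-wide: about {genome_wide_expression:.0f}")
print(f"structurally changed, genome-wide: about {genome_wide_structural:.0f}")
```

Both figures land in the thousands, which is the scale the paragraph describes.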

Even if the major physical, physiological and behavioural differences between the two species do not result simply from an accumulation of many small alterations, the challenge of finding the most crucial changes still lies ahead. For example, the FOXP2 gene product, which is important for language development, differs by two amino acids in humans and chimps, suggesting that the gene has been a target of selection in the human lineage. Yet the role of this gene in language was suggested not by human–chimp comparisons (ref. 5) but by mutation studies in humans (ref. 6).

Identifying sequence changes in the chimpanzee lineage that are likely to have been irrelevant to the acquisition of human-specific traits will depend on sequence comparisons with other great apes. Do we now need the gorilla genome sequence to shed more light on the questions raised by comparing human and chimp DNA?