The Human Genome Project, from one perspective, began in 1981 with the publication of the complete sequence of human mitochondrial DNA (mtDNA) (ref. 1). The Cambridge reference sequence (CRS), as it is now designated, continues to be indispensable for studies of human evolution, population genetics and mitochondrial diseases. It has been recognized for some time, however, that the CRS differs at several sites from the mtDNA sequences obtained by other investigators (refs 2,3). These discrepancies may reflect either true errors in the original sequencing analysis or rare polymorphisms in the CRS mtDNA. A further complication is that the original mtDNA sequence was principally derived from a single individual of European descent, although it also contained some sequences from both HeLa and bovine mtDNA (ref. 1). To resolve these uncertainties, we have completely resequenced the original placental mtDNA sample.

The results of the resequencing confirm that there are both errors and rare polymorphisms in the CRS (Table 1). There are 11 nucleotide positions at which the original sequence contains the incorrect nucleotide (all sequences here refer to the L-strand of the CRS mtDNA). One of these involves the CC doublet at positions 3,106 and 3,107 in the original sequence, which is actually a single cytosine residue. The other ten errors are mistakes in the identification of single base pairs and typically involve the incorrect assignment of a guanine residue. Errors at nt 14,272 and 14,365 result from the use of the bovine mtDNA sequence at ambiguous sites (ref. 1). There are an additional seven nucleotide positions at which the original CRS is correct and which represent rare (or even private) polymorphic alleles. Six of these polymorphisms are single base pair substitutions, and one involves the simple repeat of cytosine residues at nt 311–315. The CRS has five residues in this repeat, whereas most other human mtDNAs have six.

Table 1 Reanalysis of the Cambridge reference sequence

To determine which errors in the CRS resulted from the use of HeLa mtDNA sequence, we sequenced the HeLa mitochondrial genome in its entirety (data not shown). It is of African origin and divergent from the published CRS at a number of sites. The only error that we were able to explain using the HeLa sequence was that at nt 14,766 (T in the original CRS versus C in the resequenced sample).

The revised CRS mtDNA belongs to European haplogroup H on the basis of the cytosine at nt 14,766 (instead of the thymine in the original CRS) and the cytosine at nt 7,028 (instead of a thymine; refs 4,5,6). The assignment of the revised CRS to haplogroup H is confirmed by the absence of any of the predicted restriction site changes that characterize the other European mtDNA haplogroups (refs 4,6).
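The two-marker criterion described above can be illustrated with a minimal sketch. It checks only the alleles at nt 7,028 and nt 14,766 and does not model the restriction site analysis used for confirmation. The function name, the dictionary layout and the sample genotypes are hypothetical and serve illustration only.

```python
# Diagnostic L-strand alleles named in the text (original CRS numbering).
HAPLOGROUP_H_MARKERS = {7028: "C", 14766: "C"}


def is_consistent_with_haplogroup_h(alleles):
    """Return True if every marker position carries the haplogroup H allele.

    `alleles` is a hypothetical mapping of nucleotide position -> base.
    A missing position counts as not consistent.
    """
    return all(
        alleles.get(position, "").upper() == base
        for position, base in HAPLOGROUP_H_MARKERS.items()
    )


# Hypothetical genotypes, not data from the study.
sample_with_h_markers = {7028: "C", 14766: "C"}
sample_without_h_markers = {7028: "T", 14766: "T"}

print(is_consistent_with_haplogroup_h(sample_with_h_markers))
print(is_consistent_with_haplogroup_h(sample_without_h_markers))
```

A sequence that passes this check is only a candidate for haplogroup H; the text relies additionally on the absence of the restriction site changes typical of the other European haplogroups.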

If we include both the technical errors (8) and the errors (3) introduced by the assumption of the bovine and HeLa mtDNA sequences, the overall error frequency for the original CRS analysis was 0.07%, emphasizing the remarkable tour de force by Sanger and colleagues. Nevertheless, because of its widespread use, we recommend that the CRS for human mtDNA should be revised as follows: (i) the ten simple substitution errors should be corrected; (ii) the rare polymorphic alleles should be retained (that is, the revised CRS (RCRS) should be a true reference sequence and not a consensus sequence); and (iii) the original nucleotide numbering should be retained. The last suggestion represents a compromise between accuracy (correcting the numbering to account for the single C residue at nt 3,106 and 3,107) and consistency with the previous literature. We believe that renumbering all of the previously identified sequence changes beyond nt 3,106 would create an unacceptable level of confusion.
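The error frequency and the numbering compromise can both be made concrete with a short sketch. It assumes a CRS length of 16,569 positions in the original numbering (a figure not stated in the text above) and assumes that nt 3,107 is the slot left empty when the CC doublet is reduced to a single cytosine. The function names are hypothetical.

```python
# Assumption: length of the CRS in the original numbering.
MTDNA_LENGTH = 16569
# Assumption: the second slot of the former CC doublet is the one left empty.
PLACEHOLDER_POSITION = 3107

TECHNICAL_ERRORS = 8   # sequencing mistakes, as counted in the text
BORROWED_ERRORS = 3    # introduced via bovine and HeLa mtDNA sequence


def error_frequency_percent(n_errors, length=MTDNA_LENGTH):
    """Errors per sequenced position, expressed as a percentage."""
    return 100.0 * n_errors / length


def residue_index(position):
    """Map a retained CRS position number to the 1-based index of the
    actual residue, or None for the empty placeholder position."""
    if not 1 <= position <= MTDNA_LENGTH:
        raise ValueError(f"position {position} is outside 1..{MTDNA_LENGTH}")
    if position == PLACEHOLDER_POSITION:
        return None
    # Positions beyond the placeholder sit one residue earlier in the molecule.
    return position if position < PLACEHOLDER_POSITION else position - 1


total_errors = TECHNICAL_ERRORS + BORROWED_ERRORS
print(round(error_frequency_percent(total_errors), 2))  # 0.07
print(residue_index(3106), residue_index(3107), residue_index(14766))
```

Keeping the position numbers fixed means that previously published variants, such as those at nt 14,766, can be cited unchanged; only software that needs the physical offset has to apply the one-residue shift.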