Recently, Martorell et al1 analyzed the sequence variation of complete mitochondrial genomes in six schizophrenic patients with an apparent maternal transmission of schizophrenia. According to the authors, the patients showed new mitochondrial DNA variants related to the disease outcome. Here we show, by employing a phylogeny-based approach, that the published sequences and the resulting conclusions are affected by serious shortcomings.

The analysis of the mtDNA sequence variation led Martorell et al1 to conclude that some ‘new’ mtDNA mutations observed in the patients may contribute to the complex genetic basis of schizophrenia. The most remarkable finding is that five of the six patients carried the same heteroplasmic missense variant (T12096A). This would be a most astonishing result, taking into account that (a) the patients are unrelated, (b) their mtDNAs belong to different haplogroups (see below), and (c) a survey of available data in the literature shows that T12096A has not been observed before, either in 3000 random population samples (complete or coding-region genomes) or in cohorts of patients.
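How unlikely such sharing would be by chance can be gauged with a rough calculation. The sketch below is an illustration only and not part of the original analysis: it assumes that the 3000 published genomes behave like an independent random sample, that the six patients are independent draws from the same population, and that heteroplasmy can be ignored. The function names `upper_bound_frequency` and `prob_at_least` are hypothetical.

```python
from math import comb


def upper_bound_frequency(n_samples: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on a variant's frequency after 0 hits in n_samples.

    Solves (1 - p) ** n_samples = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_samples)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of k or more carriers among n independent individuals."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: zero observations of T12096A among 3000 random genomes.
p_max = upper_bound_frequency(3000)

# Chance that at least five of six unrelated individuals carry the variant,
# even if its true frequency sat at the upper bound.
chance = prob_at_least(5, 6, p_max)

print(f"upper bound on population frequency: {p_max:.2e}")
print(f"P(at least 5 of 6 carriers by chance): {chance:.2e}")
```

Under these assumptions the chance of coincidental sharing is vanishingly small, so the observation must reflect either a very strong disease association or an artifact; the calculation itself cannot distinguish between the two.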

Such a strong association would indicate a major role of the T12096A mutation in the expression of schizophrenia, raising major diagnostic and clinical expectations. However, precisely because of these important implications, other causes that could have created the observed pattern should have been carefully evaluated and excluded before publication. Unfortunately, this does not appear to be the case. For example, a single real occurrence of the T12096A mutation could have spread to other samples via cross-contamination. To exclude such a scenario, several experimental schemes could be designed: for instance, sequencing both strands in largely overlapping fragments, so that the mutation of interest is captured together with at least one mutation/polymorphism specific to that sample within the same fragment. Further, it is unclear whether the RFLP analysis carried out1 was sufficiently controlled for notorious problems; in particular, the apparent loss of a restriction site can be caused by a number of other factors, such as an insufficient amount of enzyme or an excess of DNA.2

Similarly remarkable is the statement by Martorell et al1 that they found considerable sequence diversity, as if this were not a normal sign of the naturally occurring mtDNA variation in Spain as elsewhere in Europe.3 The authors claimed to have sequenced the entire mtDNA of their six patients in 24 fragments,4 which, however, overlap only minimally. This would imply that most parts of the sequences were well readable from only one strand. Such a setup carries the risk of artificial recombination and phantom mutations.5, 6, 7 In particular, position 12096 could only be captured by the 18th PCR fragment4 and could probably only be read well from the light strand.

It was asserted1 that the variation in the six mtDNA sequences was recorded by comparison with the rCRS.8 This, however, cannot be true: the sequences of patient numbers 1, 2, 3, and 5 would then have to belong to haplogroup H2b, even sharing all the private mutations of the rCRS, although single mutations in each would (weakly) point to other subhaplogroups of haplogroup H (H5, H1, H7, and H3, respectively). The sequences of patient numbers 4 and 6 are more informative and reveal the real cause of this conundrum: massive oversight of mutations.9 Sequence number 4 clearly belongs to haplogroup K1c1 (ref. 10), and sequence number 6 is a member of haplogroup M1a2a (ref. 11) and not of haplogroup W as asserted.1 Unfortunately, both sequences lack the majority of the mutations that can be inferred from the mtDNA phylogeny (Figure 1).

Figure 1

Phylogenetic reconstruction of the mtDNA branches that lead to haplogroup K1c1 and various branches of haplogroup M1, indicating the inferred oversights (circles) that affect the ‘complete’ genomes of patients 4 (P4) and 6 (P6) reported by Martorell et al.1 All mutations are transitions except those suffixed by a nucleotide (describing a transversion), or a plus sign (indicating an insertion) or ‘del’ (in the case of a deletion) or ‘het’ (for heteroplasmy). Parallel mutations are underlined. The order of mutations on each uninterrupted branch section is arbitrary. Sample numbers 1, 25, 28, and 43 are from Olivieri et al;11 sample number 17 is the same sample as number 25 in Torroni et al;22 K-1 and K-2 refer to the mtDNAs analyzed by Kirk et al;16 MM-1 and MM-2 refer to #AF381996 and #AF381984, respectively, from Maca-Meyer et al;12 #AF15 and #AF43 are from Kivisild et al;15 #287 is from Herrnstadt et al.14 The dotted arrows indicate the likely haplogroup allocation of the pointed mtDNAs (note that for these sequences the full mtDNA information was not reported in the original articles14 or the mtDNAs reported bear oversights12). The ethnic/geographic origins of the mtDNA in the figure are as follows: numbers 1, 28 (Italy), 17 (Ethiopia), 25 (Berber of Morocco), 43 (Bedouin of Southern Israel), MM-1 (Jordan), MM-2 (Morocco), AF-15 (Ethiopia), AF-43 (Ethiopia); for the remaining sequences this information is not available. Note that the mutation G8857A has been observed before in other haplogroup contexts.15, 23

Incomplete recording of mutations can also affect systematic sequencing attempts in population studies, where, for example, two earlier published haplogroup M1 sequences lacked a number of variant nucleotides,12 which inevitably led to slightly distorted views of the variation in haplogroup M1.13 In contrast, more recent coding-region data fit perfectly into our M1 tree: one sequence14 is indistinguishable from the root of haplogroup M1a1a, one15 from our sequence number 28 (M1a2b), and another one15 is close to the root of haplogroup M1a1.

The novel data from haplogroup M1 also clarify the role of C12403T, A12950C, and other mutations: they are not pathogenic but simply characteristic mutations of haplogroup M1 as a whole. These mutations were previously observed in patients with bipolar disorders.16 The cumulative listing of mutations found in those patients16 clearly points to the existence of two M1 sequences in that data set, strongly supported by mutations (each recorded twice) A8701G (for non-N status), C12403T, A12950C, and T14110C (for M1 status). Moreover, these two sequences can be reconstructed as belonging, more specifically, to haplogroups M1a1 (owing to the presence of substitutions A813G, T6671C, and C12346T) and M1b1 (identified by changes C4936T, T13111C, and C15247G), respectively. Here, too, about half of the variation relative to the rCRS must have been overlooked.
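The motif-based reasoning in this paragraph can be sketched in a few lines of code. The example below is a minimal illustration, not the procedure actually used for the reanalysis: it encodes only the diagnostic mutations quoted in the text, whereas a real haplogroup assignment would draw on a complete mtDNA phylogeny. The function `classify` and the input list `reported` are hypothetical.

```python
# Diagnostic motifs as quoted in the text; a full phylogeny contains many more.
NON_N_MARKER = "A8701G"
M1_MOTIF = {"C12403T", "A12950C", "T14110C"}
SUBCLADE_MOTIFS = {
    "M1a1": {"A813G", "T6671C", "C12346T"},
    "M1b1": {"C4936T", "T13111C", "C15247G"},
}


def classify(variants):
    """Check a list of variants (relative to the rCRS) against the motifs above.

    For every subclade with at least one marker present, the markers that are
    absent are listed as candidate oversights that deserve re-inspection.
    """
    found = set(variants)
    report = {
        "non_N": NON_N_MARKER in found,
        "M1": M1_MOTIF <= found,  # all M1 markers present
        "subclades": {},
    }
    for name, motif in SUBCLADE_MOTIFS.items():
        present = motif & found
        if present:
            report["subclades"][name] = {
                "present": sorted(present),
                "missing": sorted(motif - found),
            }
    return report


# Hypothetical variant list for illustration (not taken from any published sample).
reported = ["A8701G", "C12403T", "A12950C", "T14110C", "A813G", "T6671C"]
report = classify(reported)
print(report)
```

In this made-up input, the M1 motif is complete and two of the three M1a1 markers are present, so the absent C12346T is flagged as a position to re-examine in the raw sequence data rather than accepted as a genuine back mutation.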

One of the principal causes of premature claims of disease association for certain mutations detected by chance in some patients is the ‘low penetrance of phylogenetic knowledge in mitochondrial disease studies’.17 A number of cases have been reanalyzed so far and guidelines have been formulated for avoiding errors.18 Ten years ago, of course, with only very few complete mtDNA sequences published, sequencing of mtDNA was necessarily more hazardous and likely led to massive oversights of mutations, as was, for example, the case with the mtDNA analysis of two (related) schizophrenic patients whose mtDNA fell into haplogroup I2.19 Moreover, genuine haplogroup-specific mutations came under suspicion of being pathogenic, simply because the corresponding haplogroups were not yet fully described at the time. For example, three mutations (T3197C, A14793G, and A15218G) shared by all haplogroup U5a1 mtDNAs were targeted in that study19 as they were found in two (unrelated) patients, and it was then believed that a ‘trend towards a higher frequency of substitutions in the patients’ would deserve further attention.

Nowadays, however, there is a rich body of complete mtDNA sequences and data-mining strategies available that should not be ignored.20, 21 Therefore, authors of pertinent papers should make sufficient efforts to sequence entire mtDNA genomes correctly and to compare them with the available database in order to avoid the inadvertent interpretation of erroneous data.