Introduction

In the past two decades, many mutations, including point mutations, insertions, deletions, and rearrangements, have been detected in protein-coding genes as well as tRNA and rRNA genes in mitochondrial DNA (mtDNA), and have been suggested to be pathogenic (Wallace et al. 1999; DiMauro and Schon 2001; Chinnery and Schon 2003). However, among these pathogenic mutations (http://www.mitomap.org), mutations in the conserved initiation codon of coding genes are rare. Since a change of the initiator amino acid from methionine to another residue would be expected to hamper protein translation, it has been natural to regard such a mutation as pathogenic. Hitherto, only three mtDNA mutations have been reported in a translational initiation codon: T3308C in the ND1 gene (Campos et al. 1997), T7587C in the COX II gene (Clark et al. 1999), and A8527G in the ATP6 gene (Dubot et al. 2004). Mutation T7587C was identified in a family with mitochondrial encephalomyopathy and was suggested to affect translation of COX II (Clark et al. 1999). Mutation T3308C was originally reported to be associated with bilateral striatal necrosis and MELAS (mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes) syndrome (Campos et al. 1997); however, it was later shown neither to impair synthesis of the ND1 polypeptide nor to affect the activity of complex I, suggesting that T3308C might be a benign mutation (Vilarinho et al. 1999; Fernandez-Moreno et al. 2000). Rocha et al. (1999) reached the same conclusion by phylogenetic analysis. All the mtDNAs harboring T3308C were classified into a particular haplogroup, L1b, which is widely distributed in current African and Iberian populations. The third mutation, A8527G, was recently identified in apparently normal individuals. Although the mutation disrupts the ATG initiation codon of the ATP6 gene, the resulting GTG, which usually encodes valine, may serve as a translational initiation codon in mitochondria (Dubot et al. 2004).
The different effects of these mutations in the respective cases raise the possibility of some unknown remediation pathway(s). Since the third codon in ND1 is ATG, one plausible explanation is that the third codon may serve as an alternative initiation codon, producing a similar polypeptide that is two amino acids shorter (Rocha et al. 1999; Fernandez-Moreno et al. 2000). Additional mutations and variations occurring in the initiation codon of mtDNA genes in humans and other species would help to clarify the mechanism.
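The alternative-initiation idea can be made concrete with a short Python sketch. It scans a coding sequence for the first in-frame start codon downstream of codon 1 and reports how many N-terminal residues would be lost. The function name, the default of ATG as the only start codon, and the toy sequence are assumptions made for illustration; the toy sequence is not the real ND1 or ND5 sequence.

```python
def next_in_frame_start(cds, start_codons=("ATG",)):
    """Find the first in-frame start codon downstream of codon 1.

    Returns (codon_number, residues_lost), where codon_number is 1-based,
    or None if no downstream in-frame start codon is found.
    Hypothetical helper; start_codons defaults to ATG only, as in the text.
    """
    cds = cds.upper()
    # Step through the sequence codon by codon, skipping codon 1 (index 0).
    for i in range(3, len(cds) - 2, 3):
        if cds[i:i + 3] in start_codons:
            return i // 3 + 1, i // 3
    return None


# Toy sequence (illustrative only): codon 1 is ACG (a disrupted ATG),
# and codon 3 is ATG.
toy_cds = "ACGACCATGCACACC"
print(next_in_frame_start(toy_cds))
```

For the toy sequence the function returns codon 3 with two residues lost, which mirrors the two-amino-acid-shorter polypeptide proposed for ND1. Other candidate start codons can be passed through `start_codons` if a different initiation model is assumed.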

In this study, we report a homoplasmic nucleotide change, T12338C, in mtDNA, which occurs in the initiation codon of the ND5 gene and substitutes methionine with threonine (M1T). The nucleotide change was originally detected in two mtDNAs (GD7809 and QD8147), both belonging to haplogroup F2, when we systematically surveyed mtDNAs belonging to all the major haplogroups specific to East Asia by complete mtDNA sequencing (Kong et al. 2003b). To determine whether the T12338C change is specific to haplogroup F2 and to learn more about the origin of haplogroup F2, we performed an extensive search for F2 in more than 3,000 Chinese mtDNAs from reported studies (Yao et al. 2000, 2002a,c, 2003; Tsai et al. 2001; Kivisild et al. 2002; Oota et al. 2002; Yao and Zhang 2002; Kong et al. 2003a; Tajima et al. 2003) and from our unpublished data by motif-searching and/or (near-)matching methods (Yao et al. 2002a, 2003). Our phylogeographic study revealed that the T12338C change was specific to haplogroup F2 and occurred in normal individuals across China, suggesting a polymorphic change rather than a pathogenic mutation.

Material and methods

Sampling

A total of 1,494 subjects from 28 populations across China were screened in this study. All of the individuals were confirmed to be unrelated before sampling and gave informed consent. To better understand the phylogeny of haplogroup F2, the previously reported data sets (Yao et al. 2000, 2002a,c, 2003; Tsai et al. 2001; Kivisild et al. 2002; Oota et al. 2002; Yao and Zhang 2002; Kong et al. 2003a; Tajima et al. 2003) were also included. As a result, a total of 3,090 mtDNAs from 57 populations across China were examined; details are given in Table 1.

Table 1 Frequency of mtDNA haplogroup F2 in Chinese ethnic populations

DNA amplification and sequencing

The hypervariable segments I (HVS-I) and II (HVS-II) of the mtDNA control region, as well as the regions containing potential variations, were amplified and sequenced as described elsewhere (Yao et al. 2002a; Kong et al. 2003a,b).

Data analyses

The mtDNA sequences were edited and aligned with the DNAStar software package. Mutations were scored relative to the revised Cambridge reference sequence (rCRS; Andrews et al. 1999). Length polymorphisms of A and/or C stretches in region 16180–16193 in HVS-I and region 303–315 in HVS-II were disregarded in the subsequent analysis. Each mtDNA was tentatively assigned to a haplogroup on the basis of the variations in the HVS-I and HVS-II control regions. The haplogroup status was further confirmed by detecting additional variations in other regions as described in our previous studies (Yao et al. 2002a, 2003; Kong et al. 2003a). A segment covering region 10171–10659 of the rCRS, which was suggested to be informative in defining East Asian specific haplogroups (Yao et al. 2002a), was used to specify the phylogenetic status of the F* or R9* mtDNAs in our previous studies (Yao et al. 2000, 2002a,c, 2003; Yao and Zhang 2002; Kong et al. 2003a) and unpublished data (the asterisk attached to a haplogroup indicates that the sample could not be further classified into any sub-clade of that haplogroup). For published data sets not from our laboratory, for which only HVS-I and/or HVS-II information was available, we recognized the potential F2 types by matching and/or near-matching with the identified F2 types that had been tested for coding region information. To better understand the relationships among haplogroups, a network profile of haplogroup F2 was constructed according to Bandelt et al. (2000). We also estimated the haplotype diversity and nucleotide diversity (Nei 1987) of haplogroup F2 by using the DnaSP package (Rozas and Rozas 1999).
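The two diversity statistics named above can be sketched in a few lines of Python. This is a minimal illustration of the standard estimators, haplotype diversity h = n/(n-1) × (1 - Σp_i²) and nucleotide diversity as the mean number of pairwise differences per site, and it is not a reimplementation of DnaSP. The function names and the toy alignment are assumptions; gaps and missing data are not handled.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


# Toy alignment (illustrative only, not data from this study).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(toy), 4), round(nucleotide_diversity(toy), 4))
```

In the toy alignment, three distinct haplotypes occur among four sequences, so h is 4/3 × (1 - 0.375), and the six sequence pairs differ at seven positions in total over four sites.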

Results

Identification of mtDNA haplogroup F2

As revealed by our recent study (Kong et al. 2003b), all the characteristic nucleotide variations for haplogroup F2 are located in the coding regions. Thus, it is generally hard to identify F2 mtDNAs solely on the basis of sequence information from the hypervariable segments in the control region, although F2a is easily distinguished from the other F haplogroups and R9* by the variation at position 16291 (Yao et al. 2002a). By sequencing the segment 10171–10659, it is easy to distinguish haplogroup F2 from the rest by the nucleotide variations at 10310, 10535, and 10586. As a result, a total of 76 haplogroup F2 mtDNAs were identified (Table 2). In comparison with 19 mtDNAs belonging to other haplogroups (Table 3), several features are discernible. (1) Five mtDNAs with the variations at both 10310 and 10609 were assigned to haplogroup F1, and their haplogroup status was further confirmed by the variation at 12406. Two of these had motif 16302–16304–16497–249d, and the other three had motif 16172–16304–249d. (2) Ten mtDNAs bearing only the variation at 10310 in segment 10171–10659 were assigned to (an) unidentified lineage(s) in F. Seven of them had motif 16207–16304–16399–146–249d, and the other three had motif 16218–16304–16311–249d. Further analysis showed that the seven mtDNAs with motif 16207–16304–16399–146–249d were characterized by two specific variations at 12396 and 12408, and hence belong to a new haplogroup designated as “F4.” (3) The last four samples (two with motif 16157–16256–16304–16335–236–249d and the other two with motif 16304–16362) were found not to have any of the variations characteristic of F1, F2, F3 (Kong et al. 2003b), or F4, and thus belong to a new lineage (viz., pre-F) in haplogroup R9. Additional information is needed to further specify the phylogenetic positions of these samples.
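The assignment rules described in this paragraph can be summarized as a small decision procedure. The Python sketch below encodes only the diagnostic positions named in the text (10310, 10535, 10586, 10609, 12396, 12408). The function name, the representation of a sample as a set of variant positions relative to the rCRS, and the returned labels are assumptions for illustration. The markers for F3 are not listed here, so the sketch cannot separate F3 from pre-F.

```python
def classify_f_lineage(variant_positions):
    """Tentatively assign an F/R9 lineage from variant positions (vs. rCRS).

    Hypothetical helper encoding only the sites named in the text;
    it is a sketch, not the authors' actual procedure.
    """
    v = set(variant_positions)
    if {10310, 10535, 10586} <= v:
        return "F2"
    if {10310, 10609} <= v:
        return "F1"  # the text confirms F1 status by the variation at 12406
    if 10310 in v:
        if {12396, 12408} <= v:
            return "F4"
        return "F (unidentified lineage)"
    # No F1/F2/F4 marker found; F3 markers are not encoded in this sketch.
    return "unresolved (candidate pre-F)"


# Toy inputs (illustrative only).
print(classify_f_lineage({10310, 10535, 10586, 12338}))
print(classify_f_lineage({10310}))
```

The order of the checks matters: the F2 and F1 tests come before the bare 10310 test because all three groups share the variation at 10310.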

Table 2 Sequence variations in 76 mtDNAs of haplogroup F2. Sequences identical to the revised Cambridge reference sequence are indicated with CRS. When (a) nucleotide change(s) was detected compared to the CRS sequence, a transition is indicated by the position number alone, a transversion by the position number with a suffix (A, C, G, or T), a deletion with “d,” and an insertion with “+.” Items are left blank when sequence information was not available
Table 3 Sequence variations in non-F2 type mtDNAs. Sequences identical to the revised Cambridge reference sequence are indicated with CRS. When (a) nucleotide change(s) was detected compared to the CRS sequence, a transition is indicated by the position number alone, a transversion by the position number with a suffix (A, C, G, or T), a deletion with “d,” and an insertion with “+.” Items are left blank when sequence information was not available

Phylogeny of the haplogroup F2

The network profile of haplogroup F2 revealed that this haplogroup is divided into three major branches, designated as “F2a” (defined by the variation at 16291; Yao et al. 2002a), “F2b” (recognizable by the variation at 10810; Table 4), and “F2c” (characterized by the variation at 10265) (Fig. 1). Haplogroup F2a comprises three major clades: F2a1 (recognizable by the variation at 16266), F2a2 (characterized by a transversion (T/A) at 16092), and F2a3 (defined by the variation at 16203). The sub-clades of F2 show a clear regional distribution. For instance, most of the F2b and F2a3 types, as well as all the F2a1 types, are confined to northern populations or populations of northern origin. In contrast, individuals belonging to haplogroup F2c are prevalent in south China.

Table 4 Relevant nucleotide variations in major sub-clades of haplogroup F2. Only nucleotides that differ from the rCRS are indicated with a letter; nucleotides identical to the rCRS are indicated with dots. Items are left blank when nucleotide sequence information was not available
Fig. 1

Network profile of haplogroup F2 samples observed in Chinese populations. The network is constructed on the basis of 76 F2 mtDNAs identified in 57 Chinese ethnic populations. As the nucleotide at site 16519 is known to be extremely hypervariable, it is disregarded in the construction. Representative sample names are indicated in the circles, and the circle size is proportional to the number of samples sharing that type. Relevant nucleotide variations compared with the revised Cambridge Reference Sequence (rCRS) are indicated on the branches by position number. Recurrent mutations are underlined, and the asterisk indicates the root of haplogroup F2

T12338C is characteristic of haplogroup F2

Our previous analyses of the major haplogroups in East Asians by using complete sequences revealed that the T12338C variation was detected exclusively in two samples of haplogroup F2 (GD7809 and QD8147; Kong et al. 2003b). An extensive search of more than 1,000 published complete mtDNA sequences (Ingman et al. 2000; Finnilä et al. 2001; Maca-Meyer et al. 2001; Torroni et al. 2001; Derbeneva et al. 2002; Herrnstadt et al. 2002; Kivisild et al. 2002 and references therein; Ingman and Gyllensten 2003; Mishmar et al. 2003) revealed two additional cases with the T12338C variation, one in a P type (Ingman and Gyllensten 2003) and the other in an H1 type (Herrnstadt et al. 2002). To further clarify whether the T12338C variation is characteristic of haplogroup F2, 32 samples were selected from our collection described in Table 2 so as to include at least one sample from each sub-clade, and were subjected to PCR amplification and subsequent direct sequencing to detect the variation. As a control, 14 samples belonging to non-F2 haplogroups, selected from the samples listed in Table 3, were also analyzed. The T12338C nucleotide substitution was detected in all the F2 samples analyzed but not in any of the non-F2 samples (Tables 2, 3). Further analyses showed that the T12338C variation was completely linked with the T1005C, T1824C, A7828G, T10535C, G10586A, and G13708A variations (Table 4), showing that T12338C is one of the characteristic variations specific to haplogroup F2.

Discussion

In this study, we report a homoplasmic nucleotide change, T12338C, which results in substitution of the highly conserved methionine at the translation start site of the ND5 gene with threonine (M1T). The T12338C change is tightly associated with other nucleotide variations, including T1005C, T1824C, A7828G, T10535C, G10586A, and G13708A. Thus, these variations together characterize haplogroup F2. Considering that mtDNAs of haplogroup F2 are distributed widely in normal populations across China, though at relatively low frequencies (Table 1), and that there is no evidence suggesting any association of the haplogroup with mitochondrial disorders, it is clear that T12338C is a polymorphic variation rather than a pathogenic mutation. This case is similar to that of T3308C, which is specific to haplogroup L1b and has been shown to be benign (Rocha et al. 1999; Fernandez-Moreno et al. 2000). Intriguingly, since the third codon of the ND5 gene also encodes methionine, as does that of the ND1 gene, it is plausible that the third codon of the ND1 and ND5 genes acts as a surrogate when the initiation codon is impaired (Rocha et al. 1999; Fernandez-Moreno et al. 2000). In the COX II gene, the second in-frame methionine codon is located 48 nucleotides downstream of the translational initiation site. Even if this codon could take a similar proxy role to that discussed for the ND1 and ND5 genes, it is still uncertain whether the resultant polypeptide, which would be 16 residues shorter, is functional. Thus, it is not surprising that the analogous initiation-codon change caused by T7587C in the COX II gene leads to a mitochondrial disorder (Clark et al. 1999). Nucleotide changes occurring in the translational initiation codon are also observed in mtDNAs of primates (Table 5). Two and four cases have been reported for the ND1 and ND5 genes, respectively, in chimpanzees (Horai et al. 1995), macaques, and patas monkeys (Hayasaka et al. 1996).
It seems that the first codon in the proposed cDNA sequences of the ND1 and ND5 genes is not crucial, and that the adjacent downstream in-frame ATG could take over the role of translational initiation.

Table 5 Mutations and/or variations occurring in the initiation codon of mitochondrial genes. NA, not available

A haplogroup normally has a higher prevalence and greater genetic diversity at its place of origin than in the areas to which it later spread. To infer where haplogroup F2 (viz., T12338C) originated, we simply aggregated the data from the respective populations into northern and southern groups according to their locations and/or ethnic origin (we did not include samples WH Han and QJ Han because they were located in the Changjiang River region, where populations migrate considerably). Although this strategy has been criticized (Yao et al. 2002a), it enlarges the sample size and makes the comparison less affected by sample size bias. A slightly higher frequency of haplogroup F2 was identified in the northern group (45/1,473) than in the southern group (27/1,524). The haplotype diversity and nucleotide diversity in F2 are slightly higher in the northern group (0.940±0.021 and 0.0106±0.0006, respectively) than in the southern group (0.915±0.033 and 0.0075±0.0008, respectively). These results suggest that haplogroup F2 might have originated in north China. This suggestion is also supported by two features in the phylogeographic analysis of F2 (Fig. 1). (1) The potential root types of F2 were more prevalent in the populations from north China or of northern origin. (2) Almost all the major sub-clades of F2 were distributed in the northern populations except for F2c (which is exclusive to south China). By counting the transitions in region 16090–16365 (Forster et al. 1996; Saillard et al. 2000), we calculated the age of haplogroup F2 to be 41,700±13,700 years. These results suggest that haplogroup F2 might have originated and expanded in north China before the last glacial age. The prevalence of sub-haplogroups of F2, for example F2c and F2a2, in south China might reflect later back-migration events from north China to south China (Yao et al. 2002a).
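The arithmetic behind the frequency comparison and the age estimate can be sketched as follows. The counts 45/1,473 and 27/1,524 are taken from the text and correspond to roughly 3% and just under 2%. The dating part assumes the ρ statistic (the mean number of transitions from the root type in region 16090–16365) and the commonly cited calibration of one transition per 20,180 years from Forster et al. (1996). The function names and the toy transition counts are hypothetical, and the standard error of Saillard et al. (2000), which requires the full tree, is not computed here.

```python
# Assumed calibration: one transition in 16090-16365 per 20,180 years
# (Forster et al. 1996). Treat this constant as an assumption of the sketch.
YEARS_PER_TRANSITION = 20_180


def frequency(count, total):
    """Simple proportion of F2 mtDNAs in a pooled group."""
    return count / total


def rho_age(transitions_from_root):
    """Return (rho, age in years) from per-sample transition counts.

    rho is the mean number of transitions separating each sample
    from the root type of the haplogroup.
    """
    if not transitions_from_root:
        raise ValueError("need at least one sample")
    rho = sum(transitions_from_root) / len(transitions_from_root)
    return rho, rho * YEARS_PER_TRANSITION


north = frequency(45, 1473)   # counts reported in the text
south = frequency(27, 1524)
print(f"north: {north:.4f}, south: {south:.4f}")

# Toy transition counts (illustrative only, not data from this study).
print(rho_age([2, 2, 3, 1]))
```

Under the assumed calibration, the reported age of 41,700 years would correspond to a ρ of about 2.07 transitions per sample.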

Our current study also raises a concern about the identification of pathogenic mtDNA mutations. Several surveys have reported that mutation C5178A is associated with longevity and with certain disorders (Kokaze et al. 2003 and references therein). According to our study, the C5178A variation is exclusively associated with haplogroup D (Yao et al. 2002b), and it is well known that haplogroup D is prevalent in northern Chinese (Yao et al. 2002a; Kong et al. 2003a) and Japanese (Maruyama et al. 2003). Thus, most of the reported associations of C5178A with, e.g., a disease phenotype more likely reflect population stratification (Ardlie et al. 2002) and/or inadequate sampling. Another paper, reporting an association of G15497A with obesity (Okura et al. 2003), is a similar case, since G15497A, together with T8200C and G15323A, is characteristic of haplogroup G1, a prevalent form in northeastern Asia (Bandelt et al. 2003; Kong et al. 2003b). Such hasty conclusions in association studies between nucleotide changes in mtDNA and disorders could be avoided if authors referred to the phylogenetic information of mtDNA.

In conclusion, our mutational and phylogeographical analyses of human mtDNA indicate that the T12338C change is one of the characteristic variations associated with haplogroup F2, and is thus a polymorphism rather than a pathogenic mutation.