Introduction

Cytoplasmic male sterility (CMS) is a common phenomenon in higher plants and results from incompatibility between nuclear and cytoplasmic gene products, which leads to the failure of sporogenesis (Newton, 1988; Levings and Brown, 1989). The combination of CMS and a nuclear gene for restoration of fertility (Rf) is essential in self-fertilizing crop species such as rice for breeding hybrid varieties and for hybrid seed production.

In rice, the ‘BT-type’ of CMS occurs when the cytoplasm of Chinsurah Boro II (indica) is combined with the nuclear genome of Taichung 65 (japonica). The gene Rf-1, initially identified in Chinsurah Boro II, can restore fertility in plants with BT-type CMS (Shinjyo, 1975; Shinjyo, 1984). Rf-1 was recently shown to encode a mitochondria-targeted pentatricopeptide repeat (PPR) protein (Kazama and Toriyama, 2003; Akagi et al., 2004; Komori et al., 2004). The Rf-1 gene product is involved in controlling the expression of CMS by processing an abnormal chimeric transcript generated by a mitochondrial open reading frame, orf79, downstream of mitochondrial atp6 (Iwabuchi et al., 1993; Akagi et al., 1994; Kazama and Toriyama, 2003; Wang et al., 2006).

The Rf-1 locus is complex and contains several duplicated copies of the Rf-1 gene. In addition, the number of duplicated genes in the Rf-1 locus varies among rice lines (Kazama and Toriyama, 2003; Akagi et al., 2004; Komori et al., 2004). The complex nature of the Rf-1 locus may have been generated by gene duplication, and the functions of the duplicated genes may have diverged during rice evolution. The classical Rf-1 locus consists of two closely linked genes, Rf-1A and Rf-1b. Both genes can restore fertility in BT-type CMS, but in different manners (Wang et al., 2006). Duplicated PPR genes have also been identified in the Rf locus region in Petunia and Brassica (Bentolila et al., 2002; Brown et al., 2003; Koizuka et al., 2003). Such duplication and mutation of Rf genes encoding PPR proteins is thought to be one of the strategies by which plants acquire new Rf gene functions (Shikanai, 2006).

Rice is one of the most important crops and feeds about 40% of the world population. Hybrid rice shows heterosis and gives higher yields (Yuan, 1994; Fujimura et al., 1996). The duplicated genes in the Rf-1 locus may play roles in the restoration of CMS by controlling mitochondrial gene expression. Thus, the Rf-1 locus is important for rice breeding, particularly for the production of hybrid rice, and it is therefore necessary to clarify the functions of the Rf-1 genes.

The genus Oryza is divided into four species complexes and two discrete species that evolved from a common ancestor (Wang et al., 1992; Aggarwal et al., 1999). The Oryza sativa complex contains all of the AA genome species, including two cultivated species, O. sativa and O. glaberrima, and five wild species, O. rufipogon, O. barthii, O. glumaepatula, O. longistaminata and O. meridionalis (Vaughan and Morishima, 2003). The two cultivated species, O. sativa and O. glaberrima, were independently domesticated from O. rufipogon in Asia and O. barthii in Africa, respectively (Oka, 1988; Second, 1991). The evolutionary relationships between species in the genus Oryza have been thoroughly analyzed by comparison of their DNAs (Wang et al., 1992; Ishii et al., 1996; Cheng et al., 2002). Since the complex structure of the Rf-1 locus may be derived from a single ancestral gene, genomic analysis of the locus in a range of rice lines should provide insight into the evolution of Rf-1.

In this study, we analyzed allelic variants of the Rf-1 locus in AA genome species of the genus Oryza that had been collected from locations with a wide geographic distribution from Asia to Africa. A PCR analysis of Rf-1 locus sequences revealed a high diversification of the structure of the Rf-1 locus in these species. The ancestor of the Rf-1 gene may have been duplicated early in rice evolution, and subsequent recombination may have created the present diversified structure of the Rf-1 locus.

Materials and methods

Plant materials

A total of 96 accessions of the genus Oryza were used in this study (see Table 2). The 63 accessions of the cultivated rice species Oryza sativa (AA) included 53 local varieties and 10 modern cultivars. The local varieties had been collected in Asia and were classified into three subspecies, indica (I; 29 lines), javanica (V; tropical japonica; 14 lines) and japonica (J; temperate japonica; 10 lines) (Morishima and Oka, 1981). Six lines of the other cultivated species, O. glaberrima, were used. A total of 27 lines of wild rice were also used, including 4 lines of O. rufipogon (AA), 1 line of O. barthii (AA), 2 lines of O. longistaminata (AA), 2 lines of O. glumaepatula (AA), 2 lines of O. meridionalis (AA), 2 lines of O. punctata (BBCC), 2 lines of O. minuta (BBCC), 1 line of O. officinalis (CC), 2 lines of O. latifolia (CCDD), 2 lines of O. grandiglumis (CCDD), 2 lines of O. alta (CCDD), 2 lines of O. brachyantha (FF), 1 line of O. longiglumis (HHJJ), 1 line of O. granulata (GG) and 1 line of O. meyeriana (GG). With the exception of the 10 modern cultivars, rice lines were kindly provided by the National Institute of Genetics. The accession numbers of the original collections were used in this study.

Crude DNA extraction

Leaf tips (1 cm) were collected from young rice seedlings into 2 ml Eppendorf tubes. The leaves were dried completely at 70°C for 2 h and then ground to a powder with a stainless steel ball (ϕ 3 mm) in the 2 ml Eppendorf tube, which was shaken using a Micro Smash (MS-100, TOMY). Extraction buffer (1 ml; Edwards et al., 1991) was added to the leaf powder and the mixture was incubated at room temperature for 1 h. After centrifugation at 10 000 r.p.m. for 3 min, the supernatant was collected. Crude DNA was precipitated from the supernatant by adding an equal volume of 2-propanol.

PCR

PCR was performed with primers specific for the Rf-1 locus (Table 1, Figure 1a) using TAKARA Ex Taq (Takara Bio Inc., Shiga, Japan) or TAKARA LA Taq with GC I buffer (Takara Bio Inc.). Each 20 μl reaction contained 1 U of TAKARA Ex Taq or LA Taq with the accompanying buffer, 4 nmol dNTP, 10 pmol of each set of primers and 10 ng of crude DNA. Thirty-five PCR cycles, each consisting of 10 s of denaturation at 94°C, 30 s of annealing at 55°C and 2 or 3 min of polymerization at 72°C, were performed using a Thermal Cycler 9600 or 9700 (Perkin-Elmer, Foster City, CA, USA). The PCR products were mixed with bromophenol blue loading dye and analyzed by electrophoresis on 1% agarose gels (Invitrogen, Carlsbad, CA, USA) using 1 × TBE (Tris-borate-ethylenediaminetetraacetic acid) buffer at room temperature.
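The reaction set-up above can be converted to final concentrations with simple arithmetic. The following is a minimal sketch, assuming that the stated 4 nmol of dNTP and 10 pmol of primer are the amounts present in the whole 20 μl reaction; the function names are illustrative and not part of the original protocol.

```python
REACTION_VOLUME_UL = 20.0  # reaction volume stated in the text


def final_concentration_um(amount_pmol: float, volume_ul: float) -> float:
    """Return the concentration in micromolar; 1 pmol per microlitre equals 1 uM."""
    return amount_pmol / volume_ul


def min_cycling_seconds(cycles: int, denature_s: int, anneal_s: int, extend_s: int) -> int:
    """Lower bound on run time: hold times only, ignoring temperature ramping."""
    return cycles * (denature_s + anneal_s + extend_s)


# 4 nmol dNTP = 4000 pmol; 10 pmol primer (assumed to be per reaction).
dntp_um = final_concentration_um(4_000.0, REACTION_VOLUME_UL)
primer_um = final_concentration_um(10.0, REACTION_VOLUME_UL)

# 35 cycles of 10 s / 30 s / 2 min, or 10 s / 30 s / 3 min.
short_run_s = min_cycling_seconds(35, 10, 30, 120)
long_run_s = min_cycling_seconds(35, 10, 30, 180)

print(f"dNTP: {dntp_um:.0f} uM, primer: {primer_um:.1f} uM")
print(f"hold time: {short_run_s} s (2 min extension), {long_run_s} s (3 min extension)")
```

Under these assumptions the reaction contains 200 μM dNTP and 0.5 μM primer, and the hold times alone amount to roughly 1.5 to 2 h per run.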

Table 1 The primer sequences used here
Figure 1

PCR determination of the structure of the Rf-1 locus in the genus Oryza using various combinations of primer pairs. (a) The structure of the Rf-1 locus of IR24 (Komori et al., 2004) is shown, and the arrows indicate the positions and directions of the primers used here. The region specific for each duplicated gene is represented by a different pattern (see Figure 3). Directions of the duplicated genes are represented by arrows above the genes. (b) Examples of amplification profiles of 24 of the 96 lines using the primer pairs in (a) are shown. T0419 to T0729 are the accession names of the 24 lines (see Table 2). The PCR products were electrophoresed on 1% agarose gels and then stained with ethidium bromide. DNA size-marker lanes contain λ/StyI. Approximate sizes of the amplicons are also indicated.

Sequence analysis

The nucleotide sequences of the amplicons were determined by direct sequencing. After DNA amplification, PCR products were purified using a PCR Purification kit (Qiagen, Hilden, Germany). Purified fragments were sequenced with primers specific for the Rf-1 locus using the BigDye Terminator Cycle Sequencing v1.1 Ready Reaction Kit (Applied Biosystems Inc., Foster City, CA, USA), and nucleotide sequences were determined using a DNA sequencing system (ABI 377, Applied Biosystems Inc.). DNA sequences were analyzed using GENETYX-Mac (Software Development Co., Tokyo, Japan). Nucleotide sequences of DNA fragments corresponding to the duplicated genes of the Rf-1 locus were aligned using the ClustalW program available from the web site of the DNA Data Bank of Japan (DDBJ), and a phylogenetic tree was produced using TreeView software (Page, 1996).
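Distance-based tree building starts from pairwise distances between aligned sequences. The following is a minimal sketch of such a distance matrix, using the simple proportion of differing sites (p-distance) and skipping gapped columns; it is not the ClustalW implementation, and the sequences and names (`toy_alignment`, `copy_1` and so on) are invented placeholders rather than Rf-1 data.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of sequence names."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Placeholder alignment for illustration only.
toy_alignment = {
    "copy_1": "ATGGCT-ACG",
    "copy_2": "ATGGCTTACG",
    "copy_3": "ATGACTTACC",
}

matrix = distance_matrix(toy_alignment)
for pair, dist in matrix.items():
    print(pair, round(dist, 3))
```

A matrix of this kind is the input that a neighbor-joining procedure clusters into a tree; published analyses typically apply a substitution-model correction rather than the raw p-distance used here.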

Results and discussion

PCR analysis of the Rf-1 locus of the genus Oryza

The structure of the Rf-1 locus was determined by PCR, using primers specific for the Rf-1 locus (Figure 1a, Table 2), in 69 lines from 2 cultivated Oryza species and 27 lines from 15 wild species.

Table 2 Summary of PCR amplification using primers specific for the Rf-1 locus

PCR products of 2.3 and 2.8 kb, encompassing the entire Rf-1A gene, were amplified with primers ‘a’ and ‘c’ from the 67 lines of cultivated rice (Figure 1b, Table 2). These two amplicon lengths corresponded to the truncated and complete Rf-1A gene, respectively (Akagi et al., 2004). The 3′ end of the Rf-1A gene was amplified with primers ‘a’ and ‘b’ in all lines of cultivated species (Table 2). The amplification products from five wild species with an AA genome (O. rufipogon, O. barthii, O. longistaminata, O. glumaepatula and O. meridionalis) indicated that they carried either a complete Rf-1A gene or a substantial part of it (Table 2). This result suggests that the Rf-1A gene is conserved within the AA genome species of the genus Oryza.

Primers ‘f’ and ‘c’ amplified a region corresponding to the Rf-1D gene only in 10 lines of O. sativa, 2 lines of O. rufipogon and 2 lines of O. longistaminata (Figure 1b, Table 2). In these lines, the region between the Rf-1A and Rf-1D genes was also amplified with primers ‘d’ and ‘e’ (Table 2). Therefore, the genomic structure from the Rf-1A gene to the Rf-1D gene was conserved in this group. However, the analysis also suggested that the majority of the tested lines did not carry the region that included the Rf-1D gene.

Using the primer combination ‘k’ (which is specific for the region upstream of Rf-1B) and ‘i’, three types of amplicons of 2.0, 2.4 and 2.9 kb, comprising the Rf-1B gene (which is distinct from the Rf-1b gene of Wang et al., 2006), were amplified in almost all lines with an AA genome; the exceptions were 2 lines of O. sativa and 2 lines of O. meridionalis (Figure 1b, Table 2). Furthermore, the region upstream from Rf-1B, which includes the Rf-1C gene, was detected in all lines of O. sativa and O. rufipogon (Figure 1b, Table 2). The African cultivated species, O. glaberrima, had a genomic structure similar to that of O. sativa, in terms of amplification profile, in 2 of the 5 lines tested (Table 2). The results indicate that the genomic structure upstream of the Rf-1B gene was conserved between O. sativa and O. glaberrima. On the other hand, two amplicons of 1.1 and 2.0 kb, comprising the 3′ part of the Rf-1B gene, were amplified only in 28 lines of O. sativa and 1 line of O. rufipogon using the primers ‘j’ and ‘l’ (Table 2). This suggested that the genomic structure or nucleotide sequence downstream of the Rf-1B gene was not conserved in the genus Oryza.

As the genomic structures downstream of Rf-1B were unclear in several lines, we investigated these using primer ‘k’ (which is specific for the region upstream of Rf-1B) in combination with primer ‘f’ (which is specific for the region downstream of Rf-1D). Surprisingly, two amplicons of 2.9 and 3.5 kb were amplified from 26 lines of the AA genome species (Figure 1b, Table 2). This suggested that these lines carried a chimeric gene containing the 5′ region of Rf-1B and the 3′ region of Rf-1D.

Since no amplicon was observed in the wild species belonging to the BB and CC genome species with the primer combinations in Table 2, the target sequences of the primers may not be conserved in these wild species.

Nucleotide sequences of the amplicons

Sequence analysis showed that the amplicons contained Rf-1A, Rf-1B, Rf-1C or Rf-1D, and their flanking sequences (data not shown). Thus, the primer combinations used in the PCR specifically amplified their target regions in the Rf-1 locus.

Two amplicons of 2.9 and 3.5 kb (Figure 1b, Table 2) were obtained using the primer pair ‘f’ and ‘k’. These amplicons contained the flanking regions upstream of Rf-1B and downstream of Rf-1D, but their coding sequences differed from both the Rf-1B and Rf-1D genes. We therefore named the duplicated genes in the 2.9 and 3.5 kb fragments Rf-1E and Rf-1F, respectively.

The nucleotide sequences of the duplicated Rf-1 genes were compared using ClustalW (Figure 2). Phylogenetic analysis indicated that the duplicated genes were closely related (Figure 2), suggesting that the genes were generated by tandem duplication. The newly identified Rf-1E and Rf-1F genes belonged to the phylogenetic clade that contained Rf-1D (Figure 2).

Figure 2

Phylogenetic tree of the duplicated genes of the Rf-1 locus in the AA genome species of the genus Oryza. Nucleotide sequences of Rf-1F of T0437 and Rf-1E of C0501 were compared with those of the Rf-1A, Rf-1B, Rf-1C and Rf-1D genes from MTC-10R. To investigate the phylogenetic relationships of the Rf-1C and Rf-1D genes from O. rufipogon, nucleotide sequences of Rf-1C and Rf-1D of O. rufipogon (W0120, W1294) were also aligned with these sequences using the ClustalW program. A phylogenetic tree was created from these distances with the TreeView program using the neighbor-joining method. Bootstrap values are shown on the tree. The position of the amplicon corresponding to the Rf-1A gene from O. glumaepatula is also indicated as Rf (W2199).

The nucleotide sequence of the complete Rf-1A gene of O. glaberrima (C0501) was identical to that of O. sativa (T0041, MTC-10R, IR24, Zhen-Shan 97). The Rf-1E gene showed 99.9% conservation at the nucleotide level and 100% at the amino acid level in these two cultivated species (between C0501 and Zhen-Shan 97). These results showed that the duplicated genes at the Rf-1 locus were conserved in O. sativa and O. glaberrima, which were independently domesticated in Asia and Africa, respectively. Thus, Rf-1 gene duplication appears to have predated the divergence of ancestral wild species of O. sativa and O. glaberrima, and their nucleotide sequences have been conserved since this divergence.

Structural diversification of the Rf-1 locus in the Asian cultivated species, O. sativa

Based on the PCR and sequence analyses, the Rf-1 locus was classified into six structural types, named here Types I–VI (Figure 3). Type I carried Rf-1A, Rf-1D, Rf-1B and Rf-1C. This type has been previously described by Komori et al. (2004). Both Type II and Type III lacked the region containing the Rf-1D gene and, therefore, carried three duplicated genes, Rf-1A, Rf-1B and Rf-1C (Figures 3a and b). In addition, Type III carried truncated Rf-1A and Rf-1B genes (Figure 3b, Akagi et al., 2004). Type IV and Type V carried Rf-1F and Rf-1E, respectively, instead of Rf-1D and Rf-1B (Figures 3c and d). Only one of the rice lines had Type VI, which was characterized by a large deletion leaving only the Rf-1A and Rf-1C genes (Figure 3e). Sixty of the 69 O. sativa lines could be classified into one of these six types (Table 3). However, the structure of the Rf-1 locus in the remaining nine lines could not be determined in this study (Table 3).
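The six types described above can be summarized as a lookup from gene content to structural type. The following is a minimal sketch, assuming that Types IV and V retain Rf-1A and Rf-1C alongside Rf-1F or Rf-1E, as the description implies; the function name and the single-letter gene codes are illustrative, and the actual classification also relied on amplicon sizes and sequence data.

```python
from typing import Iterable, Optional


def classify_rf1_locus(genes: Iterable[str], truncated_a_and_b: bool = False) -> Optional[str]:
    """Map detected Rf-1 gene copies (letters A-F) to a structural type, or None if unclassified."""
    content = frozenset(genes)
    if content == {"A", "D", "B", "C"}:
        return "I"
    if content == {"A", "B", "C"}:
        # Types II and III share gene content; Type III has truncated Rf-1A and Rf-1B.
        return "III" if truncated_a_and_b else "II"
    if content == {"A", "F", "C"}:  # assumed content: Rf-1F replaces Rf-1D and Rf-1B
        return "IV"
    if content == {"A", "E", "C"}:  # assumed content: Rf-1E replaces Rf-1D and Rf-1B
        return "V"
    if content == {"A", "C"}:
        return "VI"
    return None  # structure could not be assigned to Types I-VI


print(classify_rf1_locus(["A", "D", "B", "C"]))
print(classify_rf1_locus(["A", "B", "C"], truncated_a_and_b=True))
print(classify_rf1_locus(["A", "E", "C"]))
```

Returning `None` for any other combination mirrors the lines whose locus structure could not be determined.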

Figure 3

Genomic structure of the Rf-1 locus in O. sativa. The six types of Rf-1 locus structure are represented here as Types I–VI. The structural features of each genotype are indicated under Type I. The region specific for each duplicated gene is represented by a different pattern. Borders of the specific regions were determined by comparison of nucleotide sequences around putative recombination points of Types I–VI. Boxes with gene names represent the positions of the duplicated genes. Nucleotide sequence similarities around putative recombination positions between each genotype and Type I are also indicated. Directions of the Rf-1F and Rf-1E genes are indicated by an arrow over each gene. Type III carried the truncated Rf-1A and Rf-1B genes as described previously (Akagi et al., 2004).

Table 3 The distribution of the genotypes of the Rf-1 locus among subspecies of local cultivars

Nucleotide sequence conservation was not limited to the duplicated genes but was also present in their flanking regions. It is possible, therefore, that homologous recombination in the 3′ flanking regions of Rf-1D and Rf-1B generated Type II from Type I (Figure 3a). Similarly, Type III may have been produced by homologous recombination in the upstream region of a Type I Rf-1A and Rf-1D structure that resulted in the loss of the Rf-1D gene (Figure 3b). In Type VI, the region from the Rf-1D to the Rf-1B gene may have been lost by recombination between the 5′ regions of the Rf-1A and Rf-1B genes (Figure 3e). Thus, the highly conserved nucleotide sequences in the duplicated genes and their flanking regions suggest that the complex structure of the Rf-1 locus may have been generated by homologous recombination (Figure 3).

Evolution of the Rf-1 locus in different subspecies of O. sativa

The accessions of O. sativa used here were collected from widely dispersed geographical sites throughout Asia, and were classified into three subspecies (Morishima and Oka, 1981). It was anticipated that these would cover a large part of the genetic variation present in O. sativa. Most lines of the indica subspecies were classified as Type V, whereas both the japonica and javanica subspecies (temperate and tropical japonica, respectively) were mainly classified as Type III (Table 3). Moreover, Type I, Type II and Type VI were found only in the indica subspecies (Table 3). Introgression of the Rf-1 locus between the javanica and indica subspecies may have produced the exceptional genotypes in the two lines of javanica and one line of indica that had Type V and Type III loci, respectively (Second, 1982). The consistency of the Rf-1 structure within the different subspecies suggested that the structural variation at the Rf-1 locus may have arisen before the divergence of these subspecies.

Geographical distribution of Rf-1 locus genotypes in Asian cultivars

The geographical distribution of the Rf-1 locus genotypes of O. sativa in Asia is illustrated in Figure 4. It was clear that the genotypes showed geographical variation in their frequencies (Figure 4). Type I was rare and found only in India, whereas Types II–V were distributed throughout Asia (Figure 4). In India, almost all types were represented, the exception being Type VI (Figure 4). India is known to be one of the secondary centers of rice origin (Khush, 1997) and, consequently, the diversity of Rf-1 locus structures may represent the distribution of genetically divergent varieties throughout India. In contrast, Types II and III predominated in Taiwan and Japan, respectively (Figure 4). The local varieties collected in Taiwan and Japan belonged to the indica and japonica subspecies, respectively. Therefore, the geographic distributions of the Rf-1 locus genotypes represent the distribution of subspecies of O. sativa.

Figure 4

Geographical distribution of Rf-1 locus genotypes in Asian cultivars. The geographical locations of the genotypes were classified according to the countries or areas in which the local varieties were initially collected. Numbers of accessions for each area are 9 (India), 4 (Vietnam), 9 (Indonesia), 6 (Philippines), 11 (China), 11 (Taiwan) and 3 (Japan). The relative proportions of accessions carrying each genotype, including unclassified genotypes, are indicated.
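The accession counts in this legend can be checked against the sample description with a short tally. The following is a minimal sketch using only the per-area counts stated above; the variable names are illustrative.

```python
# Accessions per collection area, as listed in the Figure 4 legend.
accessions_by_area = {
    "India": 9,
    "Vietnam": 4,
    "Indonesia": 9,
    "Philippines": 6,
    "China": 11,
    "Taiwan": 11,
    "Japan": 3,
}

total_accessions = sum(accessions_by_area.values())

# Share of the mapped accessions contributed by each area.
share_by_area = {area: n / total_accessions for area, n in accessions_by_area.items()}

print(f"total accessions: {total_accessions}")
for area, share in share_by_area.items():
    print(f"{area}: {share:.1%}")
```

The counts sum to 53, which matches the number of local varieties given in Plant materials.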

Genomic structure of the Rf-1 locus in wild species

The Rf-1 locus structures found in O. rufipogon, the wild ancestral species of the Asian cultivar of O. sativa, are shown in Figure 5. The Rf-1 locus of W0120, W1299 and W2003 was classified as Type I, Type II and Type V, respectively (Figure 5a). However, W0120 and W2003 also had a deletion downstream of the Rf-1A gene and partial deletions in both the 5′ and 3′ regions of the Rf-1A gene (Figure 5a). The remaining accession, W1866, carried the Rf-1A, Rf-1D and Rf-1C genes; however, the genomic structure between the Rf-1D and Rf-1C genes was not clarified (Figure 5a).

Figure 5

The genomic structure of the Rf-1 locus of the AA genome species. (a) The genotypes of the Rf-1 locus in four accessions of O. rufipogon, the ancestor of the Asian cultivated species, O. sativa, are shown. (b) The genotypes of three accessions of the African cultivar, O. glaberrima, and a wild relative, O. barthii, are shown. The region specific for each duplicated gene is represented by a different pattern (see Figure 3). Dotted lines indicate that the structure was not identified because no amplicon was obtained.

We also examined the structure of the Rf-1 locus in the African cultivar, O. glaberrima, and its wild relative, O. barthii (Figure 5b). Two lines of O. glaberrima, C0501 and C0650, had Type V structures (Figure 5b). In the remaining four lines, the region upstream of the Rf-1E gene, including the Rf-1C gene, was not detected by the PCR analysis. These lines probably carried a variant of Type V (Figure 5b). One line of O. barthii, W0652, gave the same amplification profile as these four lines, suggesting that W0652 also carried the same variant of Type V (Figure 5b).

Evolution of the Rf-1 locus in the AA genome species

The molecular structure of the Rf-1 locus of three AA genome species, O. sativa, O. rufipogon and O. glaberrima, was revealed by both PCR and nucleotide sequence analysis (Figures 2 and 5). The Rf-1 locus genotypes of these species are indicated in the dendrogram (Figure 6). Type I, Type II and Type V were found in O. rufipogon, an ancestor of O. sativa (Figures 5 and 6), indicating that these three genotypes had formed before the divergence of O. sativa and O. rufipogon (Figure 6). Previous reports have suggested that the japonica and indica subspecies differentiated before their domestication from different O. rufipogon populations (Second, 1982; Wang et al., 1992; Chen et al., 1993). In this study, only four lines of O. rufipogon from three regions (India, Philippines and Thailand) were analyzed. Because O. rufipogon is distributed in many regions of the world, other Rf-1 locus genotypes might be present within O. rufipogon. The Type V genotype was common to both O. sativa and O. glaberrima. The two cultivated species, O. glaberrima and O. sativa, originated from a common AA genome ancestor; they have evolved in parallel from O. barthii in Africa and O. rufipogon in Asia (Morishima et al., 1963; Second, 1982). Thus, Type V must have formed before the divergence of the ancestors of O. sativa and O. glaberrima. These results suggest that duplication of the Rf-1 gene occurred at an early stage of AA genome evolution and that this was followed by structural diversification of the Rf-1 locus.

Figure 6

Evolutionary relationships among different Rf-1 locus variants in the genus Oryza. The genomic structure of the Rf-1 locus identified in each taxon, O. sativa, O. rufipogon and O. glaberrima, is indicated below each taxon name. Commonalities between species are highlighted.

Conclusion

We have shown here that the Rf-1 locus has a highly diversified structure and contains several duplicated genes. Our results indicate that duplication of the Rf-1 gene occurred early in rice evolution, and that subsequent diversification produced the complex Rf-1 locus structure found in the AA genome species of the genus Oryza.

The amino acid sequences of the duplicated genes were highly conserved. This outcome is consistent with our previous study in which we found that the Rf-1C gene encoded a PPR protein with 88.6% homology to the Rf-1A protein (Akagi et al., 2004). The Rf-1A gene, but not the Rf-1C gene, can recover BT-type CMS (Akagi et al., 2004). The newly identified Rf-1b recovers BT-type CMS by modification of mRNA from the mitochondrial atp6 gene, but does so in a different manner to the Rf-1A gene (Wang et al., 2006). This indicates that the molecular functions of the duplicated genes have also diversified during rice evolution.

Moreover, the deduced amino acid sequences of the duplicated Rf-1 locus genes were highly conserved across the species barrier in the AA genome species. This suggests that these duplicated genes could be functional. The fertility restorer genes in Petunia and Brassica are similar to those in rice in that they are duplicated and encode PPR proteins (Bentolila et al., 2002; Brown et al., 2003; Koizuka et al., 2003). These duplicated genes are thought to have arisen from a restorer gene in response to the appearance of new forms of CMS during evolution (Brown et al., 2003). The PPR proteins are involved in controlling organelle gene expression by the processing of transcripts (Small and Peeters, 2000). Therefore, the duplicated genes in the Oryza Rf-1 locus may encode PPR proteins that play diverse roles in the regulation of mitochondrial genes, although the details of their molecular function are currently unclear.