Introduction

Microsatellites consist of tandemly arrayed di-, tri- and tetra-nucleotide repeats. They are hypervariable and are distributed randomly throughout eukaryotic genomes. The large allelic variation in microsatellite loci can be easily detected by PCR using specific primers in both of the flanking regions of microsatellites as a variation in the length of amplicons (Litt & Luty, 1989; Tautz, 1989; Weber & May, 1989). These characteristics of microsatellites make them useful for the construction of high-density genetic maps (Dib et al., 1996; Dietrich et al., 1996) and for evolutionary genetic studies (Bowcock et al., 1994). Microsatellite DNA markers have also been identified in several plant species as aids in breeding (Akkaya et al., 1992; Senior & Heun, 1993; Wu & Tanksley, 1993; Becker & Heun, 1995; Röder et al., 1995; Broun & Tanksley, 1996).

The mechanisms which cause the variation in the length of microsatellites are not well understood. Polymerase slippage has been proposed as a mechanism of microsatellite mutation (Caskey et al., 1992). Furthermore, knowledge regarding the evolutionary origins of microsatellites is limited. It has been shown that a nucleotide substitution creates di- and tetra-nucleotide repeats in primates (Messier et al., 1996). Such mutations may have occurred during the evolution of a given species, resulting in the present variation in microsatellite loci.

The genus Oryza is divided into four species complexes and two discrete species. The O. sativa complex contains all of the A genome species, including the two cultivated species O. sativa and O. glaberrima. Oryza sativa has been classified into two subspecies, indica and japonica, based on characteristic morphological and physiological traits (Oka, 1991). The evolutionary relationship among species in the genus Oryza has been well analysed by chloroplast DNA (Dally & Second, 1990; Chen et al., 1993) and nuclear RFLP (Wang et al., 1992).

Several microsatellite markers have been identified in O. sativa (Wu & Tanksley, 1993; Akagi et al., 1996; Panaud et al., 1996; Akagi et al., 1997). Two pairs of microsatellite markers have been found on duplicate chromosomal segments of chromosomes 11 and 12 (Panaud et al., 1996). Because these microsatellites may have originated from a single ancestral molecule in the evolution of the genus Oryza, they could be useful for understanding the origin and evolution of microsatellites.

In this study, allelic variation at the RM20 loci, which are located on duplicate chromosomal segments, was analysed in the A genome species of the genus Oryza. An ancestral sequence of the microsatellite was found in a wild species by analysing nucleotide sequences of RM20-related amplicons. The origin and evolution of these microsatellites are described here.

Materials and methods

Plant materials

A total of 61 accessions of cultivated rice species O. sativa (AA) including 59 local varieties and two modern cultivars were used. The varieties of O. sativa were classified into indica (30 lines) and japonica (29 lines) subspecies (Morishima & Oka, 1981). The two modern cultivars were Asominori and IR24, which are classified as japonica and indica, respectively. The 59 local varieties are stocked at the National Institute of Genetics and were kindly provided by Dr H. Morishima. One line of another cultivated species [O. glaberrima (AgAg)] was used. Twelve lines of wild rice, including seven lines of O. rufipogon (AA), one line of O. brathii (AgAg), two lines of O. longistaminata (AlAl), one line of O. brachyantha (FF) and one line of O. meyeriana (Diploid) were also used. These lines of wild species are stocked at Tohoku University, and were kindly provided by Dr M. Oka. The lines of O. glaberrima, O. brathii, O. longistaminata and O. brachyantha were collected in Africa, and those of O. sativa and O. rufipogon were collected in Asia. The accession numbers of the original collections were used in this study.

DNA extraction

Crude DNA was extracted from the leaves of young seedlings. After being dried at 70°C for 2 h, leaves were homogenized with small glass beads in Eppendorf tubes using a vortex mixer. Crude DNA was then extracted according to the method of Edwards et al. (1991). Extraction buffer (400 μL) was added to the crushed leaves and the mixture was incubated at 100°C for 10 min. DNA was precipitated from the supernatant by adding an equal volume of isopropanol.

Southern blots

Total DNA, which had been prepared from rice leaves by the CTAB method, was digested with a series of restriction enzymes, then separated on a 0.8% agarose gel. After transfer onto a nylon membrane (Hybond N+, Amersham International), positive bands were detected using an ECL direct nucleic acid labelling and detection system according to the manufacturer's protocol.

Microsatellites

Microsatellites of RM20 which were located on duplicate regions of chromosomes 11 and 12 were analysed (Panaud et al., 1996). The nucleotide sequences of the primer pairs were 5′-GAAACAGAGGCACATTTCATTG-3′ and 5′-ATCTTGTCCCTGCAGGTCAT-3′ (Panaud et al., 1996). PCR amplification was performed in 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.5 or 1 unit of Taq polymerase (TAKARA), 4 nmol dNTP, 10 pmol primer, and 10 ng of genomic DNA per 20 μL using a Thermal Cycler 9600 (Perkin-Elmer). Thirty-five PCR cycles, each consisting of 30 s denaturation at 94°C, 30 s annealing at 55°C, and 1 min polymerization at 72°C, were performed. For analysis with GeneScan (ABI), 0.5 μL of fluorescent dUTP [2 μL for (TAMRA)dUTP] was added to the PCR reaction mixtures. After PCR, unincorporated fluorescent dUTPs were removed with SUPREC™-02 (TAKARA).
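As a quick check on the reaction mix described above, the per-tube amounts can be converted to concentrations in the 20 μL volume. The sketch below is illustrative arithmetic only. The function name is hypothetical, and it assumes that the stated 4 nmol of dNTP and 10 pmol of primer are amounts per 20 μL reaction; the text does not say whether 4 nmol refers to each dNTP or to the total, nor whether 10 pmol is the amount of each primer.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in µM: pmol per µL is numerically equal to µM."""
    return amount_pmol / volume_ul


REACTION_VOLUME_UL = 20.0

# Amounts per 20 µL reaction, as given in the text (interpretation assumed).
dntp_um = micromolar(4_000.0, REACTION_VOLUME_UL)   # 4 nmol = 4000 pmol
primer_um = micromolar(10.0, REACTION_VOLUME_UL)    # 10 pmol
template_ng_per_ul = 10.0 / REACTION_VOLUME_UL      # 10 ng genomic DNA

print(f"dNTP: {dntp_um:.0f} µM, primer: {primer_um:.1f} µM, "
      f"template: {template_ng_per_ul:.1f} ng/µL")
```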

Nucleotide sequence and diversity of amplified DNA fragments

Amplified DNA fragments were subcloned into pCR™II using a TA cloning kit (Invitrogen) and sequenced using an ABI 373S (Applied Biosystems Inc.). DNA sequences were analysed using GENETYX-MAC software (Software Development, Tokyo).

Lengths of alleles were determined by ethidium bromide staining after electrophoresis on 3% MetaPhor Agarose gels (FMC). Exact allele lengths were also determined by GeneScan (ABI) after fractionation in 6% denatured polyacrylamide gels using a 373A DNA sequencer (ABI).

Gene diversity (GD) was calculated as follows:

$$\mathrm{GD} = 1 - \sum_{i=1}^{m} x_i^{2}$$

where $x_i$ indicates the population frequency of the ith pattern for a marker and the summation extends over m patterns (Nei, 1973).
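Gene diversity in the sense of Nei (1973) is one minus the sum of the squared pattern frequencies, and it can be computed directly from a list of scored alleles. The following is a minimal sketch under that definition; the function name and the example allele lengths are made up for illustration and are not data from this study.

```python
from collections import Counter
from typing import Hashable, Iterable


def gene_diversity(alleles: Iterable[Hashable]) -> float:
    """Nei's (1973) gene diversity: 1 - sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one allele observation is required")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Made-up allele lengths (bp) scored in eight hypothetical lines.
example = [230, 230, 233, 233, 236, 239, 239, 239]
print(gene_diversity(example))
```

A locus with a single allele gives a diversity of 0, and the value approaches 1 as more alleles occur at similar frequencies.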

Results

Diversity of twin microsatellites of RM20 loci in A genome species of the genus Oryza

RM20A and RM20B are microsatellites that are located on the highly conserved duplicated regions of chromosomes 12 and 11, respectively (Nagamura et al., 1995; Panaud et al., 1996). Because both microsatellites are amplified by a single primer set (Panaud et al., 1996), the RM20A and RM20B loci could be examined in 73 lines belonging to the A genome species O. sativa, O. glaberrima, O. rufipogon, O. brathii and O. longistaminata, and to the F genome species O. brachyantha. Two different lengths of amplicons were amplified by a single primer pair in all but three of the lines. The longer amplicon corresponded to RM20A and was mapped on chromosome 12, whereas the shorter one corresponded to RM20B and was mapped on chromosome 11 (Akagi et al., 1996; Panaud et al., 1996). One line of O. longistaminata and one line of O. brachyantha gave a single amplicon and no amplicon was observed in O. meyeriana (Fig. 1). Amplification at some microsatellite loci also failed in several wild rice species tested by Panaud et al. (1996), presumably because the primer sequences were not conserved.

Fig. 1

Discrimination of seven alleles detected in RM20 among 59 Japanese cultivars of rice. The PCR products of RM20 were electrophoresed on 3% MetaPhor agarose gels (FMC) and then stained with ethidium bromide. DNA size-marker lanes contain 100 bp per 500 bp DNA ladders.

Overall, 19 and nine alleles (including O. rufipogon, O. sativa ssp. indica and ssp. japonica) were detected at the RM20A and RM20B loci, respectively (Table 1). In both indica and japonica subspecies, twice as many alleles were found at the RM20A locus as at the RM20B locus (Table 1). A similar difference was not observed in O. rufipogon but this was based on a smaller sample (Table 1).

Table 1 Gene diversity for twin microsatellite loci in Oryza rufipogon and two subspecies of O. sativa

Allele lengths at the RM20A locus had a similar distribution in japonica and indica accessions (Fig. 2). In contrast, the japonica and indica subspecies showed different frequencies of allele length at the RM20B locus (Fig. 2). Two species from Africa, O. glaberrima and O. brathii, had the shortest allele length (Fig. 2). The varieties were classified into indica and japonica subspecies according to an isozyme analysis (Morishima & Oka, 1981). Introgression of the genes between these two subspecies may indicate that japonica and indica had longer and shorter alleles at the RM20B locus, respectively (Second, 1982).

Fig. 2

Distribution of allele lengths among Oryza sativa, O. rufipogon, O. glaberrima and O. brathii at the RM20B locus and the RM20A locus. Oryza sativa was classified into japonica and indica subspecies based on an isozyme analysis (Morishima & Oka, 1981).

Highly conserved nucleotide sequences between the twin microsatellites

The nucleotide sequences of RM20A and RM20B in several lines were determined to identify the molecular basis of the difference in allelic diversity at the RM20A and RM20B loci. Amplicons from all of the lines, except O. longistaminata and O. brachyantha, contained a simple sequence trinucleotide repeat, (TAA)n (Fig. 3a). The nucleotide sequences of the flanking regions were highly conserved between RM20A and RM20B (Fig. 4). Characteristic sequences for each locus were found at the 5′-flanking region of the microsatellite motif: (T)6GCAAA(TAA)n for RM20A and (T)3A(TAA)n for RM20B (Figs 3 and 4).

Fig. 3

Comparison of the nucleotide sequences of the microsatellites in the (a) RM20A and (b) RM20B loci, amplified from several lines of Oryza sativa, O. rufipogon, O. glaberrima and O. brathii. Oryza sativa (I) and O. sativa (J) indicate indica and japonica subspecies, respectively. Accession numbers of the lines are also indicated. Conserved nucleotides among these sequences are shown by dots. Underlining indicates the 8-bp repetitive sequences flanking the microsatellite. Arrows indicate primer sequences for detection of a three-base deletion at the RM20B locus. Boxes A, B and C show special features of nucleotide sequences for japonica subspecies, the African species O. glaberrima and O. brathii, and some japonica subspecies, respectively.

Fig. 4

Nucleotide sequences of RM20-related amplicons from Oryza longistaminata, compared with RM20A and RM20B molecules. Conserved nucleotides among these sequences are shown by dots. Different nucleotide sequences between RM20A and RM20B are indicated by boxes. Sequence absent in the RM20Hl molecule is underlined.

The numbers of simple sequence repeats at the RM20A and RM20B loci ranged from 13 to 36 and from 7 to 15, respectively (Fig. 3a, b). This indicated that the allele length at both of the RM20 loci in Fig. 2 was determined solely by the number of TAA repeats. Another microsatellite motif, (CT)n, in RM20A was not found in the japonica subspecies (Fig. 3a, box A). In some varieties of japonica, a three-nucleotide deletion was also found in the region 5′ of the RM20B microsatellite (Fig. 3b, box C). Distribution of this deletion among the A genome species was surveyed by PCR using the primers arrowed in Fig. 3(b). This deletion was found only in the japonica subspecies by PCR (data not shown), indicating that the deletion had occurred after the divergence of japonica. A single-base insertion at the RM20B locus was only found in O. glaberrima and O. brathii (Fig. 3b, box B). This agrees well with the notion that O. brathii is the ancestor of the cultivated species O. glaberrima (Morishima et al., 1963; Second, 1982).
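Counting repeat units in a sequenced amplicon, as was done for the alleles described above, can be sketched with a regular expression. This is an illustrative helper rather than the software used in the study. The function name is hypothetical and the two test sequences are synthetic: only the 5′ motifs quoted in the text, (T)6GCAAA(TAA)n and (T)3A(TAA)n, are reused, and the remaining flanking bases are placeholders.

```python
import re


def longest_repeat_run(sequence: str, motif: str = "TAA") -> int:
    """Return the number of units in the longest uninterrupted run of `motif`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif)
            for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Synthetic sequences, NOT real RM20 data; flanks are placeholders.
allele_a = "GGC" + "T" * 6 + "GCAAA" + "TAA" * 13 + "GAGAAGTT"
allele_b = "GGC" + "T" * 3 + "A" + "TAA" * 7 + "GAGAAGTT"

print(longest_repeat_run(allele_a), longest_repeat_run(allele_b))
```

Because each repeat unit is 3 bp long, two alleles that differ only in repeat number differ in length by a multiple of 3 bp.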

The wild species O. longistaminata contains poly(A) instead of poly(TAA)

Different amplification patterns with the primer pair for RM20 were observed in two lines of O. longistaminata. Two different sizes of amplicon were observed in line Af-9, whereas only an amplicon with a lower molecular weight was observed in line Af-14 (Fig. 1). The nucleotide sequences of these higher (RM20Hl) and lower (RM20Ll) molecular weight amplicons from line Af-9 were determined (Fig. 4). In the nucleotide sequences of RM20Ll, a short stretch of poly(A) was found in place of poly(TAA) (Fig. 4), whereas RM20Hl contained a poly(TAA) sequence (Fig. 4). However, the nucleotide sequences on both sides of this microsatellite motif were diverged from those of either RM20A or RM20B. In the 3′-flanking region of the poly(TAA) of RM20Hl, the sequence GTATGAGAA was deleted (Fig. 4, underline). The twin microsatellites RM20A and RM20B can be distinguished from each other by the number of repeats of the microsatellite motif and by variation in their flanking sequences. However, it was not possible to conclude that either RM20Ll or RM20Hl corresponded to RM20A or RM20B based on the features of their nucleotide sequences. RM20A and RM20B were previously mapped on rice chromosomes using a cross between O. longistaminata and O. sativa (Panaud et al., 1996). In that report, a shorter band (RM20B) from O. longistaminata was polymorphic and was mapped on chromosome 11 (Panaud et al., 1996). The lower molecular weight band (RM20Ll) amplified here from two lines of O. longistaminata may correspond to RM20B, because the length of RM20Ll was similar to that of RM20B in the original report.

When the RM20A amplicon was used as a probe on Southern blots, it detected two positive bands with a smear background in all A genome species but no clear signal in O. brachyantha (Fig. 5a). The bands in the A genome species, including O. longistaminata, are likely to be the RM20A and RM20B loci. Southern blot analysis revealed that line Af-14 of O. longistaminata as well as line Af-9 contained two copies of RM20-related sequence (Fig. 5a). Only one amplicon was observed in line Af-14 (Fig. 1), suggesting that the primer sequences were not conserved at one of the loci.

Fig. 5

Southern blot analysis of PstI fragments. Panel (a) shows the hybridization patterns with the RM20A probe from Oryza sativa cv. Asominori. Panel (b) shows the hybridization patterns with the RM20b probe from O. brachyantha.

Nucleotide sequence of the amplicon from the F genome species O. brachyantha

In the wild species O. brachyantha, which has an F genome, only one band (RM20b) was amplified (Fig. 1). The region that contained the microsatellite in other species was absent but the flanking regions were similar to those of RM20 molecules of A genome species (Fig. 6). GAGAAGTT sequences were found in both flanking regions of the microsatellite motif in A genome species (Figs 3 and 6, underlined). Oryza brachyantha contained one of these 8-bp sequences (Fig. 6, underlined).

Fig. 6

Homology between nucleotide sequences of RM20b from Oryza brachyantha and RM20Ll from O. longistaminata. Repetitive sequences of 8 bp are underlined. Conserved nucleotides among these sequences are shown by dots.

Southern blot analysis showed that the RM20b probe did not hybridize to the bands that hybridized to the RM20A probe in A genome species. The smear hybridization pattern in O. brachyantha indicated that sequences homologous to RM20b are highly repetitive. Therefore, the correspondence of the RM20b molecule in O. brachyantha with the other RM20-related molecules in A genome species is unclear.

Discussion

The putative ancestral sequence of the twin microsatellites RM20A and RM20B, located on duplicate chromosomal segments, was found in O. longistaminata. The trinucleotide repeats of both of the microsatellites are believed to have evolved from a single nucleotide repeat. The process by which these twin microsatellites may have formed during the evolution of the genus Oryza is considered below.

Origin and evolution of the twin microsatellites in the genus Oryza

Microsatellites vary in length because of slippage amplification or reduction. However, there is little information available on the origins of microsatellites. A single-base substitution created tetranucleotide repeats and a dinucleotide repeat in the η-globin pseudogene in primates (Messier et al., 1996). In this study, a putative ancestral sequence of the twin microsatellites was found. Poly(TAA) may have been created from poly(A) by a single-base substitution (A to T), as shown in Fig. 7, which describes the simplest routes from a hypothetical ancestral molecule to the present-day molecules.
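The proposed origin can be illustrated with a toy string manipulation. The sketch below is a simplified, hypothetical model and not the authors' analysis: it substitutes T for A at one position of a poly(A) tract to create a single TAA unit, then duplicates that unit to mimic expansion by slippage. The tract length, the flanking bases, the substitution position and the function names are all assumptions chosen for illustration.

```python
def substitute(sequence: str, position: int, base: str) -> str:
    """Return `sequence` with the base at 0-based `position` replaced."""
    return sequence[:position] + base + sequence[position + 1:]


def expand_unit(sequence: str, unit: str, extra_copies: int) -> str:
    """Insert extra copies of `unit` before its first occurrence (slippage gain)."""
    index = sequence.find(unit)
    if index < 0:
        raise ValueError(f"{unit!r} not found in sequence")
    return sequence[:index] + unit * extra_copies + sequence[index:]


ancestral = "G" + "A" * 8 + "C"            # hypothetical poly(A) tract with flanks
mutated = substitute(ancestral, 3, "T")    # A -> T creates one TAA unit
expanded = expand_unit(mutated, "TAA", 2)  # two extra TAA copies

print(ancestral, mutated, expanded, sep="\n")
```

In this toy model a single substitution is enough to seed the trinucleotide motif, and all further length variation then arises from gain or loss of whole units.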

Fig. 7

Evolutionary relationships among RM20-related molecules. The simplest routes from a hypothetical ancestral sequence to the present RM20-related molecules are shown. Two possible routes are shown for the formation of RM20A(I) and RM20A(J).

Relationship between the evolution of the genus Oryza and twin microsatellites

The genus Oryza is divided into four species complexes. All of the A genome species analysed here belong to the O. sativa complex. The evolutionary relationships among species of the genus Oryza have been evaluated by morphological (Morishima & Oka, 1960), isozyme (Second, 1982), chloroplast DNA (Dally & Second, 1990) and nuclear RFLP (Wang et al., 1992) analyses.

The lines analysed in the present study had been collected in Africa (O. longistaminata, O. brathii, O. glaberrima and O. brachyantha) and Asia (O. sativa and O. rufipogon). The nucleotide sequences of RM20 molecules were highly conserved between African and Asian species. The features of these nucleotide sequences suggested that O. brathii had a closer affinity with O. rufipogon than with O. longistaminata, despite the geographical isolation of these wild species.

Figure 8 shows a dendrogram which was constructed based on a nuclear RFLP analysis (Wang et al., 1992). Steps in the evolution of RM20-related molecules are indicated on this dendrogram. Mutation at the original molecule resulted in the intermediate molecule ‘X’. Before or after the formation of the intermediate ‘Y’, the chromosomal segment that contained the source of the microsatellites may have duplicated, resulting in the present structures of chromosomes 11 and 12 in the A genome species. Because RM20Hl and RM20Ll in O. longistaminata, and RM20A and RM20B in other species may have evolved directly from the intermediate molecule ‘Y’ independently, these RM20 molecules may have formed after the divergence of O. longistaminata and other A genome species. This result agrees well with the previous finding that O. longistaminata had initially diverged from other A genome species. RM20A(I) and RM20B(I) evolved before the divergence of O. brathii and O. rufipogon at chromosomes 12 and 11, respectively (Fig. 8). However, the time of the formation of RM20B(J) and RM20A(J) during rice evolution could not be verified by a sequence comparison. There was no evidence regarding whether indica or japonica subspecies developed first in the evolution of RM20A and RM20B. However, allele diversities did not differ between indica and japonica carrying RM20A(I) and RM20A(J), respectively (Fig. 2), suggesting that RM20A(J) may have been created close to RM20A(I). Therefore, RM20A(J) may also have been created before the divergence of O. brathii and O. rufipogon (Fig. 8). These results suggest that three types of cultivated rice, O. glaberrima, O. sativa ssp. indica and O. sativa ssp. japonica, may have evolved in parallel from individual ancestors, although ancestors of the japonica subspecies were not found here by nucleotide sequence analysis. Oryza glaberrima was domesticated directly from O. brathii in Africa independently of O. sativa in Asia (Morishima et al., 1963; Second, 1982). Previous reports have also suggested that the japonica and indica subspecies differentiated before their domestication (Second, 1982; Wang et al., 1992; Chen et al., 1993). Because RM20B(I) may have been formed from RM20B(J) by the duplication of poly(TAA), RM20B(J) may have been formed before the divergence of O. brathii and O. rufipogon. This suggests that an ancestor of japonica subspecies had already differentiated from an ancestor of indica and O. glaberrima before the divergence of O. brathii.

Fig. 8

Time of the formation of RM20-related molecules during the evolution of the genus Oryza. The dendrogram is constructed based on an RFLP analysis (Wang et al., 1992). The timing of the formation of RM20B(J) could not be determined here.