Introduction

Nucleotide sequence data integrate information over a long time period (on average, 4N generations, where N is the effective population size), and thus can provide powerful insights into the important factors contributing to the evolutionary dynamics of a species. In any natural population, the amounts and patterns of nucleotide variation are influenced by both biological factors (for example, mutation, recombination and natural selection) and demographic factors (for example, range expansion and colonization, population subdivision and admixture) affecting N.

In recent years, nucleotide variation in nuclear functional genes has been extensively studied in herbaceous plants (Innan et al., 1996; Filatov and Charlesworth, 1999; Kuittinen and Aguade, 2000; Aguade, 2001; Zhang and Ge, 2006; Arunyawat et al., 2007). However, only a very limited number of such reports have been published for tree species. The amounts and patterns of genetic variation in plants are strongly affected by life history traits and mating systems (Hamrick and Godt, 1996); hence, it is problematic to generalize the patterns of nucleotide variation in herbaceous plants to woody plants. Thus far, most data on nucleotide variation in trees have come from Pinus (Dvornyk et al., 2002; Garcia-Gil et al., 2003; Neale and Savolainen, 2004; Pot et al., 2005), Cryptomeria (Kado et al., 2003), Chamaecyparis (Kado et al., 2008) and Populus (Ingvarsson, 2005).

The genus Quercus comprises 350–500 species distributed throughout the northern hemisphere (Kubitzki, 1993; Nixon, 1993). They are conspicuous members of temperate deciduous forests of North America, Europe, Asia, as well as the evergreen Mediterranean maquis. The members of this genus are attractive for research, primarily for developing concepts of the species, speciation and evolution (Van, 1976; Rieseberg and Wendel, 1993). Nonetheless, remarkably little is known about levels of nucleotide polymorphism in the functional genes of this genus. Hence, this article provides the first characterization of levels of nucleotide polymorphism in a species of Quercus, Quercus crispula Blume.

Q. crispula Blume (Fagaceae) is a common tree throughout the cool-temperate deciduous forests of southern Sakhalin, the Kuril Islands, Japan, and Korea (Watanabe, 1994). The populations are dense and continuous in northern Japan but sparse and commonly isolated in small patchy forests at elevations >800 m in the south (Figure 1). cpDNA data (Kanno et al., 2004; Okaura et al., 2007) are consistent with palynological data (Hibino and Takeuchi, 1998; Miyoshi, 1998; Morita et al., 1998), suggesting a recent colonization history (from 8000 to 12 000 years ago) for the northern Japan populations (north of approximately 35° N). The southern populations have been shown to have remained in situ during the last glacial maximum (LGM), which lasted from 20 000 to 18 000 years BP. The former populations are likely to be adapted to a colder climate and the latter to a warmer climate (Okaura et al., 2007). Our previous cpDNA study also revealed a similar colonization history for another deciduous tree, Fagus crenata (Fagaceae), which usually forms a major vegetation class together with Q. crispula (Okaura and Harada, 2002).

Figure 1

Approximate geographical distribution of the populations sampled and the current pattern of species distribution in Japan. The name codes of each population (N1–S4) are the same as those given in Supplementary Table S1. The shadings indicate the patterns of the current population distribution and density, based on Horikawa (1972).

The present study addresses the following questions: (1) What is the level of nucleotide diversity in this oak species as compared to pines and other broadleaved species? (2) What are the pattern and extent of linkage disequilibrium (LD) among populations of Q. crispula? (3) Do the patterns of genetic variation differ significantly between the northern and southern Japan populations? (4) Can we find sequence evidence favoring the hypothesis of climatic adaptation in populations of Q. crispula?

Materials and methods

Samples

Fresh leaves were collected from trees belonging to nine populations of Q. crispula distributed throughout Japan (Figure 1; Supplementary Table S1). Leaves collected from another Japanese oak, Quercus phillyraeoides, were used as the outgroup; there is no morphological evidence for hybridization of this species with Q. crispula. We collected leaves from trees growing at least 30 m apart. Trees clearly distinguishable by morphological characteristics were chosen. The samples were immediately processed for DNA extraction or frozen at −20 °C until use.

Loci

We genotyped 11 loci, all of which are nuclear functional genes (Table 1). Apart from MetE, the 10 remaining loci are expressed sequence tag (EST) loci extracted from oak EST resources reported in a previous study (Casasoli et al., 2006).

Table 1 Summary of loci studied

DNA extraction and PCR amplification

Total genomic DNA was extracted from fresh or frozen leaf tissues using the cetyltrimethylammonium bromide method. The primers for the amplification of MetE are identical to those specified by Chiang et al. (2002). Those for the amplification of the 10 EST loci are the same as those reported by Casasoli et al. (2006).

Subcloning and sequencing

The three long (>1000 bp) loci, MetE, Cons109 and Cons14, were cloned and sequenced to generate haplotype sequences. The PCR products were cloned into the TOPO vector using a Zero Blunt TOPO PCR Cloning kit for sequencing (Invitrogen). The clones were sequenced using the BigDye terminator sequencing kit (Applied Biosystems, Foster City, CA, USA) on an ABI310 automated sequencer (Applied Biosystems). At least 8–10 different clones of each fragment were sequenced to detect the presence of multiple haplotypes within individuals and to control for Taq polymerase errors. For the other eight short (<600 bp) loci, we used a direct sequencing technique to obtain single nucleotide polymorphism (SNP) data. The frequency of each SNP was scored by direct sequencing of the gene fragments in which it was present. Heterozygous sites in a single individual were visible as double peaks in the chromatograms generated by the automated sequencer. The peaks were confirmed by repeating the experiment twice if necessary.

Data analysis

Sequences were verified manually. All indels (insertions/deletions) were excluded from analyses. Analyses of sequence data of the three long loci, MetE, Cons109 and Cons14, were performed using DnaSP version 4.10.8 (Rozas et al., 2003). For the eight short loci, nucleotide diversity (π) was estimated by dividing the average pairwise sequence difference (k) by the number of sites.
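The estimate of nucleotide diversity described above (average pairwise difference k divided by the number of sites) can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function name `nucleotide_diversity` and the toy alignment are hypothetical, and the sketch assumes phased, gap-free, equal-length sequences.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Return (k, pi) for aligned sequences of equal length.

    k  = average number of differences over all pairs of sequences
    pi = k divided by the number of aligned sites
    Assumes phased, gap-free sequences (hypothetical input format).
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for every pair of sequences.
    total_diff = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    k = total_diff / len(pairs)
    return k, k / length


# Toy alignment (illustrative only, not data from this study).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGAACGTAT"]
k, pi = nucleotide_diversity(toy)
print(k, pi)
```

For unphased diploid SNP genotypes, as obtained by direct sequencing, an estimator based on allele frequencies at each site would be needed instead of this pairwise count.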

The alignments of all the sequences of MetE, Cons109 and Cons14 were used to perform analysis of recombination with DnaSP. The minimum number of recombination events (RM) that have occurred along the sequence was estimated following the four-gamete test (Hudson and Kaplan, 1985). The two indices of LD, D′ and r2, were estimated between pairs of polymorphic sites for each locus with DnaSP. The statistical significance of each pairwise test of LD was evaluated using the χ2-test. The pairwise analyses are not fully independent because of LD itself; thus, the proportion of significant (P<0.05) pairwise tests was compared for pairs of sites separated by different physical distances (>400 or <400 bp). Comparison of these proportions over the two separate distance ranges was carried out using Fisher's exact test (two-tailed). Although nonlinear regression analysis (Remington et al., 2001) has been useful as a representation of the overall rate of LD decay (Ingvarsson, 2005; Zhu et al., 2007), it yields a least-squares estimate of the population recombination rate per base pair; this estimate may be imprecise and unrealistic because of several factors, for example, the nonindependence between linked sites and nonequilibrium populations.
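As an illustration of the pairwise quantities named above, the following Python sketch computes D, D′ and r2 for two biallelic sites and checks whether all four gametes are present. It is a sketch under stated assumptions, not the DnaSP implementation: the function names `ld_stats` and `four_gametes_present` and the toy data are hypothetical, haplotypes are assumed to be phased, and only the pairwise four-gamete check is shown, not the full interval procedure of Hudson and Kaplan (1985) that yields RM.

```python
def ld_stats(site_a, site_b):
    """Return (D, D_prime, r2) for two biallelic sites.

    site_a and site_b hold one allele per phased haplotype, in the same
    haplotype order. The allele of the first haplotype is used as the
    reference allele at each site (an arbitrary, hypothetical convention).
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    a_ref, b_ref = site_a[0], site_b[0]
    p_a = sum(x == a_ref for x in site_a) / n
    p_b = sum(y == b_ref for y in site_b) / n
    p_ab = sum(x == a_ref and y == b_ref for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    # Maximum |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def four_gametes_present(site_a, site_b):
    """True if all four two-site haplotypes occur, which under the
    infinite-sites model implies at least one recombination event
    between the two sites."""
    return len(set(zip(site_a, site_b))) == 4


# Toy example (illustrative only): eight haplotypes at two sites.
print(ld_stats("AAAATTTT", "CCCGGGGC"))
print(four_gametes_present("AAAATTTT", "CCCGGGGC"))
```

The sign of D and D′ depends on which allele is taken as the reference, which is why absolute values of D′ are usually compared.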

Tajima's D (Tajima, 1989a) statistic was used to test for deviations from neutrality of the three loci, MetE, Cons109 and Cons14. This test measures skews in the frequency spectrum; a negative D suggests an excess of low-frequency polymorphisms and a positive D indicates an excess of intermediate-frequency polymorphisms. Tajima's test is conservative for testing departures from neutral equilibrium conditions, particularly under the assumption of no recombination. In addition, the test statistics D and F proposed by Fu and Li (1993) were computed for the three loci. All these tests assume random mating and random sampling, and DnaSP was used to carry out the tests. We did not apply these tests for the eight short loci because of the small number of segregating sites per locus. Instead, we performed the multilocus Hudson–Kreitman–Aguade (HKA) test (Hudson et al., 1987; http://lifesci.rutgers.edu/~heylab/) to assess the neutral prediction of the ratio of polymorphism to divergence across the eight EST loci. Q. phillyraeoides homologous sequences were used as outgroup for this test.
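To make the interpretation of Tajima's D concrete, the sketch below computes the statistic from a set of aligned sequences using the standard formulas of Tajima (1989). It is an illustrative sketch only, not the DnaSP implementation used in this study: the function name `tajimas_d` and the toy alignments are hypothetical, and sites with more than two variants are simply counted as one segregating site.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned, gap-free sequences of equal length."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four sequences")
    # k: average number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    k = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # S: number of segregating (polymorphic) sites.
    s = sum(len(set(column)) > 1 for column in zip(*sequences))
    if s == 0:
        raise ValueError("no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments (illustrative only).
singleton_rich = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA"]
intermediate = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGAACGTAT"]
print(tajimas_d(singleton_rich))  # negative: excess of low-frequency variants
print(tajimas_d(intermediate))    # positive: excess of intermediate-frequency variants
```

In the first toy alignment every variant is a singleton, so k falls below S/a1; in the second, both variants segregate at 50%, so k exceeds S/a1.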

Inbreeding indices (FIS) and their statistical significance (with 1000 randomizations) were estimated using FSTAT ver. 2.9.3.2 (Goudet, 2001). The SNP data of all the 11 loci were used in this analysis for the three northernmost and three southernmost populations; SNP data of the three loci, MetE, Cons109 and Cons14, were used for estimating FIS for the three central populations (N4-S1). We estimated the standard FST statistic for MetE, Cons109 and Cons14 using the method of Hudson et al. (1992); the probability of the statistical test of genetic differentiation was estimated using the method of Hudson (2000) (Snn statistic-based probabilities, evaluated by 10 000 permutations). For the eight short loci, we used their SNP data to estimate FST, 99% confidence intervals of FST, and its statistical significance with 1000 randomizations with FSTAT.
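The sequence-based FST of Hudson et al. (1992) has the form 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean between populations. The Python sketch below illustrates this form only. It is not the implementation used in this study: the function name `hudson_fst` and the toy data are hypothetical, and the unweighted averaging over populations and population pairs is an assumption that may differ from the weighting applied by DnaSP.

```python
from itertools import combinations, product


def _mean_differences(pairs):
    """Mean number of mismatching sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no sequence pairs to compare")
    return sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)


def hudson_fst(populations):
    """FST = 1 - Hw / Hb for a list of populations.

    Each population is a list of aligned sequences of equal length.
    Hw is the unweighted mean of within-population pairwise differences;
    Hb is the unweighted mean over all pairs of populations (assumption).
    """
    if len(populations) < 2:
        raise ValueError("need at least two populations")
    within = [_mean_differences(combinations(pop, 2)) for pop in populations]
    between = [
        _mean_differences(product(p1, p2))
        for p1, p2 in combinations(populations, 2)
    ]
    hw = sum(within) / len(within)
    hb = sum(between) / len(between)
    if hb == 0:
        raise ValueError("no differences between populations")
    return 1 - hw / hb


# Toy populations (illustrative only).
north = ["AAAA", "AAAT"]
south = ["TTAA", "TTAT"]
print(hudson_fst([north, south]))
```

The estimate can be negative when sequences differ more within populations than between them, which is usually read as no detectable differentiation.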

Results

We obtained the sequences of 11 loci from three northernmost (N1–N3) and three southernmost (S2–S4) populations; the sequences of MetE, Cons109 and Cons14 were additionally obtained from three populations (N4, N5 and S1) located approximately in the center of Japan (Figure 1). The length of the aligned sequence for each locus varied between 237 and 1385 bp, with a total of 6477 bp in length. The number of sequences sampled for each locus (n) in each population is shown in Tables 2 and 3. Only two variants were detected for all the polymorphic sites in this study. A total of 261 SNPs were detected, and thus an average of 1 SNP every ∼25 bp was found in our sample set. High haplotype fractions (h/n) were found in every population at all loci (Table 2).

Table 2 Summary of nucleotide variation for MetE, Cons109 and Cons14
Table 3 Summary of FST and tests of genetic differentiation

Levels of within-population inbreeding and between-population differentiation

The FIS values did not vary significantly among different populations, ranging from −0.081 (N5) to 0.044 (S1). None of the populations examined had FIS values significantly deviating from 0. The FST values are summarized in Table 3. There was slight variation across loci in the levels of genetic differentiation. Although all loci showed significant genetic differentiation across populations, the level of population differentiation was modest. The averages of FST across loci were low both in the pool and in the three northernmost and three southernmost populations (Table 3). Previous microsatellite studies (Ohsawa et al., 2007) also revealed very low levels of between-population differentiation and within-population inbreeding for all populations examined.

Nucleotide variation

Standard statistics of nucleotide variation for the three loci, MetE, Cons109 and Cons14, in each population and the pool (combined sequence data) are presented in Table 2. A total of 510 sequences was sampled for the three loci in nine populations, thus an average of ∼19 sequences per locus per population were used to estimate nucleotide polymorphism (θW) as well as nucleotide diversity at total sites (πtot) and at silent sites (πsil). The levels of nucleotide diversity are heterogeneous among loci, with MetE being the least diverse locus (πtot=0.00667, πsil=0.00711 in the pool) and Cons109 being the highest (πtot=0.00721, πsil=0.00768 in the pool). When the sequence data of all three loci are combined, πtot=0.00693, πsil=0.00780 and θW=0.00961 in the pool. As shown in Table 4, π was also estimated from the eight short loci based on their SNP data. Because of the short sequence lengths, we combined the data of all the eight loci to estimate π at total sites (2529 bp). A total of 20 sequences per locus per population were used for the estimation. In the pool of six populations, πtot was estimated to be 0.00598 (±0.00100; Table 4). This value is quite similar to that estimated from the combined sequence data of the three long loci mentioned above.

Table 4 Summary of nucleotide variation in the eight short EST loci

Being more recently colonized, the northern Japan populations are expected to have a lower level of genetic diversity than the southern populations. However, our calculations showed that nucleotide diversity in the three northernmost populations (N1–N3) was not lower than that in the three southernmost populations (S2–S4) for all three loci, MetE, Cons109 and Cons14 as well as the short loci (Table 4); it is also clearly seen in Tables 2 and 3 that all genetic diversity parameters, that is, πtot, πsil and θW are generally not lower in the northern populations than in the southern populations.

Recombination and linkage disequilibrium

In this article, we calculated the minimum number of recombination events, RM, per informative site estimated from the combined data of all the three loci, MetE, Cons109 and Cons14 (3500 bp), to compare the level of recombination among populations. As shown in Table 5, RM per informative site for the three loci was 0.32 in the pool, and ranged from 0.04 to 0.20 among populations. The levels of recombination appeared higher in the four northernmost populations (N1–N4) than in the remaining populations, although the statistical test supported a significant or marginally significant difference only with the N5, S1 and S2 populations (P<0.10, two-tailed Fisher's exact test).

Table 5 Summary of levels of recombination

In theory, when recombination drives allelic diversity, LD between polymorphic sites is expected to decline with increasing distance along the chromosome. To understand this relationship, we compared the number of significant comparisons among sites >400 bp apart and <400 bp apart for each locus in the pool. The results of all three loci showed that both the average values of r2 and ∣D′∣ were significantly larger among sites <400 bp apart, and the percentage of significant pairwise comparison was also significantly higher (P<0.05) in this group (Figure 2; Supplementary Tables S2, S3 and S4), suggesting that on average, LD significantly declined among sites >400 bp apart in species-wide samples. However, LD did not decline significantly among sites >400 bp apart within populations, except for the three northernmost populations (N1–N3; Figure 2; Supplementary Tables S2, S3 and S4). The patterns of LD decay were similar when the low-frequency alleles (<10%) were included or excluded from the analysis.

Figure 2

LD analysis of the species-wide sample (Pool) and different local populations (N1–S4) at the MetE locus. Average values estimated among sites >400 bp apart and <400 bp apart are indicated by the first and second columns, respectively, for the pool and for each population separately; (a) comparisons of ∣D′∣, (b) comparisons of r2 and (c) comparisons of percentage of significant pairwise tests; when the difference between the two groups is statistically significant (P<0.05), an asterisk (*) is labeled.

Neutrality tests

To examine the fit of nucleotide polymorphism data to the neutral equilibrium model, we performed the tests of Tajima's D (Tajima, 1989a) and Fu and Li's D and F (Fu and Li, 1993) for each locus in each population separately and in the pool of nine populations. No significant Tajima's D and Fu and Li's D and F were observed at any of the three loci except for MetE in the pool (Fu and Li's D and F were significantly negative) and all the three loci in population S2 (Tajima's D and Fu and Li's D and F were all significantly positive; Table 2). It is interesting to note that Tajima's D is negative at all the three loci in the pool but positive in three out of four populations in southern Japan (S1, S2 and S4). The multilocus HKA test was used to assess departures from the neutral model for the eight short EST loci. We observed no evidence for any significant deviations in any population and in the pool (P>0.05; Table 4).

Discussion

Nucleotide variation

The occurrence frequency of SNP in our samples of Q. crispula is 1 every ∼25 bp, which is comparable to that of other plant species previously reported. For example, 1 SNP every ∼50 bp was detected when the average of 1095 randomly selected gene fragments in maize was considered (Yamasaki et al., 2005); a frequency of 1 every 23 bp was scored when the average of 10 unlinked loci in rice was taken (Zhu et al., 2007); a frequency of 1 every 60 bp was obtained when the average of 5 loci in European aspen was considered (Ingvarsson, 2005). This is one of the first studies describing the level of nucleotide polymorphism in oak species. The LD mapping approach requires abundant SNPs in the target species. In this study, SNPs were detected in all gene fragments, even in those with only a couple of hundred base pairs in length. Thus, screening for adaptive variation in oaks can be efficient with short sequences by the genome-wide scan approach as performed in maize (Yamasaki et al., 2005).

The present study is an additional contribution to increase knowledge about nucleotide diversity in forest tree species. We use both πtot and πsil to compare the levels of nucleotide diversity among species for the following discussion because πsil has not yet been reported for several tree species in previous studies (Supplementary Table S5). Nucleotide diversity is generally low in conifers such as Pinus, Cryptomeria and Chamaecyparis but relatively high in broadleaved trees such as Populus and Quercus. The nucleotide diversity of Q. crispula is generally higher than that of conifers, as high as that of a European oak, but lower than that of the European aspen (Supplementary Table S5).

Besides the mating system and life history traits, demographic history has a major impact on the level of within-population genetic diversity (Hamrick and Godt, 1996; Charlesworth, 2003). The results of this study imply that the northernmost populations do not harbor lower nucleotide diversity than the southernmost populations. This is interesting given the recent colonization history of the northern Japan populations. Theoretical models have shown that recently colonized populations are generally expected to show a decline in genetic diversity due to genetic bottlenecks and founder events (Slatkin, 1977; Pannell and Charlesworth, 1999; Pannell and Dorken, 2006). Possible alternative hypotheses will be discussed below together with population recombination and LD decay.

Mating system and population structures

Given the differences in the current patterns of population distribution and the density of the forests in northern and southern Japan, the level of inbreeding may be expected to be higher in the southern populations to some degree. Nevertheless, inbreeding indices were not significantly different from zero in any of the populations sampled.

The average FST was estimated to be 0.053 in this study; the equivalent measurement of FST was 0.013 as estimated from microsatellite data in a previous study (Ohsawa et al., 2007). Even though these two FST estimates were substantially different, which was probably due to sampling strategies (latitudinal gradient along Japan vs altitudinal gradient in a narrow latitudinal range) and the genetic nature (biallelic vs multiple allelic polymorphism and evolutionary mechanisms) of the markers, both studies consistently revealed a modest level of genetic differentiation among populations. Q. crispula is an obligate outcrossing and wind-pollinated long-lived forest tree and has seeds that may be dispersed over a long distance by birds and rodents (Watanabe, 1994). Hence, the low levels of population differentiation observed among populations of this species are consistent with a general observation for outbreeding and wind-pollinated species (Hamrick and Godt, 1990). Nonetheless, low FST was unexpectedly observed among the recently colonized populations (northernmost populations). Several theoretical studies showed that colonization frequently leads to an excess of differentiation among populations compared with the equilibrium situation (Austerlitz et al., 1997; Le Corre and Kremer, 1998); however, with time, genetic differentiation is expected to decrease because of migration between extant populations (Pannell and Dorken, 2006).

Tajima's D-values of all the three loci were negative in the pool but positive or even significantly positive in several southern Japan populations. This suggests an excess of low-frequency polymorphisms in the species-wide samples and an excess of intermediate-frequency polymorphisms in the populations from southern Japan. The excess of low-frequency or intermediate-frequency polymorphisms is consistent across three loci, thus natural selection seems not to be the evolutionary force causing these skews in the frequency spectrum. The excess of intermediate-frequency polymorphisms in the populations from the south appears to be a genetic consequence of a recent reduction in population size (Tajima, 1989b), because of the preferential loss of low-frequency polymorphisms. The observation of more negative Tajima's D in the pooled sample as opposed to the within-populations of Q. crispula in this study was similar to that in Populus (Ingvarsson, 2005) and wild tomatoes (Arunyawat et al., 2007). This excess of low-frequency polymorphism can be explained by species range expansion in Japan, probably the recent northern colonization.

Furthermore, palynological studies have suggested that oak forests covered a broad area in southern Japan during LGM and have just recently reached the northern areas. In southern Japan, the frequency of oak pollen rapidly increased 12 000 years ago and then began to decrease from 8500 years ago (Miyoshi, 1998; Takahara, 1998). However, in a population from southern Japan (S3), we did not observe the excess of intermediate-frequency polymorphisms; this may suggest that either no such significant reduction in population size occurred in this population or singleton mutations have been reaccumulated in this population owing to gene flow from other populations within a time period of approximately 8500 years.

Recombination and linkage disequilibrium

We showed that the level of recombination was relatively high in Q. crispula as RM per informative site was 0.32 for the species-wide samples. The values of RM per informative site at seven loci in Arabidopsis thaliana were lower and ranged from 0.08 to 0.15 (Kuittinen and Aguade, 2000); however, these values were still much higher than those in Cryptomeria japonica, which ranged from 0.0000 to 0.0526 (Kado et al., 2003); RM per informative site in the Pal1 locus of Pinus sylvestris was 0.25 (Dvornyk et al., 2002). Note that A. thaliana is mostly selfing and thus seems to have very low effective recombination rates (Gaut and Long, 2003). The recombination rate in C. japonica was about one-thirtieth of that in A. thaliana on average (Kado et al., 2003). P. sylvestris is among the tree species with the widest-known geographic distributions in the world, whereas Q. crispula is restricted to some East Asian areas; however, the levels of both nucleotide diversity and population recombination were generally higher in this oak than in the conifer. This may be attributable to higher mutation and recombination rates in oaks than in conifers. Alternatively, past and/or ongoing hybridization among closely related oak species, such as between Q. crispula and Q. dentata (Ishida et al., 2003) and between Q. crispula and Q. serrata (Kanazashi et al., 1997) in Japan, may be one factor maintaining the high levels of nucleotide variation and recombination in oaks.

This study together with a previous study (Ingvarsson, 2005) showed that broadleaved trees such as oaks and aspens may show generally higher levels of species-wide nucleotide diversity and recombination than conifers. Theoretical and empirical studies have shown that the mating system primarily affects diversity at the population level rather than at the species level (Savolainen et al., 2000; Ingvarsson, 2002; Charlesworth, 2003; Wright et al., 2003). Oaks, aspens and conifers are all mostly outcrossing long-lived trees. Thus, differences in life history traits might be an important factor responsible for such different levels of nucleotide diversity and recombination between broadleaved and coniferous species.

Perhaps, one of our most interesting findings is that the three northernmost populations harbor a relatively large amount of nucleotide diversity, higher levels of recombination and faster LD decay than the southern populations in Japan. This result is contradictory to theoretical expectations given the recent colonization history of the northern Japan populations. Several possibilities should be considered to explain this paradox. The first is that effective population size is larger in the recently colonized populations than in the southern populations due to a recent population expansion that occurred in the former and/or a recent population bottleneck in the latter. Statistical tests and palynological data appear to favor the hypothesis of a recent population bottleneck in southern Japan populations.

A second possibility could be natural selection. Purifying selection reduces genetic diversity, and the southern populations may be more intensively selected than the northern populations; LD decay was shown to be significantly slower in the former than in the latter, which would make hitchhiking and background selection more effective (Kaplan et al., 1989; Begun and Aquadro, 1992; Charlesworth et al., 1993). However, the possibility of selection is not supported by the statistical tests in this study. It should be noted that, because of the generally low statistical power of most tests (Wright and Gaut, 2005), the effects of selection cannot be excluded entirely for all loci.

The relatively high levels of nucleotide polymorphism and recombination in the recently colonized populations in northern Japan may also be explained by several possibilities. Firstly, gene flow subsequent to colonization might be sufficiently strong to increase genetic diversity in the populations when they reach demographic equilibrium (Pannell and Dorken, 2006). Secondly, the long lifespan and juvenile phase of trees together with long-distance seed dispersal were shown to strongly attenuate founder effects during colonization (Austerlitz et al., 1997; Austerlitz and Garnier-Gere, 2003). Lastly, multiple colonization events causing an admixture of different lineages may overcome bottlenecks in recently colonized populations, particularly when propagules have long residence time or retain high genetic diversity as do the acorns of oaks (Lavergne and Molofsky, 2007).

Although further studies are needed to test each hypothesis mentioned above, the current landscape of genetic diversity, recombination and LD in the natural populations of Q. crispula characterized in this study may provide primary information for conservation genetics and LD mapping studies for this species and facilitate molecular evolution studies in oaks.