Introduction

To date, genetic diversity in tropical trees has been investigated for only a limited number of species (Loveless, 1992, 2002; Caron et al, 2004; Lowe et al, 2005; Ward et al, 2005). The scarcity of studies most likely stems from the lack of molecular tools and genetic markers applicable to a wide range of species. Development of universal markers for gene diversity studies remains, at least for the time being, confined to the chloroplast genome (Demesure et al, 1995; Grivet et al, 2001). No universal marker system is yet available for codominant nuclear loci. Such markers need to be developed separately for each species, which can be extremely costly in resources and time (Squirrell et al, 2003). Transferability of microsatellites (SSRs) between species has been reported in several temperate woody taxa (eg Eucalyptus, Byrne et al, 1996; conifers, Echt et al, 1999; Vitis, Di Gaspero et al, 2000; Fagaceae, Barreneche et al, 2004), and more recently for some tropical species (Meliaceae, White and Powell, 1997; Caryocaraceae, Collevatti et al, 1999). However, all these examples showed that SSR transferability remains limited to phylogenetically related species.

The technical limitations of interspecific transfer of codominant markers have led to the use of random amplification techniques for assessing genetic diversity, such as RAPDs (random amplified polymorphic DNA, Williams et al, 1990) or AFLPs (amplified fragment length polymorphism, Vos et al, 1995). The RAPD technique has now become less popular due to problems with reproducibility; nevertheless, both random priming techniques can produce an unlimited number of markers, which makes them attractive for species where no codominant markers or DNA sequence data are available.

Recently, two meta-analyses showed that diversity surveys made with dominant markers provide data comparable to surveys undertaken with codominant markers (Nybom and Bartish, 2000; Nybom, 2004). Both classes of markers suggest that long-lived, outcrossing, late succession taxa retain most of their genetic variability within populations. One of the major advantages of using random primers to amplify genomic products is that a large number of DNA sites can be prospected for polymorphism (Cavers et al, 2005). As shown by simulation studies, the large number and wide distribution of AFLP markers throughout the genome compensate for the poor genetic information content at each locus (Mariette et al, 2002a, 2002b). However, the major drawback of using dominant markers is that the estimation of genetic diversity depends on F, the fixation index of the population, so the use of dominant markers can be limited when F is unknown, as is usually the case for tropical trees.

Gene diversity studies based on dominant markers compiled in the meta-analysis of Nybom (2004) either assumed that populations were in Hardy–Weinberg equilibrium (F=0) or used associated estimates of F from codominant markers. Much attention was then placed on procedures for estimating allele frequencies and on the related sampling strategy, but under the assumption that the F value of the population was known (Lynch and Milligan, 1994). Refined estimation procedures using Bayesian methods have been proposed to reduce the bias in estimating allele frequencies (Zhivotovsky, 1999). The different estimation procedures produce very similar results for outcrossing species in most cases (Krauss, 2000). However, these papers do little to address the problem of analysing dominant data directly when no F value is available.

The objective of this paper is to evaluate the sensitivity of the diversity estimation procedures to variation in F. As a corollary, we intend to provide empirical recommendations for surveys of genetic diversity in species where no information on F values is available, as is commonly the case for tropical tree species.

Genetic diversity for a diallelic dominant locus

Consider a dominant marker. M and m are, respectively, the marker allele and the null allele at the considered locus. The genotypes MM and Mm cannot be distinguished, which is why the marker is considered dominant over the null allele. Let p be the frequency of the marker allele and q be the frequency of the null allele (q=1−p). Let P be the frequency of the MM genotypes, PMm be the frequency of the Mm genotypes and Q be the frequency of the mm genotypes.

The frequencies of the three genotypic classes in a population are as follows:

$$P = p^2 + Fpq, \qquad P_{Mm} = 2pq(1-F), \qquad Q = q^2 + Fpq \tag{1}$$

where F is the fixation index of Wright, or Hardy–Weinberg deviation.
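As a minimal sketch of equation (1), taken here in its usual Wright form (P = p² + Fpq, PMm = 2pq(1−F), Q = q² + Fpq), the three genotype frequencies can be computed from p and F. The function name `genotype_frequencies` is hypothetical and only serves the illustration.

```python
def genotype_frequencies(p, F):
    """Return (P, P_Mm, Q) for marker-allele frequency p and fixation index F.

    Assumes the usual Wright parameterisation of genotype frequencies:
    P = p^2 + F*p*q, P_Mm = 2*p*q*(1 - F), Q = q^2 + F*p*q, with q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    P = p * p + F * p * q           # MM homozygotes
    P_Mm = 2.0 * p * q * (1.0 - F)  # heterozygotes (band present)
    Q = q * q + F * p * q           # null homozygotes (band absent)
    return P, P_Mm, Q
```

With F=1 the heterozygote class vanishes and the function returns (p, 0, q), which is the fully selfed case discussed in the text.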

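Based on equation (1), the frequency of the null allele (m) can be calculated as follows:

$$q = \frac{-F + \sqrt{F^2 + 4(1-F)Q}}{2(1-F)} \qquad (F < 1) \tag{2}$$

Hence Nei's genetic diversity can be calculated in the case of a biallelic locus as follows (see also Caron et al, 2004):

$$H = 2pq = 2q(1-q), \quad \text{with } q \text{ given by equation (2)} \tag{3}$$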
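The two steps can be sketched in a few lines, assuming that equation (1) gives Q = q² + Fq(1−q), so that q is the positive root of (1−F)q² + Fq − Q = 0, and that Nei's diversity at a biallelic locus is H = 2q(1−q). The function names are hypothetical.

```python
import math


def null_allele_frequency(Q, F):
    """Solve Q = q^2 + F*q*(1 - q) for the null-allele frequency q."""
    if not 0.0 <= Q <= 1.0:
        raise ValueError("Q must lie in [0, 1]")
    if F >= 1.0:
        # No heterozygotes: genotype and allele frequencies coincide.
        return Q
    a = 1.0 - F
    # Positive root of a*q^2 + F*q - Q = 0.
    return (-F + math.sqrt(F * F + 4.0 * a * Q)) / (2.0 * a)


def nei_diversity(Q, F):
    """Nei's gene diversity H = 2*q*(1 - q) at a biallelic dominant locus."""
    q = null_allele_frequency(Q, F)
    return 2.0 * q * (1.0 - q)
```

For instance, `nei_diversity(0.4, 0.6)` returns 0.5, because Q=0.4 and F=0.6 imply q=0.5.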

In most tropical species F is unknown. Hence, we considered two extreme cases of F encompassing the range of possible values (F=0 and 1). F=0 corresponds to the case where the population is in Hardy–Weinberg equilibrium, and is generally applicable for outcrossing species. Nei's genetic diversity is then (HHW):

$$H_{HW} = 2\sqrt{Q}\,(1-\sqrt{Q})$$

F=1 corresponds to a population where there are no heterozygotes (see equation (1)), for example, a fully selfed species. In this case, the frequency of bands is considered as the frequency of the M allele, and all phenotypes observed are homozygous genotypes. As seen in equation (1), if F=1, then Q=q and P=p. As a result, the diversity calculated in this case has also been called the phenotypic diversity (HPH) (Mariette et al, 2002a, 2002b):

$$H_{PH} = 2Q(1-Q)$$
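The two limiting estimates can be written directly in terms of Q. This is a small sketch that assumes q=√Q under Hardy–Weinberg equilibrium (F=0) and q=Q under complete fixation (F=1); the names `h_hw` and `h_ph` are chosen for the illustration.

```python
import math


def h_hw(Q):
    """Diversity assuming Hardy-Weinberg equilibrium (F = 0), where q = sqrt(Q)."""
    q = math.sqrt(Q)
    return 2.0 * q * (1.0 - q)


def h_ph(Q):
    """Phenotypic diversity (F = 1), where q = Q."""
    return 2.0 * Q * (1.0 - Q)


if __name__ == "__main__":
    # Q = 0.4 gives roughly 0.465 and 0.48.
    print(round(h_hw(0.4), 3), round(h_ph(0.4), 3))
```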

It is worth noting that H (equation (3)) varies differently as a function of F depending on the value of Q (Figure 1). When Q is ≤0.25, H is a monotonically decreasing function of F, meaning that HHW and HPH are, respectively, the maximum and minimum attainable values of H. When 0.25<Q<0.5, H is a non-monotonic function of F, with an interior maximum that corresponds to neither HHW nor HPH. For example, when Q=0.4, HHW=0.465 and HPH=0.48, and the maximum of H (0.5) is obtained when F=0.6. When Q is ≥0.5, H is a monotonically increasing function of F, meaning that HHW and HPH are, respectively, the minimum and maximum attainable values of H. Hence, the use of HHW or HPH to calculate H can be misleading. The bias in measuring H with HHW or HPH, denoted βHW and βPH respectively (Figure 2), follows from comparing each estimate with H from equation (3).

Figure 1. Variation in genetic diversity (H) as a function of the frequency of null homozygotes (Q) and the fixation coefficient (F).
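The three regimes of H as a function of F can be recovered with a short derivation. It is a sketch that assumes equation (1) in the form Q = q² + Fq(1−q), holds Q fixed at its observed value, and restricts F to the interval from 0 to 1.

```latex
\begin{align*}
Q &= q^2 + Fq(1-q) && \text{(fixed, observed)}\\
\frac{dq}{dF} &= -\frac{q(1-q)}{2q(1-F) + F} \le 0 && \text{(implicit differentiation, } 0 \le F \le 1\text{)}\\
q\big|_{F=0} &= \sqrt{Q}, \qquad q\big|_{F=1} = Q\\
H &= 2q(1-q) \ \text{is maximal } (H = 0.5) \text{ at } q = \tfrac{1}{2}, \ \text{that is, at } F = 4Q - 1
\end{align*}
```

As F increases, q therefore slides from √Q down to Q. If √Q ≤ 0.5 (Q ≤ 0.25), q stays below 0.5 and H decreases throughout; if Q ≥ 0.5, q stays above 0.5 and H increases throughout; in between, q crosses 0.5 at F = 4Q − 1, which gives F=0.6 for Q=0.4.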

The estimation of H might be biased in different ways (Figure 2). The size of the bias clearly depends on the value of Q. Regardless of the F value, HPH will underestimate H when Q is ≤0.38, and overestimate H when Q is ≥0.5; HHW will overestimate H when Q is ≤0.25, and underestimate H when Q is >0.38. However, at intermediate values of Q (0.38–0.5) and at large values of Q (>0.90), the biases are small whichever estimation method is used (HHW or HPH).
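The sign of each bias can be checked numerically. The sketch below assumes, for illustration only, that the bias is the simple difference between the estimate and H (the exact definition of βHW and βPH may differ, for example by a scaling factor), and that Q = q² + Fq(1−q). All function names are hypothetical.

```python
import math


def true_diversity(Q, F):
    """H = 2*q*(1 - q), with q solved from Q = q^2 + F*q*(1 - q)."""
    if F >= 1.0:
        q = Q
    else:
        a = 1.0 - F
        q = (-F + math.sqrt(F * F + 4.0 * a * Q)) / (2.0 * a)
    return 2.0 * q * (1.0 - q)


def biases(Q, F):
    """Return (H_HW - H, H_PH - H); an assumed, unscaled definition of bias."""
    h = true_diversity(Q, F)
    root = math.sqrt(Q)
    h_hw = 2.0 * root * (1.0 - root)  # estimate assuming F = 0
    h_ph = 2.0 * Q * (1.0 - Q)        # estimate assuming F = 1
    return h_hw - h, h_ph - h


if __name__ == "__main__":
    for Q in (0.1, 0.2, 0.4, 0.6, 0.9):
        b_hw, b_ph = biases(Q, 0.2)
        print(f"Q={Q:.1f}  HW bias={b_hw:+.4f}  PH bias={b_ph:+.4f}")
```

The two estimates coincide where √Q(1+√Q)=1, that is, at Q=(3−√5)/2≈0.382, which is the 0.38 threshold quoted in the text.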

Figure 2. Variation in bias of diversity measures (H) as a function of F. βHW is the bias resulting from the assumption of Hardy–Weinberg equilibrium in the population (bold lines). βPH is the bias resulting from the assumption that frequencies of phenotypes are equal to allele frequencies (thin curves).

Genetic diversity for multiple dominant loci

At a single locus, we have shown that HHW and HPH produce biased estimates of H, and that the level of bias depends on the frequency of null homozygotes in the population (Q). As genetic diversity is calculated as a mean over all loci, the bias of a multilocus estimate of diversity will strongly depend on the distribution of the Q values of the different AFLP fragments. As an example, consider a preferentially outcrossing species with an F value of 0.20: diversity measured by HHW is overestimated for AFLP loci at which Q is 0.20, but underestimated for loci at which Q is 0.60. Hence, the overall diversity over a large number of loci depends on the frequency profile of the AFLP fragments, as biases over all loci may either accumulate or compensate depending on their Q values. We considered here different Q frequency profiles that are likely to encompass the different experimental situations: U-shaped frequency profile of Q (most AFLP fragments showing either low or high frequencies), inverse U-shaped frequency profile (most fragments are at intermediate frequencies), J-shaped frequency profile (excess of fragments at high frequencies) and inverse J-shaped frequency profile (excess of AFLP fragments at low frequencies). These frequency profiles are generated by sampling 200 AFLP loci from a beta distribution with parameters a and b (Figure 3). For each profile, 10 repetitions are obtained by sampling with replacement in the beta distribution, and for each repetition, H is calculated according to equation (3) by varying F between −0.2 and 1 (Figure 3).
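The sampling scheme described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes Q = q² + Fq(1−q) at each locus, takes the beta parameters from the Figure 3 caption, and uses a hypothetical seed and F grid (steps of 0.1).

```python
import math
import random

# Beta(a, b) parameters for the distribution of Q, from the Figure 3 caption.
PROFILES = {
    "U": (0.7, 0.7),
    "inverse U": (5.0, 5.0),
    "J": (1.0, 0.6),
    "inverse J": (0.3, 0.7),
}


def null_allele_frequency(Q, F):
    """Positive root of (1 - F)*q^2 + F*q - Q = 0 (q = Q when F = 1)."""
    if F >= 1.0:
        return Q
    a = 1.0 - F
    return (-F + math.sqrt(F * F + 4.0 * a * Q)) / (2.0 * a)


def multilocus_diversity(q_values, F):
    """Mean over loci of H = 2*q*(1 - q) for a common fixation index F."""
    total = 0.0
    for Q in q_values:
        q = null_allele_frequency(Q, F)
        total += 2.0 * q * (1.0 - q)
    return total / len(q_values)


def simulate(a, b, n_loci=200, n_reps=10, seed=1):
    """Return the F grid and one H-versus-F curve per repetition."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    f_grid = [round(-0.2 + 0.1 * i, 1) for i in range(13)]  # -0.2 ... 1.0
    curves = []
    for _ in range(n_reps):
        q_values = [rng.betavariate(a, b) for _ in range(n_loci)]
        curves.append([multilocus_diversity(q_values, F) for F in f_grid])
    return f_grid, curves


if __name__ == "__main__":
    for name, (a, b) in PROFILES.items():
        f_grid, curves = simulate(a, b)
        means = [sum(c[i] for c in curves) / len(curves) for i in range(len(f_grid))]
        print(name, " ".join(f"{m:.3f}" for m in means))
```

Each printed row is the mean H across repetitions at successive F values, so a nearly flat row indicates a profile for which the multilocus estimate is insensitive to F.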

Figure 3. Multilocus measurement of diversity as a function of the fixation index F. Distributions of the frequency of null homozygotes (Q) were generated according to a beta distribution with parameters a and b (a=0.7 and b=0.7 for the U-shaped frequency profile; a=5 and b=5 for the inverse U-shaped profile; a=1 and b=0.6 for the J-shaped profile; a=0.3 and b=0.7 for the inverse J-shaped profile). In all, 10 repetitions were used for each frequency profile and the vertical bars represent the range of variation of H values among the 10 repetitions.

Overall, the multilocus measures of diversity are much less sensitive to the variation in F values than the monolocus measures (compare Figures 3 and 1). For the U- and inverse J-shaped frequency profiles, the mean values of diversity do not vary as a function of F. There is a slight increase of H as a function of F in the case of the inverse U-shaped profile, and a slight decrease for the J-shaped profile. In the two latter cases, the diversity measures remain unchanged over a wide range of F values. For example, for F varying between −0.2 and 0.2 (corresponding to a preferentially outcrossed species), or for F varying between 0.8 and 1 (corresponding to a preferentially selfed species), changes of H remain extremely low. Variance due to the stochastic sampling of loci is highest for the J-shaped profiles, corresponding to the case when most of the AFLP fragments exhibit low Q values. In such cases, the estimation of allele frequencies can also be strongly biased and their sampling variance inflated (Lynch and Milligan, 1994).

We also explored other theoretical cases for each of the four categories of frequency profiles (U, inverse U, J and inverse J). The results shown in Figure 3 are consistent across most of the cases investigated (data not shown), with a few exceptions. For the J- and U-shaped profiles, when the profiles are characterized by an extremely large proportion of fragments with high frequencies (eg a=b=0.2 for the U shaped, a=2 and b=0.2 for the J shaped), then H decreases with increasing values of F. Similarly for the inverse J profiles, H increases as a function of F, when the proportion of fragments present in high frequency increases (eg a=0.2 and b=2).

Case studies: genetic diversity of neotropical trees

Following the same strategy as in the theoretical cases, we estimated genetic diversity for 10 tree species distributed throughout Central and South America with contrasting distributional ranges (from local to continental). For seven out of the 10 species, surveys of AFLP diversity were available for several populations (Table 1). In this study, we bulked all material coming from different populations into a single population (species level) for which diversity was estimated. This is because our investigations are mainly concerned with the impact of the frequency profile on the estimation of diversity and not the comparison of diversity among species. As a result, the fixation index (F) to be considered is FIT, cumulating both the within- and among-population deviations from Hardy–Weinberg equilibrium. Diversity (H) was estimated using the Bayesian method of Zhivotovsky (1999), taking as a prior distribution of Q the observed value of Q in the species. As shown by Zhivotovsky (1999), this method is the least biased estimation procedure, especially when Q is small. Computations of H and the corresponding sampling variances were carried out according to Vekemans et al (2002). Diversity was estimated by considering successive values of F ranging between −0.3 and 1. The frequency profiles of the 10 species (Figure 4) each fitted one of the four theoretical cases considered earlier (ie U, inverse U, J, inverse J; Figure 3), with slight deviations from these general patterns for Anacardium, Cedrela and Vochysia.

Table 1 List of real AFLP data sets of neotropical tree species subjected to genetic diversity analysis, indicating species, geographic distribution, putative mating system, number of AFLP fragments scored, number of primer enzyme combinations (PECs) used and number of populations and individuals genotyped

Figure 4. AFLP frequency profiles (Freq) of null homozygotes (Q) in 10 neotropical species. Values within brackets correspond to the a and b parameters of the corresponding beta distribution. To compare the observed frequency profiles to the theoretical ones (Table 1 and Figure 3), a and b parameters of the beta distribution were calculated according to Zhivotovsky (1999).

The diversity values for the 10 different species confirmed the results obtained with the theoretical AFLP profiles. HHW and HPH were the two extreme values of H when F varied between 0 and 1 (data not shown). We therefore did not represent the whole range of variation but only the two extreme values (HHW and HPH; Figure 5); the real value of H is situated within this range. Among the 10 species investigated, three did not exhibit any difference between HHW and HPH. Anacardium, Cedrela, Virola and Swietenia were among those that showed the largest differences between HHW and HPH. The two former species are also among those that showed irregular frequency profiles (see Figure 4). Most significantly, however, these four species were those with the lowest number of scored AFLP fragments (Table 1). Finally, all species showed similar values for H when F varied between −0.2 and 0.2, which is the most likely range of variation in fixation index for species exhibiting preferential outcrossing (data not shown).

Figure 5. Diversity estimates of 10 neotropical species as a function of the fixation index. For each species, HPH (−) and HHW (×) were estimated, together with their confidence interval (±1.96σ), represented as vertical bars. E.u., Eugenia uniflora; P.m., Pseudobombax munguba; V.f., Vochysia ferruginea; V.m., Virola michelii; E.g., Eperua grandiflora; C.s., Chrysophyllum sanguinolentum; A.o., Anacardium occidentale; C.o., Cedrela odorata; L.c., Lonchocarpus costaricensis; and S.m., Swietenia macrophylla.

Discussion

The measurement of genetic diversity with dominant markers depends on the frequency of null homozygotes (Q) and the fixation index (F) of the populations. While Q can be assessed directly with random priming molecular marker systems, F is less accessible, unless codominant markers are available. The level of the fixation index in a population depends on the mating system and genetic structure (Wahlund effect; Hartl and Clark, 1989). Estimates of the fixation index in tropical trees originate from two sources: mating system analysis and studies of population structure. Indirect estimates of F (F=S/(2−S)) can be obtained from estimates of selfing rates (S). In a review of 26 tropical tree species, Murawski (1995) showed that selfing rates varied between 0 and 80%, but that a majority of tree species were outcrossed. These results were confirmed in two more recent reviews (Loveless, 2002; Ward et al, 2005), partly overlapping with the previous study. In the review of Loveless (2002), only one species among 30 exhibited a selfing rate higher than 15%, and in the review of Ward et al (2005), mixed mating systems were only found in representatives of a single family. Thus, overall we can expect that the fixation coefficient would be, in most cases, less than 10%. However, these values might be inflated in the presence of a Wahlund effect. Significant variation in F has also been noted between adult and juvenile cohorts (Murawski, 1995) and was interpreted as the result of selection favouring heterozygotes. Hence, values of F tend to decrease with population age (Murawski, 1995), which may actually counterbalance any inflating influence from a Wahlund effect.
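The indirect estimate F=S/(2−S) quoted above is easy to tabulate. The sketch below only evaluates that formula; the function name `fixation_from_selfing` is hypothetical.

```python
def fixation_from_selfing(S):
    """Indirect estimate of the fixation index from the selfing rate: F = S / (2 - S)."""
    if not 0.0 <= S <= 1.0:
        raise ValueError("selfing rate S must lie in [0, 1]")
    return S / (2.0 - S)


if __name__ == "__main__":
    for S in (0.0, 0.15, 0.8):
        print(f"S={S:.2f}  F={fixation_from_selfing(S):.3f}")
```

A selfing rate of 15% corresponds to F of about 0.08, in line with the expectation that F stays below 10% for most of the species reviewed, whereas a selfing rate of 80% corresponds to F of about 0.67.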

We show in this study that the monolocus estimation of gene diversity has the potential to vary strongly with variations in F, but that the multilocus estimate is rather robust to deviations from Hardy–Weinberg equilibrium. The robustness of the estimate is due to a mechanistic effect of compensation between negative and positive biases of H estimates for different AFLP loci exhibiting contrasting frequencies of the null homozygote. Surprisingly, the robustness is maintained across a large spectrum of frequency profiles of Q. Existing population data for Q suggest that frequency profiles are in most cases U shaped (Miyashita et al, 1999; Borowsky, 2001). In our survey of 10 neotropical species, we found in addition J- and inverse J-shaped distribution profiles. The robustness of diversity estimation is strongest when frequency profiles of Q are balanced (U-shaped frequency profiles) or in the case of an inverse J-shaped distribution.

These results have important applied consequences for monitoring genetic diversity in species where little information is available on the genetic structure of natural populations. Surveys of genetic diversity in natural populations of tropical trees should therefore follow a stepwise procedure. First, the monitoring should be based on a large number of loci. The usefulness of AFLP for multilocus assessments of diversity lies in the fact that negative and positive biases of H at different loci average out. However, this mechanistic compensation is only effective when many markers are used. From previous simulations (Mariette et al, 2002a, 2002b) and experimental studies (Caron et al, 2004), a few hundred AFLP markers should ideally be recorded in order to cope with the intragenome heterogeneity of diversity. A similar trend was observed in our survey of 10 tropical species (Figure 5): when fewer than 250 fragments are recorded, the difference between HPH and HHW increases. Second, when no information on the fixation index is available, we recommend estimating H by both HPH and HHW. In all theoretical and experimental cases investigated in this study (Figures 3 and 5), these two estimates represent the extreme values of the range of variation of H, and in most cases, this range was extremely small. However, the procedure should be used more cautiously in the case of J- or inverse U-shaped frequency profiles of Q (Figure 3). When there is indirect information available on the mating system, and when the species is considered to be outcrossing or mixed mating, then HHW would be the diversity measure of choice. However, if there is evidence that the species is selfed, then estimation of H by HPH is recommended.
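The recommendation to report both estimates can be sketched as a small helper that takes the observed frequency of band absence (Q) at each locus and returns the multilocus HHW and HPH. This is a simplified illustration: it uses the plain square-root estimate of q under Hardy–Weinberg equilibrium rather than the Bayesian estimator of Zhivotovsky (1999) applied in the case studies, and the function name `diversity_bounds` is hypothetical.

```python
import math


def diversity_bounds(q_values):
    """Return (H_HW, H_PH) averaged over loci.

    q_values: observed frequency of null homozygotes (band absence) per locus.
    H_HW assumes F = 0 (q = sqrt(Q)); H_PH assumes F = 1 (q = Q).
    """
    n = len(q_values)
    if n == 0:
        raise ValueError("at least one locus is required")
    h_hw = sum(2.0 * math.sqrt(Q) * (1.0 - math.sqrt(Q)) for Q in q_values) / n
    h_ph = sum(2.0 * Q * (1.0 - Q) for Q in q_values) / n
    return h_hw, h_ph


if __name__ == "__main__":
    # Hypothetical band-absence frequencies for five loci.
    observed_q = [0.05, 0.30, 0.55, 0.80, 0.95]
    h_hw, h_ph = diversity_bounds(observed_q)
    print(f"H_HW={h_hw:.3f}  H_PH={h_ph:.3f}  difference={abs(h_hw - h_ph):.3f}")
```

A small difference between the two values suggests that the unknown F matters little for the data set at hand; a large difference points to too few loci or to an unbalanced frequency profile.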