Introduction

Today the savanna biome occupies a fifth of the Earth's land surface, and is characterized by a mixture of grasslands and woodlands (Sankaran et al., 2005). It dates from the late Miocene, about eight million years ago (Cerling et al., 1997; Beerling and Osborne, 2006). During the Quaternary, the Pleistocene epoch (from 2.5 million to 11 000 years ago) underwent successive cooling and warming periods. During the last glacial maximum (LGM), around 20 000 years ago (Clark et al., 2009), 30% of the Earth's surface was covered by ice, the African continent was drier and the Sahara desert was bigger than today (Lioubimtseva et al., 1998). Although many phylogeographic studies have reconstructed the history of holarctic and rain forest species distributions, identifying refugia and routes of postglacial colonization (Bekessy et al., 2002; Petit et al., 2003, 2008; Jaramillo-Correa et al., 2008; Dick, 2010), little attention has been paid to the ecological consequences of these perturbations for savanna ecosystems. To date, most studies on this biome have concentrated on animal and plant species in the South American Cerrado, and have not identified a general trend in the impacts of perturbations that occurred during the Quaternary. Confirming paleopalynological studies (Behling, 2002), some savanna species showed evidence of range expansions during the LGM (Bonaccorso et al., 2006), while others showed evidence of fragmentation (Bonaccorso et al., 2006; Ramos et al., 2007, 2009). Accumulating evidence indicates that plant and animal populations in the African rain forest underwent historical genetic contractions and bottlenecks, surviving in putative refugia during the LGM (Dauby et al., 2010; Lowe et al., 2010), and expanded during postglacial periods to colonize wider areas (Maley, 1989, 1996).
In contrast, on the basis of data concerning the Pliocene glacial event 2.5 million years ago (Poumot, 1989; Morley and Richards, 1993), Maley (1996) proposes that savanna species underwent similar expansions during the cold and arid periods of the Quaternary (Flenley, 1998). To test these assumptions and to help understand the impact of the LGM on African savanna tree species, we analyzed the range-wide phylogeography of Vitellaria paradoxa C.F. Gaertn (commonly known as the shea tree), an IUCN Red List vulnerable but economically important endemic African savanna tree species exploited in agroforestry systems. The shea tree reproduces mostly sexually and is mainly pollinated by insects; its fruits are mainly dispersed barochorously and secondarily zoochorously (by birds, monkeys, rodents and humans). Its natural range is a 6000-km-wide belt through sub-Sahelian Africa covering contrasting ecological environments. V. paradoxa is currently subdivided into two non-overlapping subspecies, paradoxa (from Senegal to Central African Republic) and nilotica (in Sudan and Uganda). A gap of 175 km between the two subspecies is reported because of the different watersheds in the area: the Lake Chad basin is home to paradoxa and the Nile drainage basins are home to nilotica (Hall et al., 1996). However, this distinction is based on slight variations in a few morphological traits (the length and width of leaves and stamina) compiled from different studies (Hall et al., 1996), and no data are available concerning their natural or controlled hybridization. Therefore, nilotica and paradoxa are referred to as ‘putative’ subspecies throughout this article. The ‘butter’ extracted from shea kernels is the main source of edible vegetable fat in the region.
This oil, exported throughout Africa and used in the food, confectionery, cosmetic and pharmaceutical industries (Lovett and Haq, 2000), displays a clear geographic trend, becoming progressively harder from east to west due to variable ratios of saturated to unsaturated fatty acids (Davrieux et al., 2010).

An initial evaluation of the genetic diversity of V. paradoxa (Fontaine et al., 2004) suggested genetic differentiation between eastern and western Africa, and emphasized the role played in shaping the genetic structure of the species by the Dahomey Gap, a north to south savanna corridor that appeared in the late Holocene (3000 years ago) in Togo and Benin, disrupting the West African rain forest cover (Maley, 1996). However, this study was performed on a very small sample using RAPD markers, and the results thus need to be considered with caution (Bagley et al., 2001).

To draw general conclusions on the phylogeography of this emblematic African tree, we based our study on a representative sample of the species range and used a combination of molecular markers to infer different evolutionary processes. To this end, we used sequence polymorphism of intergenic chloroplast fragments from 58 individuals from 40 populations, and length variation of 3 chloroplast and 12 nuclear microsatellites genotyped on 364 trees from 71 populations. We used climatic models to assess the distribution of the species in three key periods of the Quaternary: the modern period, the LGM, and the last interglacial (LIG). Using this experimental design, we addressed the following questions: (i) what are the impacts of past perturbations on the genetic structure of the species? (ii) Can we propose a reliable theory concerning the origin of current populations, with special focus on the differentiation process between the subspecies paradoxa and nilotica?

Materials and methods

Plant material

In this study, a total of 364 trees were sampled at 71 sites in 14 countries, covering the whole distribution range of the two subspecies: Senegal, Mali, Ivory Coast, Ghana, Burkina-Faso, Togo, Benin, Niger, Nigeria, Cameroon, Chad and Central African Republic for paradoxa, and Sudan and Uganda for nilotica. A subset of samples including 58 individuals from 40 populations was used for the sequence-based chloroplast experiments. Spatial coordinates of the populations are given in Table 1. Total genomic DNA was extracted from dried leaves (50 mg) following the Dolezel method (Ky et al., 2000).

Table 1 Characterization of sampled populations

Molecular markers

Genotyping of chloroplast sequences

We successfully amplified and sequenced 11 chloroplast loci in V. paradoxa, consisting exclusively of non-coding intronic or intergenic regions. Universal primer pairs and annealing temperatures are summarized in Supplementary Appendix S1 (Supplementary information). The polymorphism of these loci was assessed in a subset of 10 individuals belonging to 10 different populations, including nilotica and paradoxa subspecies (Table 1). Among these 11 chloroplast loci, which correspond to 7 kb of chloroplast genome, only 2 loci presented nucleotide polymorphism: trnQ-trnS (Dumolin-Lapegue et al., 1997) and matK-trnK spacers (Johnson and Soltis, 1994). These polymorphic loci were sequenced using the above-mentioned subsample (Table 1). Polymerase chain reaction (PCR) was performed in a total volume of 20 μl, consisting of 20 ng of total DNA, 10 μl of standard QIAGEN PCR buffer, 100 μM of dNTP, 0.2 μM of each primer and 1 U of Taq DNA polymerase (QIAGEN, Courtaboeuf, France). Amplifications were performed in a MasterCycler Eppendorf thermocycler (Hamburg, Germany) with an initial 4-min denaturation step at 94 °C, followed by 30 cycles (30 s of denaturation at 94 °C, 1 min of annealing at 61.7 °C for matK-trnK and at 68 °C for trnQ-trnS, 1 min of extension at 72 °C), then a final extension step at 72 °C for 5 min for matK-trnK or for 10 min for trnQ-trnS. PCR products were purified and sequenced by GATC Biotech (Konstanz, Germany). Using a similar protocol, we amplified and sequenced trnQ-trnS and matK-trnK from five individuals of Baillonella toxisperma Pierre, a relative of the Sapotaceae family, used as outgroup species.

Genotyping of chloroplast SSRs

Among the 10 microsatellite primer pairs tested, 9 showed PCR amplification on our samples: 8 universal microsatellites, Ccmp1, Ccmp2, Ccmp3, Ccmp4, Ccmp5, Ccmp6, Ccmp7 and Ccmp9 (Weising and Gardner, 1999), and 1 tobacco microsatellite, Ntcp9 (Bryan et al., 1999). To determine polymorphism, these nine microsatellite markers were tested on 10 individuals covering the whole distribution range (Table 1). Three chloroplast microsatellites (cpSSR) were polymorphic (Ccmp3, Ccmp5 and Ntcp9) and were chosen for the global analysis of the 364 individuals from 71 populations. The protocol used for amplification and genotyping is described in Müller et al. (2009).

Genotyping of nuclear SSRs

Among the 16 polymorphic nuclear SSR markers (nucSSRs), 4 were discarded as many genotypes were missing. Finally, 12 polymorphic nuclear SSR markers (Allal et al., 2008) were used (mCIRVp08, mCIRVp14, mCIRVp28, mCIRVp70, mCIRVp113, mCIRVp134, mCIRVp159, mCIRVp167, mCIRVp168, mCIRVp175, mCIRVp181 and mCIRVp191). PCR conditions and genotyping methods were identical to those described by Allal et al. (2008).

Statistical analyses

Chloroplast sequences

The chloroplast genome is assumed to be non-recombinant, and was thus treated as a single haploid locus. Haplotypes were defined as distinct combinations of alleles found at the two loci, matK-trnK and trnQ-trnS. Sequences were concatenated for each individual, and global alignment was performed using the MUSCLE algorithm implemented in CODONCODE ALIGNER 3.5.7 (Codon Code Corporation, Dedham, MA, USA). Phylogenetic relationships among haplotypes were assessed by reconstructing the haplotype network with NETWORK 4.516 software (http://www.fluxus-engineering.com/) using the median-joining network algorithm (Bandelt et al., 1999).

Chloroplast SSRs

Haplotypes (chlorotypes) were defined as distinct combinations of alleles at the three cpSSRs (see Supplementary Appendix S2 in supporting information). For each population, we computed haplotypic diversity (Hcp) (Pons and Petit, 1995), the number of chlorotypes (nacp) and rarefied haplotypic richness (HR) using the CONTRIB 1.02 program (Petit et al., 1998). HR is the expected number of different chlorotypes found in each population using a standardized sample size (n) fixed as the smallest available number of genotyped individuals, following the formula given by Comps et al. (2001). In our case, the smallest population contained only two individuals. Although this statistic is not biased by a small sample (n=2), we eliminated seven populations containing fewer than four individuals to better account for rare alleles and to improve precision. The variation trend of haplotypic richness was then examined at different levels of population stratification (between subspecies, between regions and within regions) using Kruskal–Wallis's test implemented in XLSTAT 2009 (Addinsoft, Paris, France). This nonparametric ‘one-way analysis of variance’ enables comparison of independent samples that do not follow a normal distribution. Following Kruskal–Wallis's test, pairwise comparisons were performed using Dunn's multiple comparison test with Bonferroni correction, XLSTAT 2009 (Addinsoft).
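The rarefaction step described above can be illustrated with a minimal Python sketch. It is not the CONTRIB implementation: the function name, the example counts and the `subtract_one` option are assumptions made for illustration. The option reflects the convention, consistent with the HR values of 0 reported for fixed populations, that richness is counted as the expected number of chlorotypes minus one.

```python
from math import comb

def rarefied_richness(counts, n, subtract_one=True):
    """Expected number of distinct chlorotypes in a random subsample of n trees.

    counts: number of sampled trees carrying each chlorotype in one population.
    n: standardized sample size (here, 4 trees).
    subtract_one: assumed convention so that a fixed population scores 0.
    """
    total = sum(counts)
    if n > total:
        raise ValueError("rarefaction size exceeds the population sample size")
    # For each chlorotype, probability that it appears at least once
    # when n trees are drawn without replacement from the population sample.
    expected = sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)
    return expected - 1 if subtract_one else expected

# Hypothetical populations, both rarefied to n = 4
fixed = rarefied_richness([6], n=4)        # a single chlorotype
mixed = rarefied_richness([3, 2, 1], n=4)  # three chlorotypes
print(fixed, round(mixed, 3))
```

Because every population is rarefied to the same n, populations with different sample sizes become comparable, which is the reason for standardizing before applying the Kruskal–Wallis test.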

To assess differentiation among subspecies and populations within subspecies, and differentiation among regions and populations within regions, we used hierarchical analysis of molecular variance implemented in ARLEQUIN version 3.5 (Excoffier et al., 2005) set up for haploid data.

Two measurements of genetic differentiation described by Pons and Petit (1996) were used for the chlorotypes: GST, which accounts for chlorotype frequencies only, and NST, which also accounts for genetic distances between chlorotypes. If the contribution of stepwise mutation to genetic differentiation is negligible, the expected GST and NST values are equal. If the differentiation follows a phylogeographic pattern, stepwise mutation would result in NST>GST. Indeed, under a phylogeographic pattern, phylogenetically related chlorotypes are found more often within than among populations (Pons and Petit, 1996). To compare GST with NST, we used a simple test procedure implemented in the SPAGEDI program (Hardy and Vekemans, 2002). The test consists of comparing the observed NST with the distribution of NST obtained with 1000 estimates (pNST) under random permutation of the chlorotypes in the distance matrix between chlorotypes. In addition to the global test of NST vs pNST, we applied the test in pairwise comparison at different levels of population stratification (between regions and between populations within regions) in order to identify at which scale phylogeographic differentiation occurs.
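The logic of this permutation test can be sketched in Python under simplifying assumptions. The statistic below is 1 − vS/vT computed from raw frequencies, without the small-sample corrections of Pons and Petit (1996), so it only approximates the estimators used by SPAGEDI. All function names and the toy data are hypothetical.

```python
import random

def differentiation(pop_freqs, dist):
    """1 - vS/vT, where v is the mean distance between two randomly drawn chlorotypes.

    With dist[i][j] = 1 for all i != j this behaves like GST;
    with real inter-chlorotype distances it behaves like NST.
    """
    k = len(dist)

    def diversity(freqs):
        return sum(freqs[i] * freqs[j] * dist[i][j] for i in range(k) for j in range(k))

    v_within = sum(diversity(f) for f in pop_freqs) / len(pop_freqs)
    mean_freqs = [sum(f[i] for f in pop_freqs) / len(pop_freqs) for i in range(k)]
    return 1 - v_within / diversity(mean_freqs)

def permutation_test(pop_freqs, dist, n_perm=1000, seed=0):
    """Compare the observed statistic with values obtained after shuffling
    which chlorotype each row/column of the distance matrix refers to."""
    rng = random.Random(seed)
    k = len(dist)
    observed = differentiation(pop_freqs, dist)
    permuted = []
    for _ in range(n_perm):
        order = list(range(k))
        rng.shuffle(order)
        shuffled = [[dist[order[i]][order[j]] for j in range(k)] for i in range(k)]
        permuted.append(differentiation(pop_freqs, shuffled))
    p_value = sum(p >= observed for p in permuted) / n_perm
    return observed, sum(permuted) / n_perm, p_value

# Toy data: chlorotypes 0/1 are closely related, as are 2/3 (distance 1 vs 5)
dist = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]
unit = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
pops = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0, 0, 0.5, 0.5]]
gst_like = differentiation(pops, unit)
nst_like, mean_permuted, p_value = permutation_test(pops, dist)
print(round(gst_like, 3), round(nst_like, 3), round(mean_permuted, 3), p_value)
```

In the toy data, related chlorotypes co-occur within populations, so the distance-based value exceeds the frequency-based one. With only four chlorotypes the permutation test has little power, which is why real analyses rely on more haplotypes and populations.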

A matrix of pairwise genetic distances was generated from population chlorotype frequencies using Cavalli–Sforza's chord measure (Cavalli-Sforza, 1967) implemented in the GENDIST program (PHYLIP, version 3.6; Felsenstein, 1993). From this matrix, a dendrogram was constructed using the NJ method of Saitou and Nei (1987) with the NEIGHBOR program of PHYLIP. The significance of each node was evaluated by bootstrapping frequencies over chlorotypes with 1000 replications using the SEQBOOT program of PHYLIP. The consensus tree was obtained with the CONSENSE program of PHYLIP with a cutoff set at 50%, and connections with geographic locations were displayed using GENGIS (Parks et al., 2009).

In the same way as for chloroplast sequences, a network was reconstructed using the median-joining network algorithm (Bandelt et al., 1999) implemented in the NETWORK 4.516 program (http://www.fluxus-engineering.com/).

Nuclear SSRs

To evaluate polymorphism using nuclear microsatellite markers (nucSSRs), we recorded allele frequencies, number of alleles per locus (nanuc), observed heterozygosity (Honuc) and expected heterozygosity (Henuc) using GENETIX 4.05 (Belkhir et al., 2004). Allelic richness (AR) is defined as the expected number of alleles in each population for a sample size set at the smallest number of individuals genotyped. AR was calculated with an adaptation of the rarefaction index using FSTAT version 2.9.3.2 (Goudet, 1995). As for the assessment of haplotypic richness, we computed AR considering only populations with at least four individuals. As for haplotypic richness, variation trends in allelic richness were examined at different levels of population stratification (between subspecies, between regions and within regions) using Kruskal–Wallis's test followed by Dunn's multiple comparison test with Bonferroni correction, implemented in XLSTAT 2009 (Addinsoft).
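As an illustration of the two heterozygosity measures, the following Python sketch computes them for a single locus. It uses the uncorrected expected heterozygosity (1 minus the sum of squared allele frequencies); whether GENETIX output was taken with or without a small-sample correction is not stated here, so this choice is an assumption. The genotype values are invented.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (Ho, He) for one locus from a list of diploid genotypes.

    genotypes: list of (allele_a, allele_b) pairs, e.g. fragment sizes in bp.
    He is the uncorrected estimate, 1 - sum(p_i^2).
    """
    n = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # two gene copies per diploid tree
    expected = 1 - sum((c / total) ** 2 for c in allele_counts.values())
    return observed, expected

# Hypothetical genotypes of four trees at one microsatellite locus
ho, he = heterozygosity([(150, 150), (150, 154), (154, 154), (150, 154)])
print(ho, he)
```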

As for cpSSRs, nuclear-based genetic structure was assessed using ARLEQUIN version 3.5 (Excoffier et al., 2005) and analysis of molecular variance. We tested for a phylogeographic signal using the SPAGEDI program (Hardy and Vekemans, 2002), evaluating the contribution of stepwise mutation to the differentiation pattern by comparing observed RST with RST obtained after 1000 allele size permutations (pRST). If stepwise mutations do not contribute to differentiation, FST and RST values are expected to be equal, whereas under a phylogeographic pattern RST is expected to be significantly higher than the mean permuted RST. In the same way as for cpSSRs, this permutation test was applied to global RST vs pRST and in pairwise RST vs pRST comparisons at different levels of population stratification (between regions and between populations within regions).

To assess the overall structure of nucSSR diversity, we used the STRUCTURE program version 2.3.3 (Pritchard et al., 2000). We used an admixture model with correlated allele frequencies, performing 10 independent repetitions for each K (ranging from 1 to 20), with 100 000 Markov chain Monte Carlo repetitions and a burn-in period of 80 000. To identify the optimal K value, we examined absolute values of the second-order rate of change of the likelihood distribution divided by the standard deviation of likelihoods (ΔK), as described by Evanno et al. (2005). To explore divergence between inferred clusters, we examined the genetic differentiation (FST) between these clusters and their ancestral population, predicted by STRUCTURE (Pritchard et al., 2000). Phylogenetic relationships among populations were assessed using population allele frequencies with the PHYLIP package (Felsenstein, 1993) using a protocol similar to that used for cpSSRs.
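The ΔK criterion can be made concrete with a short Python sketch. It assumes that the log-likelihood L(K) of each independent run is available in a dictionary, and it computes the second-order difference on the run means, which is one common reading of the Evanno et al. (2005) formula. The function name and the likelihood values are invented for illustration.

```python
from statistics import mean, stdev

def delta_k(ln_prob):
    """Evanno-style Delta K for every K that has both neighbours K-1 and K+1.

    ln_prob: dict mapping K to the list of L(K) values, one per independent run.
    Delta K = |L(K+1) - 2 L(K) + L(K-1)| on run means, divided by sd of L(K).
    Requires at least two runs per K and non-identical runs (sd > 0).
    """
    avg = {k: mean(values) for k, values in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        if k - 1 in avg and k + 1 in avg:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_diff / stdev(ln_prob[k])
    return result

# Invented log-likelihoods for two runs at each K
runs = {1: [-1000, -1002], 2: [-900, -902], 3: [-890, -892], 4: [-885, -887]}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

ΔK cannot be evaluated for the smallest and largest K tested, since both neighbours are needed, so K=1 can never be selected by this criterion.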

Bioclimatic data and ecological niche modelling (ENM)

The ENM implemented in DIVA-GIS software version 7.2 (Hijmans et al., 2001) enables prediction of the spatial distributions of a species on a bioclimatic basis using the BIOCLIM algorithm (Busby, 1991). From the climate characteristics of given species occurrences, this algorithm gives a percentile distribution (between 0 and 100 percentile) for each climatic variable. On the basis of homocline matching (Booth et al., 1987), the BIOCLIM algorithm then compares the climatic characteristics of each grid cell in a geographic layer with the percentile distributions of variables inferred from species occurrences. The program, run with the BIOCLIM ‘classic’ option (http://www.diva-gis.org/docs/DIVA-GIS5_manual.pdf), determined six types of area according to their suitability: ‘not suitable’, that is, one or more climate variables are outside the 0–100 percentile envelope; ‘excellent’, that is, within the 20–80 percentile envelope for each variable; ‘very high’, that is, within the 10–90 percentile envelope; ‘high’, that is, within the 5–95 percentile envelope; ‘medium’, that is, within the 2.5–97.5 percentile envelope, and marginal areas of ‘low’ suitability, for which all variables are at the boundaries of the 0–100 envelope (that is, between 0–2.5% and 97.5–100%). Considering a 5% threshold, we did not include marginal areas of ‘low’ suitability. We used the occurrence data of V. paradoxa from 126 sample sites (see Supplementary Appendix S3 in Supporting information) and considered seven climatic variables at spatial resolutions of 30 arc-seconds (about 1 × 1 km): (1) annual mean temperature; (2) mean diurnal temperature range; (3) maximum temperature of the coldest month; (4) minimum temperature of the coldest month; (5) annual precipitation; (6) precipitation seasonality; and (7) precipitation of the wettest quarter. 
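The percentile-envelope classification described above can be sketched in Python as follows. This is a simplified reading of the BIOCLIM ‘classic’ rules, not the DIVA-GIS code: the mid-rank percentile definition, the variable names and the occurrence values are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def percentile(value, occurrences):
    """Mid-rank percentile (0-100) of value among sorted occurrence values.

    Returns None when the value lies outside the observed range.
    """
    if value < occurrences[0] or value > occurrences[-1]:
        return None
    lo = bisect_left(occurrences, value)
    hi = bisect_right(occurrences, value)
    return 100 * (lo + hi) / (2 * len(occurrences))

# Minimum distance to the nearest tail required for each class, most restrictive first
CLASSES = [(20, "excellent"), (10, "very high"), (5, "high"), (2.5, "medium"), (0, "low")]

def classify(cell, occurrences_by_variable):
    """Suitability class of one grid cell; the most limiting variable decides."""
    tails = []
    for name, value in cell.items():
        p = percentile(value, occurrences_by_variable[name])
        if p is None:
            return "not suitable"
        tails.append(min(p, 100 - p))
    limiting = min(tails)
    for threshold, label in CLASSES:
        if limiting >= threshold:
            return label

# Invented occurrence values for two hypothetical variables (sorted)
occ = {"temp": list(range(1, 101)), "precip": [10 * i for i in range(1, 101)]}
print(classify({"temp": 50, "precip": 500}, occ))   # both near the median
print(classify({"temp": 8, "precip": 500}, occ))    # temperature in the lower tail
print(classify({"temp": 200, "precip": 500}, occ))  # outside the observed range
```

Because the class is set by the variable closest to the edge of its envelope, a single extreme climatic value is enough to downgrade or exclude a cell, whatever the other six variables indicate.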
Percentile distributions of these variables were created from current climate conditions and projected onto the layers of current, LGM (21 000 years BP), and the LIG period (120 000–140 000 years BP). Using current climate conditions (1950–2000) derived from the WorldClim global climate database (Hijmans et al., 2005), ENM enabled us to check that our 126 sample sites were suitable to recover the known current distribution of V. paradoxa (Hall et al., 1996) and, thus, to describe its climatic preferences. Using data on the LGM obtained from the Paleoclimate Modelling Intercomparison Project Phase II (PMIP2) database and simulated under the community climate system model (CCSM3) (Collins et al., 2006), we illustrated the distribution shift of V. paradoxa during this period of major climate perturbations (Maley, 1996). Finally, to describe the potential distribution of the species before LGM perturbations, we predicted the suitability of V. paradoxa using data on the LIG period generated by Otto-Bliesner et al. (2006) and provided by the WorldClim global climate database. The performance of the model under current conditions was tested by measuring the area under a relative operating characteristic curve (AUC) (Pearce and Ferrier, 2000) using DIVA-GIS software (Hijmans et al., 2001). AUC is commonly used to evaluate ENM models by measuring their ability to distinguish locations where the species is present from those where it is absent (Hanley and McNeil, 1982). ENM with AUC>0.7 is considered to perform well (Fielding and Bell, 1997).
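For readers unfamiliar with AUC, the following Python sketch computes it in its rank-based form: the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen absence site, with ties counted as one half. How DIVA-GIS obtains absence or background points is not described here, so the scores below are purely hypothetical.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve from model scores at presence and absence sites."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0   # presence site ranked above absence site
            elif p == a:
                wins += 0.5   # ties count for half
    return wins / (len(presence_scores) * len(absence_scores))

# Hypothetical suitability scores
score = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(round(score, 3))
```

A value of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to perfect discrimination, which puts the 0.7 threshold cited above in context.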

Results

Chloroplast sequences

From alignments of the two polymorphic chloroplast sequence intergenic spacers trnQ-trnS (601 bp) and matK-trnK (452 bp), we identified only 3 single-nucleotide polymorphisms among the 40 populations (58 individuals) of V. paradoxa. The combination of these mutations (one located in trnQ-trnS and two in matK-trnK) resulted in a total of four haplotypes: H1, H2, H3 and H4. H1 was the predominant haplotype with a global frequency of 74.1%, and was shared by West African and Ugandan populations. H2, with a global frequency of 10.3%, was found in Cameroon. H3, found in 5.2% of the individuals, was limited to populations in Chad. Finally, H4, with a global frequency of 10.3%, was found in most individuals in Central African Republic (one tree in southern Central African Republic displayed the H1 haplotype). Therefore, most of the populations were fixed for a single haplotype. We identified a single haplotype among the five individuals of our outgroup taxon B. toxisperma.

The haplotype network (Figure 1) indicates that haplotypes of V. paradoxa are poorly divergent, and suggests that H2, H3 and H4 are derived from the H1 haplotype through a single mutation. The haplotype of the outgroup species was phylogenetically close to V. paradoxa haplotypes, and H1 was closer to the outgroup haplotype than H2, H3 and H4. ‘H1’ populations were located in both West Africa and Uganda, suggesting either a West African or East African origin for existing populations of V. paradoxa.

Figure 1

Phylogenetic relationships among haplotypes (H1, H2, H3 and H4; B. toxisperma used as outgroup species) using the median-joining haplotype network (NETWORK 4.516 program).

Chloroplast SSR markers

Diversity patterns

We identified three polymorphic loci, ccmp3, ccmp5 and ntcp9, with three, two and seven variant alleles, respectively. These variants were combined into 15 chlorotypes, resulting in high total haplotypic diversity with Hcp=0.9. However, Table 2 shows very poor haplotypic diversity at the population level, with an average of Ĥcp=0.17, because of the small sample size. Figure 2a shows variation in haplotypic richness (HR) on a geographic basis. We observed major variations in HR among populations, from 0 to 2. No significant differences in HR values were detected between subspecies (P=0.079), or between regions (P=0.054), using the Kruskal–Wallis test. However, in West Africa, we observed a central zone (around Togo) exhibiting high haplotypic richness values and bounded by two zones with lower HR (Figure 2a). With the Kruskal–Wallis test, we detected significant differences in haplotypic richness between these areas (P<0.001), and Dunn's pairwise comparisons with Bonferroni correction showed that the central zone around Togo had significantly higher haplotypic richness (HR=1.12) than the eastern (HR=0.44) and western (HR=0.09) flanking zones (Bonferroni-corrected P=0.017) (Figure 2a).

Table 2 Diversity parameters calculated per population and averaged per country

Figure 2

Trends in haplotypic (cpSSRs) and allelic (nucSSRs) richness. Diameters of circles vary proportionally with richness values. Areas bounded by a red dotted line had significantly higher values than others with Dunn's comparison test with Bonferroni correction. (a) Trends in haplotypic richness HR. (b) Trends in allelic richness AR. A full color version of this figure is available at the Heredity journal online.

Differentiation

We identified 15 chlorotypes in the natural area (Figure 3a), with frequencies ranging from 1.1 (D) to 19.8% (C). Eleven chlorotypes were found in the subspecies paradoxa, including eight haplotypes (B–C–F–G–H–L–N–O) limited to West African populations, and three (A–I–M) restricted to Central Africa. Two private haplotypes (E–K) were limited to nilotica subspecies. Two haplotypes (D–J) were shared between the subspecies in Central African Republic (paradoxa) and Sudan (nilotica).

Figure 3

Genetic structure of chloroplast microsatellites across 374 individuals from 71 populations. (a) Geographical distribution of the 15 chlorotypes of V. paradoxa. Unrooted neighbor-joining tree of populations drawn with GENGIS software (Parks et al., 2009) using Cavalli-Sforza genetic distances. Numbers at the base of the branches are percentages corresponding to the bootstrap values after 1000 replications. (b) Median-joining network of chlorotypes computed using NETWORK (http://www.fluxus-engineering.com/), with 1, West African chlorotypes; 2, East African chlorotypes; 3 and 4, Central African chlorotypes. A full color version of this figure is available at the Heredity journal online.

Evaluation of differentiation using hierarchical analysis of molecular variance revealed populations to be significantly differentiated at subspecies level, with 37% of the variation detected among subspecies (FCT=0.37, P=0.00), 57% among populations within subspecies (FSC=0.91, P=0.00) and only 6% of variation within populations. Differentiation at regional level (West Africa, Central Africa and East Africa) was even stronger, with 58% of genetic variation detected among geographical groups (FCT=0.58, P=0.00), 37% among populations within groups (FSC=0.87, P=0.00) and only 5% within populations.

Phylogeographic patterns

Overall differentiation among populations was very high and significant with GST=0.81 (P=0.00) based on unordered alleles, and NST=0.90 (P=0.00) based on ordered alleles. The global test of phylogeographical structure was significant (NST>pNST, P<0.001) with pNST=0.80, revealing a clear phylogeographical pattern in chloroplast variation. However, this phylogeographical signature was no longer detectable within regions (West: P=0.18; Central: P=0.60; and East: P=0.49), indicating that mutation did not contribute to chloroplast differentiation within geographical zones. Pairwise comparisons of NST vs pNST between West and East Africa revealed a highly significant phylogeographical signal (NST=0.69 and pNST=0.35 with P=0.00), but not between West and Central Africa (NST=0.36 and pNST=0.23 with P=0.13), or between Central and East Africa (NST=0.38 and pNST=0.4 with P=0.47).

Phylogenetic relationships between populations were supported by strong bootstrap values (Figure 3a). The frequency-based NJ tree displayed four clades (I–IV), which perfectly match the geography. Clade I contains marginal western populations (populations 1–4 and 6), suggesting their isolation from other West African populations (populations 5 and 7–42). Clade II contains other West African populations. Clade III includes most Central African sites, apart from Central African Republic. Clade IV contains eastern populations, combining the subspecies paradoxa (Central African Republic) and nilotica (Sudan and Uganda).

Phylogenetic relationships among chlorotypes illustrated in the median-joining network (Figure 3b) suggested global differentiation following a phylogeographic pattern, but with some inconsistent reticulations. Private West African chlorotypes were closely related, defining group 1 in the network. Within this group, each chlorotype was separated from 2 to 4 bp and others by only 1 bp, explaining the lack of a phylogeographic pattern in West Africa. Group 2 in the network includes the three East African nilotica chlorotypes. These East African chlorotypes were connected to West African chlorotypes: the network showed that L (found in Senegal) and K (found in Sudan) were separated by only 1 bp. Central African chlorotypes were split into two distant groups in the network. Haplotype M (group 3) was the only chlorotype from southern Cameroon related to ‘N’, a West African chlorotype found in Nigeria. However, the four missing chlorotypes between M and N suggest long-term isolation of populations located in southern Cameroon from West African populations. The other Central African chlorotypes (group 4) were found to be closely related to East African haplotypes, suggesting a common origin.

Nuclear SSR markers

Diversity patterns

A total of 173 alleles were detected at 12 nuclear SSR loci in our collection, with an average of 14.41 alleles per locus. Low variation was observed in mCIRVp08 and mCIRVp175 with five alleles, while mCIRVp167 was highly variable with 25 alleles. Expected and observed (in parentheses) heterozygosity varied markedly from Henuc=0.27 (Honuc=0.26) in Mahina (Mali) to Henuc=0.65 (Honuc=0.74) in Kanté (Togo). Average values per country are listed in Table 2.

Allelic richness (AR) varied from 1.8 to 3.4 among populations, but with no clear trend (Figure 2b). However, the Kruskal–Wallis test revealed significant differences between paradoxa (AR=2.8) and nilotica (AR=2.4) subspecies (P=0.016), with AR significantly higher in paradoxa according to Dunn's pairwise comparison test with Bonferroni correction (Bonferroni-corrected P=0.05). In addition, the Kruskal–Wallis test detected significant differences in allelic richness between geographical regions (P=0.006), and Dunn's pairwise comparisons with Bonferroni correction highlighted significantly higher allelic richness in West Africa (AR=2.9) than in Central (AR=2.5) and Eastern (AR=2.4) Africa (Bonferroni-corrected P=0.017) (Figure 2b).

Differentiation

In the same way as with chloroplast SSRs, with nuclear SSR data, analysis of molecular variance revealed that the subspecies nilotica and paradoxa were significantly differentiated: 22% of variation was detected between subspecies (FCT=0.22, P=0.00), and 17% between populations within subspecies (FSC=0.21, P=0.00). Likewise, differentiation was significant at the scale of geographic regions, with 19% of variance explained by the regions (FCT=0.19, P=0.00) and 16% among populations within regions (FSC=0.19, P=0.00). We can thus assume that geographic distribution is a strong determinant of the nuclear-based genetic structure of V. paradoxa. However, up to 65% of the variation was detected within populations, in contrast to 5% with chloroplasts.

Using the Bayesian approach of Pritchard et al. (2000), two peaks of ΔK, at K=2 and K=4, were detected according to the method of Evanno et al. (2005). The clustering obtained for K=2 (Figure 4a) separated populations into two geographically consistent groups with low levels of admixed individuals: the ‘West’ group included populations from West Africa and the ‘East’ group included the remaining populations. This clustering in two groups constituted the upper level of the structure in V. paradoxa. Although K=2 was identified as the best model, K=4 (Figure 4b) obtained high probability and revealed very clear substructures within these two groups. Within the ‘East’ group, we detected a substructure that distinguished a ‘Central’ (C, ssp. paradoxa) from an ‘Eastern’ group (E, ssp. nilotica), consistent with the distribution of the two subspecies in this zone. In West Africa, populations were also split into two groups consistent with their geographical distribution, ‘West 1’ (W1) including most western populations and ‘West 2’ (W2) populations from Nigeria. However, W1 included a large number of admixed individuals. Bayesian analysis revealed migrants and admixed individuals between the ‘Central’ and ‘Eastern’ clusters of STRUCTURE, and thus between nilotica and paradoxa (Figures 4b and c). In comparison to the ancestral population predicted by STRUCTURE, clusters W1 (FST=0.06) and W2 (FST=0.05) displayed the least differentiation, whereas clusters C (FST=0.12) and E (FST=0.23) diverged further from the potential ancestral population. A principal component analysis of nuclear microsatellite allele frequencies was highly congruent with the STRUCTURE analysis (results not shown).

Figure 4

Genetic structure of nuclear microsatellites across 374 individuals from 71 populations. (a) Bar plot showing clustering of individuals by STRUCTURE with K=2 (Pritchard et al., 2000). (b) Bar plot showing clustering of individuals by STRUCTURE with K=4. (c) Geographical plot of clustering of individuals by STRUCTURE with K=4: W1: ‘West 1’; W2: ‘West 2’; C: ‘Central’; E: ‘Eastern’. A full color version of this figure is available at the Heredity journal online.

Phylogeographic patterns

Overall differentiation among populations assessed with nuclear SSRs was high and significant for both the index based on unordered alleles (FST=0.21, P=0.00) and that based on ordered alleles (RST=0.28, P=0.00). Comparisons of RST and pRST revealed that the nuclear microsatellite diversity pattern of V. paradoxa had a significant phylogeographic signature, with pRST=0.21 and RST>pRST (P=0.004). The phylogeographic signal was highly significant for each pairwise regional comparison: between West and Central Africa (RST=0.16 and pRST=0.07 with P=0.005), between West and East Africa (RST=0.27 and pRST=0.07 with P=0.00) and between Central and East Africa (RST=0.25 and pRST=0.08 with P=0.00). However, as for the chloroplast markers, variation within geographic regions did not reveal a phylogeographic pattern (West: P=0.49; Central: P=0.25; East: P=0.34), suggesting a major contribution of migration and drift within geographical regions. Finally, we performed a pairwise comparison between the cluster groups ‘W1’ and ‘W2’ inferred by STRUCTURE (Figure 4b), but detected no significant phylogeographic signal (RST=0.06 and pRST=0.04 with P=0.11).
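The logic of comparing RST with pRST can be sketched as follows, assuming the usual allele-size permutation test: allele sizes are randomly reassigned among allelic states, RST is recomputed for each permutation, and pRST is the mean of the permuted values. The sketch uses a deliberately simplified single-locus RST (the share of allele-size variance found among populations), not the multilocus estimator used for the analyses reported here. The function names and the data are hypothetical.

```python
import random
from statistics import mean, pvariance


def rst(pops):
    """Simplified single-locus RST: (V_total - V_within) / V_total on allele sizes."""
    all_sizes = [size for pop in pops for size in pop]
    v_total = pvariance(all_sizes)
    # Within-population variance, weighted by the number of gene copies sampled.
    v_within = sum(len(pop) * pvariance(pop) for pop in pops) / len(all_sizes)
    return (v_total - v_within) / v_total


def allele_size_permutation_test(pops, n_perm=999, seed=0):
    """Return (observed RST, pRST, one-sided P for RST > pRST)."""
    rng = random.Random(seed)
    observed = rst(pops)
    states = sorted({size for pop in pops for size in pop})
    permuted = []
    for _ in range(n_perm):
        shuffled = states[:]
        rng.shuffle(shuffled)
        # Each allelic state keeps its frequencies but receives a random size.
        relabel = dict(zip(states, shuffled))
        permuted.append(rst([[relabel[size] for size in pop] for pop in pops]))
    p_value = (1 + sum(value >= observed for value in permuted)) / (n_perm + 1)
    return observed, mean(permuted), p_value


# Hypothetical allele sizes (in base pairs) sampled in two populations.
pops = [[100, 100, 102, 102, 104], [120, 122, 122, 124, 124]]
observed, p_rst, p_value = allele_size_permutation_test(pops)
print(round(observed, 3), round(p_rst, 3), p_value)
```

An observed RST well above pRST indicates that populations differ in allele sizes and not only in allele identities, which is what is interpreted above as a phylogeographic signal.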

The degree of admixture of each population linked to their geographical location confirmed that the clustering obtained with the Bayesian approach fitted the geographical distribution (Figure 4c). Moreover, higher levels of admixture were found at geographical junctions between clusters. Within the W1 group, admixed populations were most often found in Togo, Benin and Niger (populations 24, 25, 28, 29, 30, 31, 32 and 33). Populations 52, 53, 54, 56 and 60, at the boundary between paradoxa and nilotica zone, were also admixed (Figure 4c). Phylogenetic relationships at population level based on allele frequency were poorly informative, as nodes were not significant (data not shown).

Present and previous theoretical distribution of V. paradoxa

From our database containing 126 sample sites, we checked that the zone theoretically suitable for the species under current conditions (Figure 5a) was consistent with its known distribution (Hall et al., 1996). A decrease in suitability was observed at the junctions between the geographic regions (West, Central and East), suggesting lower species density there. The model also extended the areas suitable for V. paradoxa to regions where the species is in fact absent (below the equator). The AUC, averaged over the seven climatic variables, was 0.79, indicating that the accuracy of the model was reasonably good for ENM predictions. ENM of the species during the LGM (21 000 years BP) showed a much smaller suitable zone (Figure 5b): while West Africa remained highly suitable, the putative distribution of V. paradoxa from Cameroon to Uganda was reduced and fragmented. In this part of Africa, we detected three putative areas to which the species could have been restricted: near Lake Chad, in southern Central African Republic, and in northern Uganda (see zones 1, 2 and 3 in Figure 5b). Finally, we predicted probable suitable areas for the shea tree during the LIG period, around 120 000–140 000 years BP (Figure 5c). Results showed a potential distribution pattern divided into two main zones. The first zone extended from present-day Senegal to the north of Central African Republic, but was almost completely disrupted at the Nigerian border. The second zone mainly comprised Sudan and northern Uganda. These zones were separated by a large gap (200 km) at the border between Sudan and Central African Republic, consistent with the current geographical gap between the two subspecies nilotica and paradoxa (Hall et al., 1996).
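As a reminder of what an AUC value such as the one reported above measures, the following minimal sketch computes the AUC as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background site, with ties counted as one half. It does not reproduce the internal computation of the ENM software or the averaging over climatic variables. The function name and the scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """AUC as the share of (presence, background) pairs ranked correctly."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0  # the presence site is ranked above the background site
            elif p == b:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores predicted by a model (made-up values).
presence = [0.9, 0.8, 0.6, 0.4]
background = [0.7, 0.5, 0.3, 0.2]
score = auc(presence, background)
print(score)  # 13 of the 16 pairs are ranked correctly: 0.8125
```

A value of 0.5 corresponds to a model that ranks sites no better than chance, and a value of 1 to a perfect ranking.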

Figure 5

ENM of Vitellaria paradoxa at three different periods: (a) today's predicted and reported natural range; (b) potential distribution during the LGM (20 000 years BP) and refugia (1: around Lake Chad; 2: northern Congo; 3: northern Uganda); (c) potential natural range during the LIG period (120 000–140 000 years BP). Colors represent the probability of suitable climatic conditions from red (excellent) to green (medium). Envelopes were produced using the Worldclim/PMIPII database and DIVA-GIS software version 7.2 (Hijmans et al., 2001). A full color version of this figure is available at the Heredity journal online.

Discussion

Genetic diversity of V. paradoxa

In our study, the genetic diversity of nuclear and chloroplast SSRs was higher than that observed in previous studies on V. paradoxa (Fontaine et al., 2004; Sanou et al., 2005). This novel assessment based on a larger sample is consistent with the diversity detected in angiosperms in other studies (Müller et al., 2009) and meets expectations concerning the diversity of widespread tree species with similar biological attributes (Hamrick et al., 1993). However, we noted that the level of SSR diversity varied markedly among zones in our study (Figure 2). Possible explanations for higher levels of diversity are the presence of ancient glacial refugia or the confluence of migration routes (Petit et al., 2003), and we propose some hypotheses below. In contrast to SSR markers, polymorphism within chloroplast intergenic spacers revealed very weak diversity, with only four haplotypes. This poor variability was comparable to the chloroplast diversity detected in other widespread tree species, such as Pinus pinea L. (Vendramin et al., 2008) and Adansonia digitata (baobab), another widespread savanna tree (Pock Tsy et al., 2009). Therefore, in contrast with species harboring high levels of intergenic spacer polymorphisms (Dick and Heuertz, 2008), these markers are not an effective tool to study the phylogeography of V. paradoxa. However, the low polymorphism detected in these sequences was consistent with the low rate of nucleotide substitution in chloroplast genomes, around 10⁻⁹ (Wolfe et al., 1987), compared with SSR mutation rates of 10⁻⁴ to 10⁻² for nuclear SSRs (Vigouroux et al., 2002) and 10⁻⁵ to 10⁻³ for chloroplast SSRs (Navascués and Emerson, 2005).

A phylogeographic pattern driven by climate changes

We showed that the genetic diversity of V. paradoxa has a clear east–west structure (Figures 3 and 4), and revealed a strong phylogeographic signature from chloroplast (GST=0.81; NST=0.90; NST>pNST) and nuclear SSRs (FST=0.21; RST=0.28; RST>pRST). We identified two major genetic groups geographically separated by the Adamawa Highlands: a ‘West’ group, which is relatively homogeneous over the 3000 km area comprising West Africa, and an ‘East’ group divided into two subgroups (Figure 3). We propose that the strong phylogeographic pattern of V. paradoxa is explained by past climate shifts and that LGM perturbation is a major factor in explaining this pattern. During the LGM, the climate in Africa was colder and drier; this shift was particularly marked in eastern sub-Sahelian Africa and was characterized by small incursions of the Sahara desert into this region (Lioubimtseva et al., 1998). Consistent with these climatic perturbations, ENM predictions indicated that the potential distribution of V. paradoxa in this zone was highly fragmented (Figure 5b). However, ENM results should be interpreted with caution because they depend on the reliability of the climatic models (current and ancient) resulting from mathematical interpolations and global smoothing of collected data (Hijmans et al., 2005). For example, the poor quality of climatic documentation resulted in local over-estimation of the potential distribution of a mountain bird species (Cranioleuca henricae) in Bolivia (Soria-Auza et al., 2010). Consequently, we suggest using ENM only to describe general trends of species range in our study. In addition, ENM predictions assume a no-evolution model, and thus do not consider the adaptation ability and the dispersal dynamics of species over time. Having taken these limitations into account, using ENM on LGM data, we identified two reduced areas that could have formed glacial refugia for populations of V. paradoxa in this part of Africa: around Lake Chad and in northern Uganda, along with other marginal refugia (Figure 5b). We suggest that these refugia could be at the origin of the genetic divergence of V. paradoxa in eastern sub-Sahelian Africa. In contrast to the eastern area, West Africa was less impacted by the climate change during the LGM, probably owing to the effect of the monsoon from the Gulf of Guinea (Braconnot et al., 2000). As a result, although the eastern suitable area for V. paradoxa was probably deeply disturbed, ENM results suggest that the distribution of the species was less impacted, and thus probably conserved, in West Africa (Figures 5b and c). Although the limited LGM perturbation in West Africa explains the high level of diversity observed and the hotspot detected in its central area (around Togo) (Table 2 and Figure 2), the poor structure detected in this large region (Figure 4) could also result from the intense human activity in the area. Palynological studies reported human migrations and agropastoral activities involving shea trees 3000 years ago in Burkina (Neumann et al., 1998), and evidence of human selection of V. paradoxa has been found in this zone (Lovett and Haq, 2000; Maranz and Wiesman, 2003). In addition to past climate variations, we propose that the strong differentiation between West and Central African populations was probably preserved by topographic elevations. For instance, the Adamawa Highlands clearly represent a boundary between West African and other populations. Peaking at 1300 m a.s.l. between Nigeria and Cameroon (Figure 3), this mountain range creates a colder climate, resulting in a local barrier to V. paradoxa colonization and limiting gene flow between West and Central Africa. Therefore, apart from the possible secondary role of Adamawa, we propose that the pan-continental phylogeographic pattern of V. paradoxa is fully consistent with perturbations during the LGM, and that this pattern was little affected by the dispersal dynamics of the species.

Differentiation between paradoxa and nilotica

Results in the literature argue for differentiation of the species into two non-overlapping subspecies based on slight variations in their leaf and stamina morphology (Hall et al., 1996). We consequently expected to detect genetic differentiation consistent with this distinction. However, several molecular results suggest that this taxonomic distinction should be reconsidered. The median-joining network obtained with chloroplast SSRs (Figure 3c) revealed that Central African chlorotypes (paradoxa) are closer to nilotica than to other paradoxa chlorotypes. In addition, the Bayesian approach applied to nuclear SSRs revealed a clear and strong differentiation of the species into two groups on either side of the Adamawa Highlands: a western group comprising West African populations (ssp. paradoxa) and an eastern group including nilotica and paradoxa from Cameroon to Uganda (Figure 4a). The molecular distinction between subspecies nilotica and paradoxa appears only as a substructure of the eastern group for K=4 (Figure 4b). These results suggest that paradoxa populations in eastern sub-Sahelian Africa (from Cameroon to Central African Republic) share a common evolutionary history with nilotica rather than with other paradoxa from West Africa, reinforced by the inter-subspecies migrations and hybridizations revealed in our study (Figure 4). Moreover, migration events between subspecies were also supported by chloroplast microsatellites. Indeed, the nilotica migrant detected with nuclear SSRs in Central African Republic carried a nilotica chlorotype (‘J’), and two admixed individuals found in Sudan carried a paradoxa chlorotype (‘D’) (Figure 3). We cannot exclude that these shared chlorotypes are due to homoplasy or shared ancestral polymorphism rather than to migration; however, the chloroplast microsatellite mutation rate is likely to be low, reducing the effect of homoplasy (Navascués and Emerson, 2005). On the basis of this molecular differentiation, we suggest that the distinction between subspecies may have arisen spatially on either side of the Adamawa Highlands. This would be consistent with the trend observed in lipid composition, which suggests that shea butter in Cameroon is of the ‘nilotica’ type (Maranz et al., 2004).

Hypothesis on the origin of V. paradoxa

Although the literature proposes no origin for V. paradoxa, combined results obtained with nuclear and chloroplast markers provided new insights into present populations. Using both chloroplast and nuclear SSRs, higher diversity was detected in West Africa, particularly around Togo (Figure 2, Table 2), suggesting ancient populations in this area (Petit et al., 2003). This result has been confirmed by another study (Logossa, personal communication) on 674 individuals from 38 populations sampled throughout West Africa. In addition, Bayesian analysis of nuclear SSRs showed that West African groups displayed minimal divergence from their theoretical ancestor (FST=0.05), whereas other groups diverged more (Central: FST=0.12; East: FST=0.23). This result indicates a historically higher effective population size in West Africa, which could be compatible with the origin of V. paradoxa being in this zone. This hypothesis fits with the recently inferred West African origin of the baobab (Pock Tsy et al., 2009), another African savanna tree whose distribution north of the equator is very similar to that of V. paradoxa. However, low FST relative to ancestral populations and higher diversity in West Africa do not necessarily result from a more ancient origin of the species compared with the other regions. To be able to conclude on the origin of V. paradoxa, additional DNA sequence data are required, as chloroplast sequence analysis indicates the presence of a likely ancestral haplotype both in West Africa and in Uganda (Figure 1).

Outlook for conservation of V. paradoxa in the African savanna biome

The savanna is demographically, environmentally and economically a major African ecosystem that requires urgent management efforts (Sankaran et al., 2005). Despite its importance, forecasts of the impact of anthropogenic climate change on the savanna biome are lacking. In our study, we identified West Africa as a diversity hotspot for V. paradoxa, possibly because of a weaker impact of climate shifts on the savanna tree in this zone. Given the greater past perturbations detected farther east, we suggest that current global warming could affect the eastern part of the species range first and, by extension, the associated biotope. Consequently, major efforts should be made to protect the species in this zone. Owing to the possible buffering effect of the monsoon from the Gulf of Guinea, West African V. paradoxa populations should be less affected by the current climate shift. We nevertheless suggest that efforts should be increased to maintain the diversity of this zone, including identifying and protecting the numerous V. paradoxa landraces in West Africa.