Geographic barriers and Pleistocene climate change shaped patterns of genetic variation in the Eastern Afromontane biodiversity hotspot

The Eastern African Afromontane forest is getting increased attention in conservation studies because of its high endemicity levels and shrinking geographic distribution. Phylogeographic studies have found evidence of high levels of genetic variation structured across the Great Rift System. Here, we use the epiphytic plant species Canarina eminii to explore causal explanations for this pattern. Phylogeographic analyses were undertaken using plastid regions and AFLP fragments. Population genetic analyses, Statistical Parsimony, and Bayesian methods were used to infer genetic diversity, genealogical relationships, structure, gene flow barriers, and the spatiotemporal evolution of populations. A strong phylogeographic structure was found, with two reciprocally monophyletic lineages on each side of the Great Rift System, high genetic exclusivity, and restricted gene flow among mountain ranges. We explain this pattern by topographic and ecological changes driven by geological rifting in Eastern Africa. Subsequent genetic structure is attributed to Pleistocene climatic changes, in which sky-islands acted as long-term refuges and cradles of genetic diversity. Our study highlights the importance of climate change and geographic barriers associated with the African Rift System in shaping population genetic patterns, as well as the need to preserve the high levels of exclusive and critically endangered biodiversity harboured by current patches of the Afromontane forest.


Angiosperms (habitat)
Coffea arabica (Afromontane) Two distinct groups across the Ethiopian Rift -ISSR; microsatellites 50,86 Lobelia giberroa (Afromontane) Continued is a temporal framework. Most Eastern African phylogeographic studies in plants do not include time estimates for colonization events and, therefore, do not allow discriminating between the mountain-forest bridge and long-distance dispersal hypotheses. This is further aggravated by the difficulties intrinsic to collecting material of widespread Afromontane species, due to the complex geographic landscape and the recent political instability 27 (see below). Here, we examine support for these two hypotheses, and the role of the Rift Valley as a long-term dispersal barrier, by reconstructing patterns of genetic variation in Canarina eminii Asch. & Schweinf, a strictly Afromontane species with a geographic distribution extending from the Ethiopian massifs to the SMSI in the northern end of Lake Malawi 28 , along the Rift Valley. This species grows mostly as a twig epiphyte on trees, mainly of genera Hagenia J.F.Gmel., Conopharyngia G. Don, and Afrocarpus (J. Buchholz & E.G. Gray) C.N. Page 28 , all characteristic elements of the Afromontane forest belt 28 . Little is known about the reproductive biology of C. eminii, except that the species is pollinated by sunbirds (unpublished field observations) and exhibits floral traits that have been associated with this type of ornitophilous pollination 29 (e.g., large orange-red coloured flowers producing abundant diluted nectar). Fruits are fleshy, with the seeds enclosed in a sticky sweet jelly, suggesting endozoochorous dispersal (e.g., by monkeys 30 ). Mairal et al. 31 recently reconstructed the biogeographic history of Canarina L., a small genus of three species within tribe Platycodoneae (family Campanulaceae). This genus exhibits a wide disjunct distribution across North Africa and probably originated from a Central Asian ancestor, which arrived to Eastern Africa in the Mid Miocene (13.7 Ma) through the Arabian Plate 31 . The lineage leading to the Eastern African species C. abyssinica diverged first, followed by the split between C. eminii and the endemic Canarian species C. canariensis 31 in the Late Miocene (c. 7 Ma). Divergence within C. eminii was traced back to 1.76 Ma 31 (southern range of the Great Rift Valley and followed by subsequent colonisation of the northern range). However, the intraspecific sampling in the study by Mairal et al. 31 was limited, which prevented analysis of patterns of genetic variation or examination of the processes population divergence in C. eminii.
In this study, we reconstruct the phylogeographic history of C. eminii among and within populations. Our aims were to: i) describe the geographic distribution of genetic variation within this species; ii) examine the role of the Rift Valley as a long-term dispersal barrier; iii) understand how Pleistocene climatic fluctuations might have affected population ranges, especially in relation to the "mountain-forest bridge" hypothesis and the role of isolated "sky islands"; and iv) provide insights into the phylogeography of an Afromontane epiphytic species. Since the epiphytic growth of this species is tied to dominant tree species of the Afromontane forests, it makes C. eminii a good case study for tracing the history of fragmentation and expansion of forest coverage in this region.

Results
Haplotype analyses. In order to examine patterns of genetic variation within C. eminii, we sequenced three highly variable plastid (pDNA) regions covering the entire distributional range of this species (see Materials and Methods). We generated 237 new sequences from 79 individuals: rpl32-trnLUAG (79 sequences), trnSGCU-trn-GUCC (79 sequences), and petB1365-petD738 (79 sequences). The number of nucleotide sites ranged from 648   (Table S2.5); these seem to be arranged into parallel routes, following migration from north to south and vice versa on each side of the Rift System, and connecting also the intermediate mountain ranges (Fig. 4b).
AFLP polymorphism, genetic diversity and structure. To complement the plastid signal (above),   (Table S2.6). All intervals analysed with SPAGeDi (see Supplementary Fig. S2.6) gave a significant negative result, suggesting that populations are not more different with increasing distance. In agreement with this, BARRIER detected three major boundaries separating the four STRUCTURE clusters: i) Rwenzori Mountains from Mt. Elgon (100%), ii) Mt. Elgon from all areas in the east (100%), and iii) Abyssinian massif from the remaining areas (99.7%) (Fig. 2c). Hierarchical AMOVA showed the greatest genetic variance among the same four groups identified by STRUCTURE and BARRIER (Table S2.4). The relatively low among-group genetic variance (Table S2.4) might be explained by the limited number of populations sampled at each side of the Rift, and therefore should be taken with caution. Yet, the geographic structure recovered by AMOVA is fully congruent with the results from the Bayesian clustering (STRUCTURE and BARRIER) and with the Fst and DW estimates, lending support to this analysis.

Discussion
The BEAST "nested dating" approach used here provided divergence time estimates similar to those in Mairal et al. 31 for the origin of Canarina (8.98 My.) and the stem-age of C. eminii (7.14 My.) (Fig. S2.1 Our results also support that divergence in C. eminii followed the opening of the Rift Valley in the Early Pliocene, which divided the Rift System into two landstrips 35 . A strong phylogeographic structure with the Great Rift Valley at its central axis was recovered for C. eminii populations. This agrees well with phylogeographic studies in other Afromontane groups (i.e, angiosperms, insects, and vertebrates, Table 1), all supporting the Rift System as an effective barrier to gene flow among populations. The depth of the phylogeographic split among STRUCTURE groups within C. eminii (Fig. 2c), the distribution of haplotypes (Fig. 2a,b), and the pattern of reciprocal monophyly and deep divergence times (1.92 Ma, HPD: 0.63-3.82 Ma) inferred with BEAST (Fig. 3), suggest a history of long-term isolation among populations east and west of the Rift Valley. In contrast, the large geographic distances and comparatively lower genetic distances between populations located on either side of the Rift Valley, suggest that geographic distance played a minor role in structuring patterns of genetic variation within C. eminii, confirmed by the lack of correlation the Mantel test establishes.
Patterns of colonization and biotic assemblage in the Eastern African Mountains have been argued to follow a northern (an "African pan-temperate element" 28,36,37 ) and a southern (an "Afromontane track" [38][39][40] ) component. Migrants coming from the north would have needed to overcome the widest point of the Rift Valley, the Afar Triangle (Fig. 1a), which acts as a major division before reaching the Ethiopian Plateaus, while southern migrants probably needed to split into two routes when reaching the two parallel volcanic arcs of the Albertine and the Gregory Rift 39 (Fig. 1a). Thus, dispersal into the Eastern African Mountains from the north or from the south would have been forced by topography to follow two parallel routes, one on each side of the Rift Valley 19,22 . This scenario is supported by the Discrete Phylogeographic Approach (DPA) analysis of C. eminii, showing two colonization routes that split along the topography of the Rift System (Fig. 4b). Though this method has been recently criticized as being too decisive 41 , other studies in Afromontane taxa have found similar phylogeographic connections through the east 26,27,42 and west 8,43,44 (Table 1).
Which mechanisms are behind this pattern of geographic isolation between the two sides of the Rift Valley for C. eminii and other Afromontane taxa ? As mentioned above, the origin of C. eminii considerably postdates the formation of the Rift, so strict vicariance is not feasible (nor for any of the groups in Table 1). However, the major faults and flooding of graben associated to the Rift activity, may have acted as a corridor filtering migration, spatially restricting dispersal to two parallel paths on either side of the Rift Valley. Indeed, the volcanic desert of Afar in Ethiopia (the hottest place on Earth), and the drier floor and deep lakes at the bottom of Rift Valley, are ecologically very different from the surrounding higher elevation areas 45,46 , and constitute ecological barriers to dispersal (i.e., hostile environments that are beyond the tolerance limits of montane organisms). It would be interesting to test this hypothesis using genetic and ecological evidence in other lineages exhibiting a similar pattern of genetic isolation along the Rift Valley (Table 1).
Eastern Africa has experienced several climate fluctuations over the last 800,000 years, resulting in mountain forests extending and retracting successively, with further isolation and bottlenecking events for some species 47,48 . The Afromontane forest probably did not descend far enough to reach the bottom of the Rift Valley In contrast to the glaciation-induced latitudinal range shifts in European taxa, which apparently resulted in a loss of genetic diversity (i.e., extinction and bottlenecks 54 ), repeated events of fragmentation and reconnection during Pleistocene glacial/interglacial cycles may have contributed to the accumulation and maintenance of genetic diversity in the Eastern African mountains, and could explain their current status as centres of endemicity 52,53 . This agrees well with the hypothesis of a more moderate effect of glacial cycles at equatorial latitudes, in contrast with regions at higher latitudes 54 .
Another explanation for the east/west connectivity pattern and the observed low genetic variance between populations on each side of the Rift in C. eminii and other Afromontane endemics, is long-distance dispersal of seed and pollen through vectors such as wind, insects, or birds; this in turn could have been reinforced by wider forest coverage in the past, i.e., greater area availability for seedling establishment 25 . Long-range seed dispersal is likely the best explanation for connections among populations of C. eminii across the Turkana Basin and the Uganda Gap, which probably constitute ecological barriers (climatically unfavourable land) for gradual SSD migration through forest connections. Results from the plastid DNA suggest seed dispersal on each side of the  Rift Valley, with more recent connectivity among the eastern side populations (Figs 2a and 3). On the other hand, the strong interpopulation structure found in the nuclear AFLP data suggests limited pollen dispersal (Fig. 2c). This could be explained by the strong territoriality and reduced mobility of Nectariniidae birds, the presumed pollinators of C. eminii. Some recent studies on these birds have revealed cryptic species within the same lineage occupying different mountain ranges, probably due to low connectivity between populations 44,55 . This pattern might have later been reinforced by a landscape strongly transformed by agriculture. Both plastid and nuclear genetic data suggest a pattern of greater historical isolation among populations located in the western side of the Rift Valley than among those in the east (Fig. 2, Table 2). Whether this is associated to greater past connectivity through forest patches (mountain-bridge hypothesis) in the east, or whether it reflects a more recent historical pattern associated to agricultural activity, is not clear. Similar to other studies 8,50,51 , the highest level of genetic diversity was found in the Harar Massif, an area known for harbouring the best-preserved fragments of Afromontane forest ( Table 2: Agere Maryam and Harenna Forest). This area may have acted as an important refuge for Afromontane forest organisms during glacial maxima. In the DPA analysis of C. eminii, the Harar Massif was inferred as the source of migration events on the eastern side of the Rift (Fig. 4b). For the western Rift, the SMSI was suggested as the ancestral area: these mountains harbour the highest plastid genetic diversity (Table 2), though these results could not be corroborated with the nuclear data (no AFLP sampling). The two arcs of the Rift Valley (Albertine and Gregory Rift) meet geographically in the north on the Central Highlands (Mt. Elgon and Cherangani Hills) and in the South in the SMSI (Fig. 1). These two regions (Mt. Elgon-Cherangani and the SMSI) apparently act as secondary contact points where the two parallel migration routes mentioned above interconnect 19,44 , and have been described as cradles of new genetic diversity 53 . This is congruent with our results in C. eminii, showing Mt. Elgon and the SMSI as a crossroad linking both sides of the west Rift System in the DPA analysis (Fig. 4), and both exhibiting high levels of genetic diversity ( Table 2). Mount Elgon is also the oldest and most species-rich volcano of Eastern Africa 13,56 , with endemic lineages that are related to groups in either the eastern 43 or the western side of the Rift Valley 8,43 . In contrast, the Afar Depression and the arid zone around the Turkana Basin seem to have acted as major barriers against gene flow ( Fig. 1a and 4b). The Afar Depression separates the eastern and western Ethiopian populations, north of the two volcanic arcs, while the aridity around the Turkana Basin and the Uganda Gap, might have constituted an important obstacle to gene flow between the eastern and western populations 57 . The low haplotype diversity found in the Abyssinian massif population could reflect long-term isolation due to the influence of glaciations, though a low DW value suggests that recent dispersal could be a more likely explanation.
In summary, our study suggests that the topography and land formations of the Great Rift Valley underlie the phylogeographic pattern of east/west vicariance observed in C. eminii and other Afromontane endemic species. Though population sampling was limited to discriminate between the "mountain-forest" bridge and the LDD hypotheses, Pleistocene climatic oscillations and cyclical expansion and contraction of the Afromontane forest appear to have played a role in structuring levels of genetic variation in these species. Short-range dispersal among forest patches on each side of the Rift probably was complemented by LDD across some regions such as the Turkana Basin or the Uganda Gap, historically devoid of montane forests. Isolated sky-islands such as Mount Elgon or the Harar Massif acted as both refuges and cradles of genetic diversity. Given that the forested areas of Eastern Africa are currently in serious decline 9 -especially the northern Ethiopian highlands subject to extensive logging-and that many of these harbour high levels of genetic variability (see Table 1, our study), designing new measures for the protection of these regions (e.g., new natural parks) may be crucial to preserve the endemic and highly-threatened Afromontane forest biota.

Methods
Sampling and DNA sequencing. The geographic distribution of C. eminii extends from the Ethiopian massifs to southern Tanganyika 28 . Despite our sampling effort (two field expeditions in 2009, 2015 to Ethiopia and one in 2010 to Kenya-Uganda), we failed to find C. eminii in many of the localities recorded by Hedberg in his classic monography of this species 28 . Much of the forest in these localities is highly degraded, threatened by the advancement of extensive agriculture, cattle rising, forestry, and other human activities (see Table S2.1 for a list of the visited localities for this study and a detailed description of the sampling). Despite this, we were able to sample fresh plant tissue from seven populations of C. eminii (Table S2. (Table S2.2), representing all major geographic blocks in the distribution of the species (Fig. 1a). Three plastid (pDNA) regions were sequenced for a total of 79 individuals. For details on PCR amplification and sequence alignment see Appendix S1.1 and Table S2 Phylogeographic analysis. Haplotype analyses were done on a concatenated dataset comprising all three sequenced plastid regions (see above). Genealogical relationships among haplotypes were inferred via the statistical parsimony algorithm 58 implemented in TCS 1.2.1 59 . The number of mutational steps resulting from single substitutions among haplotypes was calculated with 95% confidence limits, with gaps coded as missing data. Summary statistics for genetic diversity calculated for each population were: number of haplotypes H (n), haplotype diversity (Hd), and nucleotide diversity (π ); using DnaSP 5.1 with the option "not consider gaps" selected 60 .
Scientific RepoRts | 7:45749 | DOI: 10.1038/srep45749 Haplotype divergence times were estimated using Bayesian relaxed clocks implemented in the software package BEAST v.1.7.5 61 and using the 'nested dating approach' described in Mairal et al. 31 ; in this approach, a high phylogenetic-level dataset, including representatives of all three species of Canarina and nine outgroup taxa from Platycodoneae and Campanulaceae, was used to inform the clock rate of a linked population-level dataset of C. eminii, under a mixed Yule-coalescent model 62 . Congruence among results from the three plastid markers was tested by comparing clade support values for individual clades. See Appendix S1.2 for more details on these analyses.
The Discrete Phylogeographic Approach (DPA) of Lemey et al. 63 , implemented in BEAST, was used to infer ancestral ranges and to trace the history of migration events in C. eminii. It is based on a continuous-time Markov Chain process where the discrete states correspond to the geographic locations and the state transition rates to the migration rates between areas (Ronquist & Sanmartín, 2011). Bayesian Stochastic Search Variable Selection (BSSVS, Lemey et al. 63 ) was used to identify the rates (colonization routes) that are best supported by the data, using a cut off value of three for the Bayes Factor comparison. We defined six discrete areas, assigning each plateau and Rift mountain ranges to a different area 15,31 : 1) Harar massif, 2) Abyssinian massif, 3) East Gregory Rift (Aberdare Mountains), 4) Central Highlands (Mount Elgon and Cherangani Hills), 5) Albertine Rift (Rwenzori, Rwanda and Burundi), and 6) Southern Mountain Sky-Islands (SMSI: South Tanzania and Malawi) (see Appendix S1.3 for more details). AFLP fingerprinting. Laboratory molecular protocols for the AFLP analysis 64 were implemented using the AFLP plant mapping kit (Applied Biosystems). We could not use the individuals sampled from herbarium collections for AFLP fingerprinting, as this approach is quite sensitive to the quality of the starting DNA and thus requires well-preserved material 65 . Therefore, sampling of populations in the nuclear dataset was more reduced than in the pDNA sequence dataset. Genomic DNA was digested with the enzymes EcoRI and MseI and linked to the adaptors EcoRI (5'-CTCGTAGACTGCGTACC-3'/5'-AATTGGTACGCAGTCTAC-3') and MseI Amplified fragments were analysed using GeneMapper 3.7 software (Applied Biosystems), and peaks ranging between 100 and 500 bp recorded. AFLP Scorer software 66 was employed to run a reproducibility test, with the maximum acceptable error for each primer combination fixed at < 5% 67 . The AFLPdat R package 68 was used to determine the numbers of private fragments per population. Only unambiguous fragments shared among duplicates were scored. Data reliability was assessed by comparison of duplicates, using one or two individuals per population (21 tests). The reproducibility value was 88-100%, with a mean of 97.5%. AFLP data analysis. The resulting AFLP presence/absence matrix was analysed using AFLPSURV v.1.0 69 to estimate Nei's gene diversity (Hj), pairwise differentiation among subpopulations (F ST ), the percentage of polymorphic fragments per population (P), and the bootstrapped Nei's genetic distance matrix between individuals and populations 70,71 . The inbreeding coefficient (F IS ) was set to 0.1 as suggested by Hardy 72 . The permutation test involved 20,000 permutations. In addition, a Bayesian method was employed to estimate allelic frequencies, using a non-uniform prior distribution 73 . Ten thousand permutations were performed to calculate F ST values. Genetic distances between individuals, populations and geographic groups were also calculated. We used AFLPdat 74 to calculate DW value per population, equivalent to the weighted endemism value 75,76 ; this value is expected to be high in long-term isolated populations where rare markers should accumulate due to mutations, whereas newly established populations are expected to exhibit low values, thus helping to differentiate recent dispersal from more ancient isolation.
To distinguish genetic groups of individuals in the AFLP dataset, a comparison was made by constructing a pairwise similarity matrix for all individuals: Dice's coefficient was calculated and the resulting matrix was transformed using principal coordinates analysis (PCO) with Ntsys v.2.1 77 . Neighbour-nets of AFLP data were also calculated, both for individuals and populations, using the SplitsTree v.4.10 software 78 . To quantify the amount of genetic differentiation attributable to geographic and population subdivision, a hierarchical analysis of molecular variance was performed 79 in ARLEQUIN v.3.0 (Table S2.4). For assessing the structure of populations, we used the Bayesian method implemented in STRUCTURE 2.2 80,81 , assuming admixture conditions and uncorrelated allele frequencies between groups, 500,000 generations (plus a burn-in of 100,000) were run for K values of 1-10, with ten repetitions each. For each K value, only the run with the highest maximum likelihood value was considered. The LnP (D) for the successive decomposition of groups was used in all STRUCTURE analyses 82 . To test the effect of the spatial distance on the genetic structure of the C. eminii populations, correlations between genetic (measured as F ST ) and spatial distances between pairs of populations were determined using the Mantel permutation procedure in Ntsys. The genetic distance matrix used was based on the presence/absence matrix; the geographic distance matrix was based on absolute distances between the geographic coordinates for each collected population. In addition, the kinship multilocus coefficient (F IJ ) was estimated using SPAGeDi 83 to determine the spatial structure of the examined populations, taking spatial distances into account in the analyses. BARRIER v.2.2 84 was used to identify possible geographic locations acting as major genetic barriers among C. eminii populations, based on genetic distances. The significance of these was examined with 1000 bootstrapped distance matrices obtained using AFLPsurv. Data Accessibility. DNA sequences: Genbank Accession nos KF028817-KM189329. GenBank accessions, sampling locations and/or online-only appendices uploaded as online Supplemental material.